What is a Data Warehouse?
In today’s rapidly changing corporate environment, organizations are turning to cloud-based technologies for convenient data collection, reporting, and analysis. This is where Data Warehousing comes in as a core component of business intelligence that enables businesses to enhance their performance. It is important to understand what is a data warehouse and why it is evolving in the global marketplace.
What is a Data Warehouse?
A data warehouse is a type of data management system that is designed to enable and support business intelligence (BI) activities, especially analytics. Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.
It centralizes and consolidates large amounts of data from multiple sources. Its analytical capabilities allow organizations to derive valuable business insights from their data to improve decision-making. Over time, it builds a historical record that can be invaluable to data scientists and business analysts. Because of these capabilities, a data warehouse can be considered an organization’s “single source of truth.”
How does it work?
A data warehouse may contain multiple databases. Within each database, data is organized into tables and columns. Within each column, you can define a description of the data, such as integer, data field, or string. Tables can be organized inside of schemas, which you can think of as folders. When data is ingested, it is stored in various tables described by the schema. Query tools use the schema to determine which data tables to access and analyze.
Types of Data Warehouse
There are three main types:
Enterprise Data Warehouse (EDW)
This type of warehouse serves as a key or central database that facilitates decision-support services throughout the enterprise. The advantage of this type of warehouse is that it provides access to cross-organizational information, offers a unified approach to data representation, and allows the running of complex queries.
Operational Data Store (ODS)
This type of data warehouse refreshes in real time. It is often preferred for routine activities like storing employee records. It is required when data warehouse systems do not support reporting needs of the business.
A data mart is a subset of a data warehouse built to maintain a particular department, region, or business unit. Every department of a business has a central repository or data mart to store data. The data from the data mart is stored in the ODS periodically. The ODS then sends the data to the EDW, where it is stored and used.
An enterprise data warehouse allows for decision-making across your organization to happen faster and better than if you directly accessed disparate data stores. The major advantages are:
- Better data quality. More trust. Data from a warehouse has been cleansed, de-duplicated, and standardized. This is true whether you use the traditional ETL pipeline, where data is transformed before loading into your warehouse, or the modern ELT approach where data is transformed in the warehouse as needed by a particular consumer. Having a consistent, “single source of truth” builds trust in the insights and decisions derived from any analysis.
- Complete picture. Better, faster analysis. A warehouse unifies and harmonizes data from a wide range of sources, such as operational databases, transactional systems, and flat files. This gives you a more complete picture of your business and allows you to leverage BI activities such as data mining, augmented analytics, and machine learning to find patterns you could easily miss with data silos. Also, accurate, complete data is available more quickly, so you can turn information into insight faster.
Do you need a data warehouse?
Some businesses and industries require data analysis that is not only massive in scale but also ongoing and in real-time. For example, some service providers use real-time data to dynamically adjust prices throughout the day. Insurance companies track policies, sales, claims, payroll, and more. They also use machine learning to predict fraud. Gaming companies must track and react to user behavior in real-time to enhance the player’s experience. Data warehouses make all of these activities possible.
If your organization has or does any of the following, you’re probably a good candidate for a data warehouse:
- Multiple sources of disparate data
- Big-data analysis and visualization—both asynchronously and in real-time
- Machine learning/AI
- Streaming analytics
- Custom report generation/ad hoc analysis
- Data mining
- Data Science
Why Use a Data Warehouse?
It can provide:
- A single point of access for all data, rather than requiring users to connect to dozens or even hundreds of systems individually
- An assurance of data quality
- A history of the data they store
- Separation between day-to-day operational systems and analytical systems, for security reasons
- A standard set of semantics around data, for example, consistency in naming conventions, codes for different types of products, languages, and currencies
Storing comprehensive data in structured relationships means that data warehouses can also provide answers to a whole variety of complex questions, like:
- How much revenue has each of our product lines brought in per month over the past ten years, broken out by city and state?
- What is the average transaction size at one of our ATMs, broken out by the time of day and total customer assets?
- What is the percentage of employee turnover for the past year in stores that have been open for at least three years? How many hours did those employees work per week?
The Future of Data Warehousing
- Changes in Regulatory constraints may limit the ability to combine sources of disparate data. These disparate sources may include unstructured data which is difficult to store.
- As the size of the databases grows, the estimates of what constitutes a very large database continue to grow. It is complex to build and run data warehouse systems which are always increasing in size. The hardware and software resources available today do not allow to keep a large amount of data online.
- Multimedia data cannot be easily manipulated as text data, whereas textual information can be retrieved by the relational software available today. This could be a research subject.
The data warehouse is a company’s repository of information about its business and how it has performed over time. Created with input from employees in each of its key departments, it is the source for analysis that reveals the company’s past successes and failures and informs its decision-making.