Learning Objectives

By the end of this chapter, you will be able to:

  • Define a data warehouse and explain its purpose in supporting Business Intelligence (BI).
  • Describe the four key characteristics of a data warehouse (Subject-Oriented, Integrated, Time-Variant, Non-Volatile).
  • Differentiate between a data warehouse and a standard transactional database (OLTP vs. OLAP).
  • Define a data mart and explain its relationship to a data warehouse.

Data Warehouses: The Foundation for Analytics

While a standard database is designed to handle day-to-day transactions (Online Transaction Processing or OLTP), a data warehouse is a special type of database designed specifically for data analysis and reporting (Online Analytical Processing or OLAP). It serves as a central repository of historical and integrated data from various sources across an organization.

Data warehouses support business intelligence Figure 1: Data Warehouse Architecture

flowchart LR
    subgraph SOURCES["Data Sources"]
        ERP["🏭 ERP"]
        CRM["👥 CRM"]
        WEB["🌐 Web"]
        EXT["📥 External"]
    end

    SOURCES --> ETL["⚙️ ETL\nExtract\nTransform\nLoad"]
    ETL --> DW[("🏛️ Data Warehouse\nIntegrated\nHistorical\nNon-Volatile")]

    DW --> DM1["📊 Sales Mart"]
    DW --> DM2["💰 Finance Mart"]
    DW --> DM3["📣 Marketing Mart"]

    DM1 --> BI["📊 BI & Analytics"]
    DM2 --> BI
    DM3 --> BI

    style DW fill:#1565c0,color:#fff

Figure 2: Data Warehouse and Data Marts Architecture

Key Characteristics of a Data Warehouse

Data warehouses are defined by four key characteristics:

  1. Subject-Oriented: Data is organized around the major subjects of the business, such as Customer, Product, Sales, and Supplier. This is different from an OLTP database, which is organized around specific business processes like order entry.

  2. Integrated: A data warehouse integrates data from multiple, often heterogeneous, sources (e.g., relational databases, flat files, external sources). During the integration process, the data is cleaned and transformed to ensure consistency in naming conventions, formats, and units of measure.

  3. Time-Variant: The data in a warehouse represents a historical record. It contains data over a long period (e.g., 5-10 years), allowing for analysis of trends, patterns, and changes over time. Each data point is associated with a specific point in time.

  4. Non-Volatile: Data in a warehouse is not meant to be changed or updated in real-time like in an OLTP database. Once data is loaded into the warehouse, it is typically read-only. New data is added periodically in batches, but previous data is not erased. This ensures a stable and reliable historical record for analysis.

The Purpose of a Data Warehouse

The primary purpose of a data warehouse is to support Business Intelligence (BI) activities. It provides a “single source of truth” that business analysts, data scientists, and managers can use to:

  • Run complex queries and create reports.
  • Conduct trend analysis.
  • Use data mining techniques to discover hidden patterns.
  • Power dashboards and data visualizations.

By separating the analytical workload (OLAP) from the transactional workload (OLTP), a data warehouse ensures that heavy analysis does not slow down the day-to-day operations of the business.

Data Marts: A Focused Subset

A data mart is a smaller, simpler version of a data warehouse that is focused on a single subject or line of business. It is a subset of the data warehouse, designed to meet the specific needs of a particular department or user group.

  • Example: The Sales department might have its own data mart containing data specifically related to sales transactions, customers, and salesperson performance. The Finance department might have a separate data mart for financial data.

Why use a Data Mart?

  • Simplicity: It is easier and faster for end-users to work with a smaller, focused dataset that is relevant to their needs.
  • Faster Access: Queries on a data mart are often faster than on a full data warehouse.
  • Easier to Build: A data mart can be built more quickly and at a lower cost than a comprehensive enterprise data warehouse.

An organization might have an enterprise-wide data warehouse that feeds data into several departmental data marts.

Summary

Data warehouses are specialized databases designed for analysis, not transactions. They provide an integrated, historical, and stable repository of subject-oriented data, serving as the single source of truth for all business intelligence activities. Data marts are smaller, departmental subsets of a data warehouse that provide focused, faster access for specific user groups. Together, they form the core infrastructure for modern data analytics.

Key Takeaways

  • Data warehouses are for analysis (OLAP), while operational databases are for transactions (OLTP).
  • The four characteristics of a data warehouse are: Subject-Oriented, Integrated, Time-Variant, and Non-Volatile.
  • Data warehouses provide a “single source of truth” for business intelligence.
  • A data mart is a smaller, departmental subset of a data warehouse.

Discussion Questions

  1. Why is it a bad idea to run complex analytical queries on a company’s live transactional (OLTP) database?
  2. Explain the “Integrated” characteristic of a data warehouse. Why is this so important?
  3. When would a company choose to build a data mart instead of a full enterprise data warehouse?