MSDN Documentation

Microsoft Developer Network

Data Warehousing Core Concepts

Welcome to the foundational concepts of data warehousing. This section provides an in-depth look at the essential elements that define and enable effective data warehousing solutions. Understanding these core principles is crucial for designing, implementing, and managing robust business intelligence systems.

What is a Data Warehouse?

A data warehouse is a central repository of integrated data from one or more disparate sources. Its primary purpose is to store historical and current data in a way that supports analysis and decision-making. Unlike transactional databases (OLTP, Online Transaction Processing), which are optimized for many small, concurrent inserts and updates, data warehouses are optimized for read-heavy analytical queries (OLAP, Online Analytical Processing).

Key characteristics of a data warehouse:

Key Components of a Data Warehouse System

Data Sources

These are the operational systems that generate the data. They can include:

Data Staging Area

This is an intermediate storage area that holds data extracted from the sources while it is cleaned, transformed, and prepared for loading into the data warehouse. It plays a vital role in data quality management, because errors can be detected and corrected here before they reach the warehouse.
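To make the staging step concrete, here is a minimal Python sketch of cleaning extracted rows in a staging area before they are loaded. The row layout (customer_id, name, country) and the cleaning rules (trim whitespace, reject rows with a missing or duplicate key, upper-case country codes) are hypothetical examples, not rules prescribed by any particular product.

```python
# Minimal staging-area sketch. The row layout and cleaning rules are
# hypothetical, chosen only to illustrate cleaning before loading.
from typing import Dict, List, Tuple

Row = Dict[str, str]


def stage(raw_rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Clean extracted rows; return (rows ready to load, rejected rows)."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    seen_ids = set()
    for row in raw_rows:
        # Trim stray whitespace from every field.
        cleaned = {key: value.strip() for key, value in row.items()}
        # Reject rows with no business key, or with a key already staged.
        key = cleaned.get("customer_id", "")
        if not key or key in seen_ids:
            rejected.append(row)
            continue
        # Standardize country codes to upper case.
        cleaned["country"] = cleaned.get("country", "").upper()
        seen_ids.add(key)
        accepted.append(cleaned)
    return accepted, rejected


# Sample extracted rows (illustrative data only).
raw = [
    {"customer_id": " 1 ", "name": "Contoso ", "country": "us"},
    {"customer_id": "", "name": "No Key", "country": "de"},
    {"customer_id": "1", "name": "Duplicate", "country": "us"},
    {"customer_id": "2", "name": "Fabrikam", "country": "gb"},
]
ready, errors = stage(raw)
print(ready)
print(len(errors))
```

Keeping the rejected rows, rather than silently dropping them, lets them be reported and corrected, which is part of the data quality role of the staging area.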

Extraction, Transformation, and Loading (ETL)

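ETL is the backbone of data warehousing. It involves extracting data from the source systems, transforming it (cleaning and standardizing it so that data from different sources is consistent), and loading the result into the data warehouse.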

For example, a transformation step might convert a 'date' field that arrives in different formats from different sources into a single, standardized YYYY-MM-DD format.

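-- Example of a transformation rule (T-SQL)
-- The variable values are illustrative; @DefaultDate is a hypothetical fallback.
DECLARE @SourceDate VARCHAR(20) = '03/15/2024';
DECLARE @SourceDateFormat VARCHAR(20) = 'MM/DD/YYYY';
DECLARE @DefaultDate DATE = '1900-01-01';
DECLARE @TargetDate DATE;

SET @TargetDate = CASE @SourceDateFormat
    WHEN 'MM/DD/YYYY' THEN CONVERT(DATE, @SourceDate, 101) -- style 101: mm/dd/yyyy
    WHEN 'DD-MON-YY'  THEN CONVERT(DATE, @SourceDate, 6)   -- style 6: dd mon yy (two-digit year)
    ELSE @DefaultDate
END;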
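The date rule above can also be seen in the context of a complete, if tiny, ETL pass. The following Python sketch uses only the standard library (datetime and sqlite3): extract() returns hard-coded sample records standing in for source systems, transform() standardizes their dates to YYYY-MM-DD, and load() writes them to an in-memory SQLite table. The function names, the fact_orders table, and the 1900-01-01 fallback date are assumptions made for illustration.

```python
# Minimal ETL sketch using only the Python standard library.
# Source records, format labels, table name, and fallback date are hypothetical.
import sqlite3
from datetime import date, datetime
from typing import Iterable, List, Tuple

# strptime patterns for the two source formats used in the date rule.
FORMATS = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD-MON-YY": "%d-%b-%y",
}
DEFAULT_DATE = date(1900, 1, 1)  # hypothetical fallback for unknown formats


def extract() -> List[Tuple[str, str, str, float]]:
    # Stand-in for reading operational systems: (order_id, raw_date, format, amount).
    return [
        ("A-1", "03/15/2024", "MM/DD/YYYY", 120.0),
        ("B-7", "16-Mar-24", "DD-MON-YY", 80.5),
        ("C-3", "2024.03.17", "UNKNOWN", 10.0),
    ]


def transform(rows: Iterable[Tuple[str, str, str, float]]) -> List[Tuple[str, str, float]]:
    out = []
    for order_id, raw_date, fmt, amount in rows:
        pattern = FORMATS.get(fmt)
        parsed = datetime.strptime(raw_date, pattern).date() if pattern else DEFAULT_DATE
        # isoformat() produces the standardized YYYY-MM-DD text.
        out.append((order_id, parsed.isoformat(), amount))
    return out


def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, float]]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT order_id, order_date FROM fact_orders ORDER BY order_id").fetchall())
```

Here %b matches English month abbreviations such as "Mar" under Python's default C locale. A production pipeline would also have to decide whether unparseable rows should be defaulted, as in this sketch, or rejected for review.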

Data Warehouse Database

This is the core repository where the integrated and transformed data resides. It is typically a relational database optimized for analytical queries. Technologies like SQL Server, Snowflake, Redshift, and BigQuery are commonly used.

Metadata

Metadata is "data about data." It describes the data in the warehouse, including its source, format, transformations applied, and business definitions. It's crucial for understanding and using the data warehouse effectively.
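Metadata is often kept in a catalog keyed by table and column. The sketch below models one catalog entry in Python. The ColumnMetadata fields mirror the kinds of metadata listed above (source, format, transformations applied, business definition), but the class, its field names, the catalog dictionary, and the example entry are all hypothetical.

```python
# Minimal metadata-catalog sketch. Class, field names, and the example
# entry are hypothetical; real catalogs define their own schemas.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnMetadata:
    table: str
    column: str
    source: str               # where the data came from
    data_type: str            # format of the data in the warehouse
    transformations: List[str] = field(default_factory=list)  # ETL steps applied
    business_definition: str = ""


# Catalog keyed by "table.column".
catalog: Dict[str, ColumnMetadata] = {}


def register(meta: ColumnMetadata) -> None:
    catalog[f"{meta.table}.{meta.column}"] = meta


register(ColumnMetadata(
    table="fact_orders",
    column="order_date",
    source="orders system, field raw_date",
    data_type="DATE (YYYY-MM-DD)",
    transformations=["parse source-specific date format", "standardize to YYYY-MM-DD"],
    business_definition="Date on which the customer placed the order.",
))

entry = catalog["fact_orders.order_date"]
print(entry.source, "->", entry.data_type)
```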

Business Intelligence (BI) Tools

These are applications that users interact with to analyze data, create reports, dashboards, and perform ad-hoc queries. Examples include Power BI, Tableau, and QlikView.

Dimensional Modeling vs. Normalized Modeling

While transactional systems often use highly normalized schemas to reduce redundancy and ensure data integrity for writes, data warehouses typically employ dimensional modeling for optimized reads:
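A widely used dimensional design is the star schema: a central fact table holds numeric measures plus foreign keys, and each key points to a dimension table of descriptive attributes. The Python sketch below builds a small star schema in an in-memory SQLite database and runs a typical analytical query against it. All table names, columns, and rows are hypothetical sample data.

```python
# Minimal star-schema sketch in SQLite: one fact table keyed to two dimensions.
# Table names, columns, and rows are hypothetical sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20240315
    full_date TEXT,
    month TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_date VALUES (20240315, '2024-03-15', '2024-03'), (20240402, '2024-04-02', '2024-04');
INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO fact_sales VALUES
    (20240315, 1, 2, 1500.0),
    (20240315, 2, 5, 250.0),
    (20240402, 1, 1, 750.0);
""")

# Typical analytical query: join facts to dimensions, then group by descriptive attributes.
query = """
SELECT d.month, p.category, SUM(f.amount) AS revenue
FROM fact_sales AS f
JOIN dim_date AS d ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```

The query follows the join-then-aggregate shape that dimensional models are designed to make simple: facts are filtered and grouped by attributes that live in the dimensions, with at most one join per dimension.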

Data Marts

A data mart is a subset of a data warehouse that focuses on a specific business line or team (for example, a sales data mart or a marketing data mart). Data marts provide a more targeted view of the data for specific user groups, improving query performance and usability for those users.
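One lightweight way to illustrate a data mart is a view that exposes only the subset of warehouse data a team needs. The Python/SQLite sketch below defines a hypothetical sales_mart view over a hypothetical fact_activity table. In practice a data mart is often a separate schema or database loaded from the warehouse, so treat the view as a simplification.

```python
# Minimal data-mart sketch: a view exposing only the slice a sales team needs.
# Table, view, and column names, and the sample rows, are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_activity (
    department TEXT,
    region TEXT,
    activity_date TEXT,
    amount REAL
);
INSERT INTO fact_activity VALUES
    ('Sales', 'West', '2024-03-15', 1200.0),
    ('Sales', 'East', '2024-03-16', 800.0),
    ('Marketing', 'West', '2024-03-15', 300.0);

-- The sales data mart: only sales rows, pre-aggregated by region.
CREATE VIEW sales_mart AS
SELECT region, SUM(amount) AS total_sales
FROM fact_activity
WHERE department = 'Sales'
GROUP BY region;
""")

print(conn.execute("SELECT region, total_sales FROM sales_mart ORDER BY region").fetchall())
```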

Mastering these core concepts will lay a strong foundation for your journey into the world of data warehousing and business intelligence.