Dimension Tables in Data Warehousing

Dimension tables are a fundamental component of dimensional modeling in data warehousing. They describe the business context for the data found in fact tables, providing the "who, what, where, when, why, and how" for business events.

Purpose of Dimension Tables

Dimension tables store descriptive attributes that are used to filter, group, and label measures in fact tables. They provide the human-readable context for the numerical data, enabling users to perform slice-and-dice analysis and understand trends.

Characteristics of Dimension Tables

  • Descriptive Attributes: Contain textual or categorical data.
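  • Primary Key: Typically a surrogate key (an auto-generated integer with no business meaning), which stays stable when source-system identifiers change and keeps joins compact.
  • Foreign Key in Fact Table: Each fact row references a dimension row through a foreign key that matches the dimension table's primary key.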
  • Denormalized Structure: Often denormalized to simplify queries and improve performance by reducing the need for joins.
  • Slowly Changing Dimensions (SCDs): May need to handle changes in attribute values over time.
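The characteristics above can be sketched as a small schema. This is a minimal illustration using Python's built-in sqlite3 module, not a prescribed design: the column choices and the SalesKey column are assumptions made for the example, and the FactSales and DimProduct names are borrowed from later sections of this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey  INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
    ProductID   TEXT NOT NULL,        -- natural key from the source system
    ProductName TEXT NOT NULL,        -- descriptive attribute
    Category    TEXT NOT NULL,        -- kept inline (denormalized), no separate Category table
    Brand       TEXT NOT NULL
);
CREATE TABLE FactSales (
    SalesKey    INTEGER PRIMARY KEY,  -- hypothetical column for this sketch
    ProductKey  INTEGER NOT NULL REFERENCES DimProduct (ProductKey),
    SalesAmount REAL NOT NULL
);
""")

# Insert one dimension row and let the database assign the surrogate key.
cur = conn.execute(
    "INSERT INTO DimProduct (ProductID, ProductName, Category, Brand) VALUES (?, ?, ?, ?)",
    ("PROD-A123", "SmartWidget Pro", "Electronics", "Innovatech"),
)
product_key = cur.lastrowid

# A fact row pointing at an existing dimension row is accepted.
conn.execute(
    "INSERT INTO FactSales (ProductKey, SalesAmount) VALUES (?, ?)",
    (product_key, 199.99),
)

# A fact row pointing at a missing dimension row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO FactSales (ProductKey, SalesAmount) VALUES (?, ?)", (9999, 10.0))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Many warehouse platforms treat foreign keys as informational rather than enforced, so the rejection shown here is specific to this SQLite setup.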

Common Types of Dimensions

Dimensions are commonly organized by the business entity they describe:

  • Time Dimension: Represents periods such as days, weeks, months, quarters, and years. It is crucial for time-series analysis.
  • Geography Dimension: Describes physical locations, such as countries, regions, states, cities, and stores.
  • Product Dimension: Holds details about products, including categories, brands, SKUs, and descriptions.
  • Customer Dimension: Holds information about customers, such as demographics, segments, and loyalty status.
  • Employee Dimension: Holds attributes related to employees, like departments, roles, and managers.
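A time dimension is usually generated rather than loaded from a source system. The sketch below builds one row per calendar day in Python. The function name build_date_dimension, the column names, and the YYYYMMDD integer DateKey are assumptions chosen for illustration; the article does not prescribe a layout.

```python
from datetime import date, timedelta

# Fixed English names avoid locale-dependent output from strftime.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one dimension row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "DateKey": current.year * 10000 + current.month * 100 + current.day,
            "FullDate": current.isoformat(),
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "DayOfWeek": DAY_NAMES[current.weekday()],
        })
        current += timedelta(days=1)
    return rows


date_rows = build_date_dimension(date(2023, 1, 1), date(2023, 12, 31))
```

Precomputing attributes such as Quarter lets queries group by them directly instead of repeating date arithmetic.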

Example Dimension Table: Product

Consider a DimProduct table used to store information about products:

ProductKey (PK)  ProductID (Natural Key)  ProductName         Category     Brand       UnitPrice  EffectiveDate  ExpirationDate
101              PROD-A123                SmartWidget Pro     Electronics  Innovatech  199.99     2023-01-01     2023-07-14
102              PROD-B456                EcoBottle           Apparel      GreenLife   25.50      2023-01-01     9999-12-31
103              PROD-A123                SmartWidget Pro V2  Electronics  Innovatech  219.99     2023-07-15     9999-12-31

In this example, ProductKey is the surrogate primary key and ProductID is the natural key from the source system. ProductName, Category, Brand, and UnitPrice are the descriptive attributes. The EffectiveDate and ExpirationDate columns implement Slowly Changing Dimension (SCD) Type 2 tracking: PROD-A123 appears twice because it changed on 2023-07-15, so the earlier version (ProductKey 101) expires on 2023-07-14 and the new version (ProductKey 103) is the current one, marked by the placeholder ExpirationDate 9999-12-31.
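When fact rows are loaded, the natural key from the source has to be translated into the surrogate key of the version that was valid on the transaction date. The sketch below shows one way to do that lookup with sqlite3. It assumes both date columns are inclusive, that dates are stored as ISO strings, and that the earlier PROD-A123 version was closed on 2023-07-14. The helper name lookup_product_key is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProductID TEXT, ProductName TEXT,
    Category TEXT, Brand TEXT, UnitPrice REAL,
    EffectiveDate TEXT, ExpirationDate TEXT
)""")
conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (101, "PROD-A123", "SmartWidget Pro", "Electronics", "Innovatech", 199.99, "2023-01-01", "2023-07-14"),
    (102, "PROD-B456", "EcoBottle", "Apparel", "GreenLife", 25.50, "2023-01-01", "9999-12-31"),
    (103, "PROD-A123", "SmartWidget Pro V2", "Electronics", "Innovatech", 219.99, "2023-07-15", "9999-12-31"),
])


def lookup_product_key(conn, product_id, as_of):
    """Return the surrogate key of the version valid on as_of, or None if there is none."""
    row = conn.execute(
        """SELECT ProductKey FROM DimProduct
           WHERE ProductID = ? AND ? BETWEEN EffectiveDate AND ExpirationDate""",
        (product_id, as_of),
    ).fetchone()
    return row[0] if row else None


key_in_march = lookup_product_key(conn, "PROD-A123", "2023-03-10")   # earlier version
key_in_august = lookup_product_key(conn, "PROD-A123", "2023-08-01")  # current version
key_too_early = lookup_product_key(conn, "PROD-A123", "2022-12-31")  # before any version
```

ISO-formatted date strings sort in chronological order, which is why a plain string BETWEEN works here.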

Slowly Changing Dimensions (SCDs)

Dimension attributes can change over time. Managing these changes is crucial for accurate historical analysis. Common SCD types include:

  • Type 0: Retain Original: The attribute keeps its originally loaded value; later changes in the source are ignored.
  • Type 1: Overwrite: The old value is replaced with the new value. History is lost.
  • Type 2: Add New Row: A new row with a new surrogate key is inserted for the changed version, and effective and expiration dates mark when each version applies. Full history is preserved.
  • Type 3: Add New Column: A separate column holds the prior value (e.g., "Previous Brand"), so only limited history is kept.

The example above for DimProduct illustrates SCD Type 2.
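The difference between Type 1 and Type 2 is easiest to see side by side. The following sketch applies one change of each kind to a reduced DimProduct table. The helper names apply_type1 and apply_type2, the inclusive-date convention, and the brand value "Innovatech Labs" are assumptions invented for the example.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # placeholder expiration date marking the current version

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProductID TEXT, ProductName TEXT, Brand TEXT,
    EffectiveDate TEXT, ExpirationDate TEXT
)""")
conn.execute(
    "INSERT INTO DimProduct VALUES (101, 'PROD-A123', 'SmartWidget Pro', 'Innovatech', '2023-01-01', ?)",
    (OPEN_END,),
)


def apply_type1(conn, product_id, brand):
    """Type 1: overwrite Brand on every version of the product; the old value is lost."""
    conn.execute("UPDATE DimProduct SET Brand = ? WHERE ProductID = ?", (brand, product_id))


def apply_type2(conn, product_id, product_name, change_date):
    """Type 2: expire the current row and insert a new version; returns the new surrogate key."""
    current = conn.execute(
        "SELECT ProductKey, Brand FROM DimProduct WHERE ProductID = ? AND ExpirationDate = ?",
        (product_id, OPEN_END),
    ).fetchone()
    if current is None:
        raise LookupError(f"no current row for {product_id}")
    old_key, brand = current
    # The old version ends the day before the new one starts (inclusive dates).
    day_before = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    conn.execute("UPDATE DimProduct SET ExpirationDate = ? WHERE ProductKey = ?", (day_before, old_key))
    cur = conn.execute(
        """INSERT INTO DimProduct (ProductID, ProductName, Brand, EffectiveDate, ExpirationDate)
           VALUES (?, ?, ?, ?, ?)""",
        (product_id, product_name, brand, change_date, OPEN_END),
    )
    return cur.lastrowid


new_key = apply_type2(conn, "PROD-A123", "SmartWidget Pro V2", "2023-07-15")
history = conn.execute(
    "SELECT ProductKey, ProductName, EffectiveDate, ExpirationDate FROM DimProduct ORDER BY ProductKey"
).fetchall()

apply_type1(conn, "PROD-A123", "Innovatech Labs")
brands = [row[0] for row in conn.execute("SELECT Brand FROM DimProduct ORDER BY ProductKey")]
```

After the Type 2 change the table holds two rows for PROD-A123, each with its own date range. After the Type 1 change both rows carry the new brand and the old one can no longer be recovered from the dimension.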

SQL Example: Joining Fact and Dimension Tables

To analyze sales by product category, you would join the fact table (e.g., FactSales) with the dimension table (e.g., DimProduct):

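-- Total sales per product category, highest first
SELECT
    dp.Category,
    SUM(fs.SalesAmount) AS TotalSales
FROM
    FactSales fs
JOIN
    -- Match on the surrogate key, so each fact row joins to exactly one dimension row
    DimProduct dp ON fs.ProductKey = dp.ProductKey
GROUP BY
    dp.Category
ORDER BY
    TotalSales DESC;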
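To try the query end to end, it can be run against a small in-memory database. The sketch below uses Python's sqlite3 module with the three DimProduct rows from the example table. The FactSales rows and their amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProductID TEXT, ProductName TEXT, Category TEXT
);
CREATE TABLE FactSales (
    ProductKey INTEGER REFERENCES DimProduct (ProductKey), SalesAmount REAL
);
INSERT INTO DimProduct VALUES
    (101, 'PROD-A123', 'SmartWidget Pro', 'Electronics'),
    (102, 'PROD-B456', 'EcoBottle', 'Apparel'),
    (103, 'PROD-A123', 'SmartWidget Pro V2', 'Electronics');
-- Hypothetical sales amounts, invented for this sketch
INSERT INTO FactSales VALUES (101, 200.0), (103, 220.0), (102, 25.5), (102, 25.5);
""")

sales_by_category = conn.execute("""
SELECT dp.Category, SUM(fs.SalesAmount) AS TotalSales
FROM FactSales fs
JOIN DimProduct dp ON fs.ProductKey = dp.ProductKey
GROUP BY dp.Category
ORDER BY TotalSales DESC
""").fetchall()
```

Sales recorded against both versions of PROD-A123 (keys 101 and 103) roll up into the same Electronics total, because each version carries its own Category value.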

Best Practices

  • Use surrogate keys for dimension tables.
  • Keep dimension tables denormalized.
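  • Choose an SCD type for each attribute based on whether its historical values matter for analysis.
  • Ensure data quality and consistency, for example by keeping effective date ranges for the same natural key from overlapping in Type 2 dimensions.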
  • Include attributes that support business analysis needs.
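One data quality check that follows from these practices is verifying that no two versions of the same natural key are valid on the same day. The sketch below is one possible check, assuming inclusive ISO date strings; the function name find_overlapping_versions is hypothetical, and the sample rows model a load in which the earlier version was never expired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProductID TEXT,
    EffectiveDate TEXT, ExpirationDate TEXT
)""")
conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?, ?)", [
    (101, "PROD-A123", "2023-01-01", "9999-12-31"),  # never expired: the defect to detect
    (102, "PROD-B456", "2023-01-01", "9999-12-31"),
    (103, "PROD-A123", "2023-07-15", "9999-12-31"),
])


def find_overlapping_versions(conn):
    """Return (ProductID, key_a, key_b) for each pair of versions whose date ranges overlap."""
    return conn.execute("""
        SELECT a.ProductID, a.ProductKey, b.ProductKey
        FROM DimProduct a
        JOIN DimProduct b
          ON a.ProductID = b.ProductID
         AND a.ProductKey < b.ProductKey           -- report each pair once
         AND a.EffectiveDate <= b.ExpirationDate   -- standard interval-overlap test
         AND b.EffectiveDate <= a.ExpirationDate
        ORDER BY a.ProductKey, b.ProductKey
    """).fetchall()


overlaps_before = find_overlapping_versions(conn)

# Close the earlier version the day before the newer one starts, then re-check.
conn.execute("UPDATE DimProduct SET ExpirationDate = '2023-07-14' WHERE ProductKey = 101")
overlaps_after = find_overlapping_versions(conn)
```

A check like this could run after each dimension load, since overlapping versions make a date-based surrogate key lookup ambiguous.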