Dimension Tables in Data Warehousing
Dimension tables are a fundamental component of dimensional modeling in data warehousing. They describe the business context for the data found in fact tables, providing the "who, what, where, when, why, and how" for business events.
Purpose of Dimension Tables
Dimension tables store descriptive attributes that are used to filter, group, and label measures in fact tables. They provide the human-readable context for the numerical data, enabling users to perform slice-and-dice analysis and understand trends.
Characteristics of Dimension Tables
- Descriptive Attributes: Contain textual or categorical data.
- Primary Key: Typically a surrogate key (an auto-generated integer) to ensure stability and performance.
- Foreign Key in Fact Table: Each fact row references a dimension row through a foreign key that points to the dimension table's primary key.
- Denormalized Structure: Often denormalized to simplify queries and improve performance by reducing the need for joins.
- Slowly Changing Dimensions (SCDs): May need to handle changes in attribute values over time.
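The surrogate key and foreign key relationship described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a production schema: the FactSales layout and its SalesKey column are hypothetical, and SQLite's INTEGER PRIMARY KEY stands in for whatever key-generation mechanism a given warehouse uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey  INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
    ProductID   TEXT NOT NULL,        -- natural key from the source system
    ProductName TEXT NOT NULL,
    Category    TEXT NOT NULL,
    Brand       TEXT NOT NULL
);
CREATE TABLE FactSales (
    SalesKey    INTEGER PRIMARY KEY,  -- hypothetical fact row identifier
    ProductKey  INTEGER NOT NULL REFERENCES DimProduct (ProductKey),
    SalesAmount REAL NOT NULL
);
""")

# The surrogate key is not supplied; SQLite generates it on insert.
cur = conn.execute(
    "INSERT INTO DimProduct (ProductID, ProductName, Category, Brand) "
    "VALUES (?, ?, ?, ?)",
    ("PROD-A123", "SmartWidget Pro", "Electronics", "Innovatech"),
)
product_key = cur.lastrowid

# A fact row that references an existing dimension row is accepted.
conn.execute(
    "INSERT INTO FactSales (ProductKey, SalesAmount) VALUES (?, ?)",
    (product_key, 199.99),
)

# A fact row that references a missing dimension row is rejected.
try:
    conn.execute(
        "INSERT INTO FactSales (ProductKey, SalesAmount) VALUES (?, ?)",
        (999, 10.0),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Many warehouse engines do not enforce foreign keys at load time, so the rejection shown here is specific to this SQLite setup.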
Common Types of Dimensions
Dimensions can be categorized based on their nature and how they are modeled:
- Time Dimension: Represents periods such as days, weeks, months, quarters, and years. It's crucial for time-series analysis.
- Geography Dimension: Describes physical locations, such as countries, regions, states, cities, and stores.
- Product Dimension: Details about products, including categories, brands, SKUs, and descriptions.
- Customer Dimension: Information about customers, such as demographics, segments, and loyalty status.
- Employee Dimension: Attributes related to employees, like departments, roles, and managers.
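A time dimension is usually generated rather than loaded from a source system. The sketch below builds one row per day in plain Python. The column names (DateKey, FullDate, DayOfWeek, Month, Quarter, Year) and the YYYYMMDD integer key are assumptions chosen for illustration, not a required layout.

```python
from datetime import date, timedelta

# weekday() returns 0 for Monday, so the list starts with Monday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one dict per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer key in YYYYMMDD form, e.g. 20230715.
            "DateKey": current.year * 10000 + current.month * 100 + current.day,
            "FullDate": current.isoformat(),
            "DayOfWeek": DAY_NAMES[current.weekday()],
            "Month": current.month,
            "Quarter": (current.month - 1) // 3 + 1,
            "Year": current.year,
        })
        current += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2023, 1, 1), date(2023, 12, 31))
```

Precomputing attributes such as the quarter lets queries group by them directly instead of recalculating date arithmetic in every report.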
Example Dimension Table: Product
Consider a DimProduct table used to store information about products:
ProductKey (PK) | ProductID (Natural Key) | ProductName | Category | Brand | UnitPrice | EffectiveDate | ExpirationDate |
---|---|---|---|---|---|---|---|
101 | PROD-A123 | SmartWidget Pro | Electronics | Innovatech | 199.99 | 2023-01-01 | 2023-07-14 |
102 | PROD-B456 | EcoBottle | Apparel | GreenLife | 25.50 | 2023-01-01 | 9999-12-31 |
103 | PROD-A123 | SmartWidget Pro V2 | Electronics | Innovatech | 219.99 | 2023-07-15 | 9999-12-31 |
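In this example, ProductKey is the surrogate primary key. ProductID is the natural key from the source system. The attributes ProductName, Category, and Brand provide descriptive context. The natural key PROD-A123 appears in two rows (ProductKey 101 and 103) because each row is a separate version of the same product. The EffectiveDate and ExpirationDate columns record the period during which each version is valid, which is how Slowly Changing Dimensions of Type 2 (SCD Type 2) are managed.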
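One way to use the validity columns is a point-in-time lookup: given a natural key and a date, find the version that was in effect. The sketch below uses Python's built-in sqlite3 module and a hypothetical helper, product_version_on. It assumes that a superseded row carries an ExpirationDate one day before the next version's EffectiveDate (2023-07-14 here) and that dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimProduct (
        ProductKey     INTEGER PRIMARY KEY,
        ProductID      TEXT NOT NULL,
        ProductName    TEXT NOT NULL,
        UnitPrice      REAL NOT NULL,
        EffectiveDate  TEXT NOT NULL,
        ExpirationDate TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO DimProduct VALUES (?, ?, ?, ?, ?, ?)",
    [
        # The older version is closed the day before the new one starts
        # (an assumption of this sketch).
        (101, "PROD-A123", "SmartWidget Pro", 199.99, "2023-01-01", "2023-07-14"),
        (103, "PROD-A123", "SmartWidget Pro V2", 219.99, "2023-07-15", "9999-12-31"),
    ],
)


def product_version_on(conn, product_id, as_of):
    """Return (ProductKey, ProductName) for the version valid on as_of.

    as_of is an ISO date string such as "2023-03-01". Returns None when
    no version covers that date.
    """
    return conn.execute(
        "SELECT ProductKey, ProductName FROM DimProduct "
        "WHERE ProductID = ? AND ? BETWEEN EffectiveDate AND ExpirationDate",
        (product_id, as_of),
    ).fetchone()


before_change = product_version_on(conn, "PROD-A123", "2023-03-01")
after_change = product_version_on(conn, "PROD-A123", "2023-08-01")
```

Because the validity ranges do not overlap, each date matches at most one version of a product.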
Slowly Changing Dimensions (SCDs)
Dimension attributes can change over time. Managing these changes is crucial for accurate historical analysis. Common SCD types include:
- Type 0: Retain Original: The attribute keeps the value it was first loaded with; later changes in the source are ignored.
- Type 1: Overwrite: The old value is replaced with the new value. History is lost.
- Type 2: Add New Row: A new row with its own surrogate key is inserted for the changed record, and effective and expiration dates mark when each version is valid. Preserves history.
- Type 3: Add New Column: A new column is added to track a limited history (e.g., "Previous Brand").
The example above for DimProduct illustrates SCD Type 2.
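The difference between Type 1 and Type 2 handling can be sketched as two small functions over a DimProduct table in SQLite. This is an illustration under assumptions, not a complete loading routine: the function names apply_type1 and apply_type2 are hypothetical, the open-ended date 9999-12-31 marks the current row, a superseded row is closed the day before the change, and the replacement brand value is made up for the example.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # marks the row that is currently in effect

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimProduct (
        ProductKey     INTEGER PRIMARY KEY,
        ProductID      TEXT NOT NULL,
        ProductName    TEXT NOT NULL,
        Brand          TEXT NOT NULL,
        EffectiveDate  TEXT NOT NULL,
        ExpirationDate TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO DimProduct VALUES (101, 'PROD-A123', 'SmartWidget Pro', "
    "'Innovatech', '2023-01-01', ?)",
    (OPEN_END,),
)


def apply_type1(conn, product_id, new_brand):
    """Type 1: overwrite the value on every row; the old value is lost."""
    conn.execute(
        "UPDATE DimProduct SET Brand = ? WHERE ProductID = ?",
        (new_brand, product_id),
    )


def apply_type2(conn, product_id, new_name, change_date):
    """Type 2: close the current row, then insert a new version."""
    prev_day = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    # Carry unchanged attributes over from the current version.
    (brand,) = conn.execute(
        "SELECT Brand FROM DimProduct WHERE ProductID = ? AND ExpirationDate = ?",
        (product_id, OPEN_END),
    ).fetchone()
    conn.execute(
        "UPDATE DimProduct SET ExpirationDate = ? "
        "WHERE ProductID = ? AND ExpirationDate = ?",
        (prev_day, product_id, OPEN_END),
    )
    # The new version gets a fresh surrogate key from the database.
    conn.execute(
        "INSERT INTO DimProduct "
        "(ProductID, ProductName, Brand, EffectiveDate, ExpirationDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (product_id, new_name, brand, change_date, OPEN_END),
    )


apply_type2(conn, "PROD-A123", "SmartWidget Pro V2", "2023-07-15")
apply_type1(conn, "PROD-A123", "Innovatech Labs")  # hypothetical new brand

versions = conn.execute(
    "SELECT ProductName, Brand, EffectiveDate, ExpirationDate "
    "FROM DimProduct ORDER BY ProductKey"
).fetchall()
```

After both calls, the table holds two versions of the product with distinct validity ranges (Type 2), while the brand has been overwritten on both rows with no trace of the earlier value (Type 1).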
SQL Example: Joining Fact and Dimension Tables
To analyze sales by product category, you would join the fact table (e.g., FactSales) with the dimension table (e.g., DimProduct):
-- Total sales per product category, highest first
SELECT
    dp.Category,
    SUM(fs.SalesAmount) AS TotalSales
FROM FactSales fs
JOIN DimProduct dp
    ON fs.ProductKey = dp.ProductKey  -- match on the surrogate key
GROUP BY dp.Category
ORDER BY TotalSales DESC;
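To try the query end to end, the sketch below loads the three DimProduct rows from the example into an in-memory SQLite database and runs the same join. The FactSales rows and their amounts are made up purely for illustration, and only the columns the query needs are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey  INTEGER PRIMARY KEY,
    ProductID   TEXT NOT NULL,
    ProductName TEXT NOT NULL,
    Category    TEXT NOT NULL
);
CREATE TABLE FactSales (
    ProductKey  INTEGER NOT NULL REFERENCES DimProduct (ProductKey),
    SalesAmount REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO DimProduct VALUES (?, ?, ?, ?)",
    [
        (101, "PROD-A123", "SmartWidget Pro", "Electronics"),
        (102, "PROD-B456", "EcoBottle", "Apparel"),
        (103, "PROD-A123", "SmartWidget Pro V2", "Electronics"),
    ],
)

# Illustrative fact rows; the amounts are invented for this sketch.
conn.executemany(
    "INSERT INTO FactSales VALUES (?, ?)",
    [(101, 200.0), (103, 220.0), (102, 25.5), (102, 25.5)],
)

sales_by_category = conn.execute("""
    SELECT dp.Category, SUM(fs.SalesAmount) AS TotalSales
    FROM FactSales fs
    JOIN DimProduct dp ON fs.ProductKey = dp.ProductKey
    GROUP BY dp.Category
    ORDER BY TotalSales DESC
""").fetchall()
```

Sales recorded against either version of PROD-A123 (keys 101 and 103) roll up into the same category because both rows share the Category value.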
Best Practices
- Use surrogate keys for dimension tables.
- Keep dimension tables denormalized.
- Implement appropriate SCD handling strategies.
- Ensure data quality and consistency.
- Include attributes that support business analysis needs.