Understanding Facts and Dimensions in Data Warehousing
Many data warehouses are built on a dimensional model, which organizes data into facts and dimensions. This structure makes it easier to query and analyze business processes. Understanding the distinction between facts and dimensions is fundamental to designing and implementing an effective data warehouse.
Facts: The Measurements of a Business Process
Facts are the numerical measures of a business process, such as its key performance indicators (KPIs) or other metrics. Each fact is typically tied to a specific business event or transaction. Facts are classified by how they can be aggregated:
- Additive: Facts can be summed across any dimension. For example, sales amount can be summed by date, product, or store.
- Semi-Additive: Facts can be summed across some dimensions but not others. For instance, account balances can be summed across accounts or customers but not across time: adding Monday's balance to Tuesday's balance produces a meaningless number.
- Non-Additive: Facts cannot be meaningfully aggregated across any dimension. Ratios or percentages often fall into this category.
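The three categories can be illustrated with a small sketch using Python's built-in sqlite3 module. The tables FactAccountBalance and FactAdClicks, their columns, and the sample rows are hypothetical and exist only for this illustration; REAL is used for amounts because SQLite has no fixed-point type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical daily snapshot of account balances (semi-additive fact).
CREATE TABLE FactAccountBalance (
    DateKey    INTEGER,   -- assumed YYYYMMDD key
    AccountKey INTEGER,
    Balance    REAL
);
INSERT INTO FactAccountBalance VALUES
    (20240101, 1, 100.0), (20240101, 2, 50.0),
    (20240102, 1, 120.0), (20240102, 2, 50.0);

-- Hypothetical ad statistics: clicks and impressions are additive,
-- the click-through rate derived from them is not.
CREATE TABLE FactAdClicks (
    DateKey     INTEGER,
    Clicks      INTEGER,
    Impressions INTEGER
);
INSERT INTO FactAdClicks VALUES
    (20240101, 10, 100), (20240102, 30, 900);
""")

# Semi-additive: summing balances across accounts for one day is meaningful.
per_day = conn.execute(
    "SELECT DateKey, SUM(Balance) FROM FactAccountBalance "
    "GROUP BY DateKey ORDER BY DateKey"
).fetchall()

# Across time, take the balance at the latest date instead of summing.
closing_total = conn.execute(
    "SELECT SUM(Balance) FROM FactAccountBalance "
    "WHERE DateKey = (SELECT MAX(DateKey) FROM FactAccountBalance)"
).fetchone()[0]

# Non-additive: recompute the ratio from its additive components.
overall_ctr = conn.execute(
    "SELECT CAST(SUM(Clicks) AS REAL) / SUM(Impressions) FROM FactAdClicks"
).fetchone()[0]

# Averaging the per-day ratios gives a different (misleading) number.
avg_of_daily_ctr = conn.execute(
    "SELECT AVG(CAST(Clicks AS REAL) / Impressions) FROM FactAdClicks"
).fetchone()[0]

print(per_day, closing_total, overall_ctr, avg_of_daily_ctr)
```

In this sample data the overall click-through rate is 40 clicks over 1,000 impressions, while the average of the two daily rates is noticeably higher, which is why ratios are usually stored as their additive numerator and denominator.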
Common examples of facts include:
- Sales Amount
- Quantity Sold
- Order Count
- Profit
- Inventory Levels
- Click-Through Rate
Facts are stored in fact tables, which are typically large and contain foreign keys linking to dimension tables.
Example Fact Table (Sales):
CREATE TABLE FactSales (
    -- Surrogate key identifying the fact row
    SalesKey INT PRIMARY KEY,
    -- Foreign keys to the dimension tables
    DateKey INT,
    ProductKey INT,
    StoreKey INT,
    CustomerKey INT,
    -- Measures (both additive)
    SalesAmount DECIMAL(10, 2),
    QuantitySold INT,
    FOREIGN KEY (DateKey) REFERENCES DimDate(DateKey),
    FOREIGN KEY (ProductKey) REFERENCES DimProduct(ProductKey),
    FOREIGN KEY (StoreKey) REFERENCES DimStore(StoreKey),
    FOREIGN KEY (CustomerKey) REFERENCES DimCustomer(CustomerKey)
);
Dimensions: The Context for Facts
Dimensions provide the context for the facts. They describe the who, what, where, when, why, and how of the business event. Dimension tables contain descriptive attributes that allow users to slice, dice, and filter the fact data.
Dimension attributes are typically used for grouping, filtering, and labeling in reports and queries. They are usually textual or categorical and are not directly aggregated.
Common examples of dimensions include:
- Time/Date: Year, Quarter, Month, Day, Day of Week
- Product: Product Name, Category, Brand, SKU
- Geography/Location: Store Name, City, State, Country, Region
- Customer: Customer Name, Demographics, Segment
- Employee: Salesperson Name, Department, Manager
- Promotion: Promotion Name, Type, Discount Percentage
Dimension tables are usually smaller than fact tables and contain primary keys that are referenced by foreign keys in the fact table.
Example Dimension Table (Product):
CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,
    ProductName VARCHAR(255),
    ProductCategory VARCHAR(100),
    ProductBrand VARCHAR(100),
    SKU VARCHAR(50)
);
The Relationship: A Sales Transaction
Consider a single sales transaction. The facts might be the SalesAmount and QuantitySold. The dimensions providing context would be:
- When: The date of the sale (from DimDate).
- What: The product sold (from DimProduct).
- Where: The store where the sale occurred (from DimStore).
- Who: The customer who made the purchase (from DimCustomer).
A query to find the total sales amount for a specific product category in a particular month would join the FactSales table with the DimProduct and DimDate tables, filtering by category and month.
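Such a query might look like the following sketch, run here against an in-memory SQLite database through Python's sqlite3 module. The DimDate columns (Year, Month), the YYYYMMDD key format, the total_sales helper, and all sample rows are assumptions made for illustration; the tables are trimmed to the columns the query needs, and REAL stands in for DECIMAL(10, 2) because SQLite has no fixed-point type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (
    DateKey INTEGER PRIMARY KEY,   -- assumed YYYYMMDD key
    Year    INTEGER,               -- assumed column
    Month   INTEGER                -- assumed column
);
CREATE TABLE DimProduct (
    ProductKey      INTEGER PRIMARY KEY,
    ProductName     TEXT,
    ProductCategory TEXT
);
CREATE TABLE FactSales (
    SalesKey     INTEGER PRIMARY KEY,
    DateKey      INTEGER REFERENCES DimDate(DateKey),
    ProductKey   INTEGER REFERENCES DimProduct(ProductKey),
    SalesAmount  REAL,
    QuantitySold INTEGER
);
-- Made-up sample rows.
INSERT INTO DimDate VALUES (20231015, 2023, 10), (20231102, 2023, 11);
INSERT INTO DimProduct VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO FactSales VALUES
    (1, 20231015, 1, 1200.0, 1),
    (2, 20231015, 2,  300.0, 2),
    (3, 20231102, 1, 2400.0, 2);
""")


def total_sales(conn, category, year, month):
    """Sum SalesAmount for one product category in one calendar month."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.SalesAmount), 0)
        FROM FactSales AS f
        JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
        JOIN DimDate    AS d ON d.DateKey    = f.DateKey
        WHERE p.ProductCategory = ?
          AND d.Year = ?
          AND d.Month = ?
        """,
        (category, year, month),
    ).fetchone()
    return row[0]


october_electronics = total_sales(conn, "Electronics", 2023, 10)
print(october_electronics)
```

The fact table supplies the measure being summed, while every filter condition refers to a dimension attribute. Only the first sample sale matches Electronics in October 2023, so the other two rows are excluded by the category and month filters respectively.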
Key Design Principles
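- Granularity: The level of detail in a fact table, also called its grain. It defines what a single row represents (for example, one row per product per sales transaction) and so determines the lowest level at which facts are recorded.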
- Conformed Dimensions: Dimensions that are standardized and shared across multiple fact tables. This is crucial for integrating data from different business processes.
- Slowly Changing Dimensions (SCDs): Techniques for handling changes in dimension attributes over time (e.g., a customer moving to a new address).
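As one illustration of the last principle, the sketch below applies a Type 2 slowly changing dimension update, which is only one of several SCD techniques: the current row is expired and a new row with a new surrogate key is inserted, so history is preserved. The CustomerID, City, EffectiveFrom, EffectiveTo, and IsCurrent columns, the change_city helper, and the sample customer are all hypothetical; the example uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    CustomerID    TEXT,     -- assumed business (natural) key
    CustomerName  TEXT,
    City          TEXT,     -- the attribute that changes over time
    EffectiveFrom TEXT,     -- ISO date this version became valid
    EffectiveTo   TEXT,     -- NULL while this version is current
    IsCurrent     INTEGER   -- 1 for the current version, else 0
);
INSERT INTO DimCustomer
    (CustomerID, CustomerName, City, EffectiveFrom, EffectiveTo, IsCurrent)
VALUES ('C-100', 'Ada Example', 'Boston', '2020-01-01', NULL, 1);
""")


def change_city(conn, customer_id, new_city, change_date):
    """Type 2 change: expire the current row, then insert a new version."""
    with conn:  # commit both statements together, or roll both back
        current = conn.execute(
            "SELECT CustomerKey, CustomerName, City FROM DimCustomer "
            "WHERE CustomerID = ? AND IsCurrent = 1",
            (customer_id,),
        ).fetchone()
        if current is None or current[2] == new_city:
            return  # unknown customer, or nothing actually changed
        conn.execute(
            "UPDATE DimCustomer SET EffectiveTo = ?, IsCurrent = 0 "
            "WHERE CustomerKey = ?",
            (change_date, current[0]),
        )
        conn.execute(
            "INSERT INTO DimCustomer "
            "(CustomerID, CustomerName, City, EffectiveFrom, EffectiveTo, IsCurrent) "
            "VALUES (?, ?, ?, ?, NULL, 1)",
            (customer_id, current[1], new_city, change_date),
        )


change_city(conn, "C-100", "Denver", "2023-06-01")
history = conn.execute(
    "SELECT CustomerKey, City, EffectiveFrom, EffectiveTo, IsCurrent "
    "FROM DimCustomer ORDER BY CustomerKey"
).fetchall()
print(history)
```

Fact rows recorded before the move keep pointing at the old CustomerKey, so historical reports still attribute those sales to the customer's earlier city, while new fact rows reference the new key.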
By effectively separating facts and dimensions, data warehouses enable powerful analytical capabilities, allowing businesses to gain deep insights into their operations and make informed decisions.
Last Updated: October 26, 2023