Fact Tables in Data Warehousing
Fact tables are the central tables in a dimensional model. They contain the measurements or metrics that businesses want to analyze. These metrics are often additive, meaning they can be summed across various dimensions.
Understanding Fact Tables
A fact table typically resides at the intersection of multiple dimension tables. Each row in a fact table represents a specific business event or transaction. The columns in a fact table can be broadly categorized into:
- Foreign Keys: These columns link the fact table to its corresponding dimension tables. They are essential for slicing and dicing the data along different business perspectives.
- Measures: These are the quantitative values that represent the business metrics. They are usually numeric and can be aggregated (sum, count, average, etc.).
- Degenerate Dimensions (Optional): Sometimes, a business key from a source system that doesn't have other descriptive attributes might be included directly in the fact table.
Types of Fact Tables
Fact tables can be classified based on the granularity of the data they store and how they handle changes over time:
- Transactional Fact Tables: Capture individual business transactions. Each row represents a single event, like a sale, a shipment, or a web click. This is the most common type.
- Periodic Snapshot Fact Tables: Capture the state of a business process at a specific point in time (e.g., end of day, end of month). This is useful for tracking trends over time.
- Accumulating Snapshot Fact Tables: Track the progress of a business process that has a known start and end point, but an unknown duration. For example, tracking the lifecycle of an order from placement to delivery.
Granularity
Granularity defines the lowest level of detail in a fact table. It's crucial to define granularity clearly, as it impacts the data volume, performance, and analytical capabilities. Examples include:
- Transaction-level: Each row represents a single item on an order.
- Daily summary: Each row represents the total sales for a product in a store on a given day.
- Monthly summary: Each row represents the total revenue for a product category across all stores in a month.
Example Fact Table: Sales Facts
Consider a retail sales data warehouse. A SalesFact
table might look like this:
Column Name | Data Type | Description |
---|---|---|
DateKey |
INT | Foreign key to the DimDate table. |
ProductKey |
INT | Foreign key to the DimProduct table. |
StoreKey |
INT | Foreign key to the DimStore table. |
CustomerKey |
INT | Foreign key to the DimCustomer table. |
SalesAmount |
DECIMAL(10,2) | The total amount of the sale. |
QuantitySold |
INT | The number of units sold. |
DiscountAmount |
DECIMAL(10,2) | The amount of discount applied. |
In this example:
DateKey
,ProductKey
,StoreKey
, andCustomerKey
are foreign keys.SalesAmount
,QuantitySold
, andDiscountAmount
are measures.
Factless Fact Tables
Some fact tables, known as "factless fact tables," do not contain any numeric measures. Instead, they are used to record the occurrence of an event or to link dimensions. Common uses include tracking:
- Activities: Recording which customers performed which actions (e.g., logged in, viewed a page).
- Coverage: Verifying if a particular combination of dimensions exists (e.g., which products were available on a given day).
For example, a table tracking student attendance might only have keys for StudentKey
, DateKey
, and CourseKey
, indicating that a student attended a specific course on a specific date.
Fact tables are the heart of a data warehouse, providing the quantitative data necessary for business intelligence and decision-making. Their design, granularity, and the types of measures they contain are critical for the overall effectiveness of the data warehouse.