Sunday, 31 May 2015

Business Intelligence Dictionary: Slowly Changing Dimensions-SCD explained

Business Intelligence Dictionary: Slowly Changing Dimensions-SCD explained 

Things change. Dimensions are no exclusion. When a data designer plans warehouse architecture they have to plan ahead with a slowly changing dimensions strategy.
Slowly Changing Dimensions-SCD | Definition
Slowly Changing Dimension (SCD) is a term first introduced by Kimball Group among strategies for implementation of a data warehouse. Ralph Kimball states that dimensions normally change overtime and offers several strategies how a data designer may handle this process.

Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee.

There are many approaches how to deal with SCD. The most popular are:

Type 0 - The passive method
Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column
Type 4 - Using historical table
Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)

SCD Type 0 | Definition
SCD 0 is the simplest form of strategy. SCD 0 says: "Do not update the old row even if a change is detected. Preserve the initial state."
SCD Type 1 | Definition & Example
SCD 1 imposes a destructive update where old values are overwritten by the new state of a dimension.
SCD Type 2 | Definition & Example
SCD 2 is the most commonly used SCD type. It is flexible and optimized for most business scenarios. SCD 2 imposes "expiration" of a record that is handled by adding 2 new columns to the architecture - start date and end date, i.e. a period within which a dimension is active. End dates for active dimensions are hard-coded to future date like 9999-12-31.
SCD Type 3 | Definition & Example
SCD 3 is rarely used due to data redundancy. SCD 3 requires that a column is appended every time a dimension changes and this new column saves the previous state. SCD 3 also has a slightly optimized version that requires only 2 columns - Old State and New State but the obvious shortage is that this way it can only save 2 conditions.
SCD Type 4 | Definition & Example
SCD 4 states that one table must include all dimensions in their latest state and yet another History Table would be holding the previous states.
SCD Type 6 | Definition & Example
SCD 6 is a hybrid of SCD 1, SCD 2 and SCD 3 where it gets its name from (1+2+3). It imposes (SCD 1) a new row for each change, (SCD 2) Start Date/End Date columns and (SCD 3) a new flag column Y/N.
Note: You may also want to add a separate column, known as "Delete Indicator". It is used whenever a dimension does not have an active state anymore but you don't want to delete it in order to preserve the historical records for it.

No comments:

Post a Comment