Wednesday 22 July 2015

Dimension & Types

Dimension
A dimension table typically has two kinds of columns: a primary key, which fact tables reference through foreign keys, and textual or descriptive attributes.
E.g.: Time, Customer

Types of Dimensions
  • Slowly Changing Dimensions
  • Rapidly Changing Dimensions
  • Junk Dimensions
  • Inferred Dimensions
  • Conformed Dimensions
  • Degenerate Dimensions
  • Role Playing Dimensions
  • Shrunken Dimensions
  • Static Dimensions

Slowly Changing Dimensions
A slowly changing attribute is a dimension attribute whose value changes over time. Whether the history of those changes is preserved in the data warehouse depends on the business requirements. A dimension containing such an attribute is called a slowly changing dimension.


Rapidly Changing Dimensions
A dimension attribute that changes frequently is a rapidly changing attribute. If you don’t need to track the changes, the rapidly changing attribute is no problem, but if you do need to track the changes, using a standard slowly changing dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a rapidly changing dimension.

Junk Dimensions
A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by rapidly changing dimensions.
  • The attributes are typically low-cardinality flags and indicators.
  • Such attributes might also come from an optional comment field filled in when a customer places an order, which will probably be blank in many cases. The junk dimension should therefore contain a single row representing the blank value, whose surrogate key is used in the fact table for every row with a blank comment field.
  • E.g.: Assume that we have a gender dimension and a marital status dimension. The fact table would need two keys, one referring to each dimension. Instead, create a junk dimension that holds every combination of gender and marital status (cross join the gender and marital status tables). The fact table then needs only one key.
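The cross join from the example above could be sketched in Python roughly as follows. The attribute values and the names `junk_dimension`, `junk_key` and `junk_key_lookup` are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical low-cardinality attribute values (assumed for illustration).
genders = ["Female", "Male", "Unknown"]
marital_statuses = ["Single", "Married", "Unknown"]

# Cross join: one junk-dimension row per combination, each with a surrogate key.
junk_dimension = [
    {"junk_key": key, "gender": gender, "marital_status": status}
    for key, (gender, status) in enumerate(product(genders, marital_statuses), start=1)
]

# Lookup used while loading facts: attribute combination -> single surrogate key.
junk_key_lookup = {
    (row["gender"], row["marital_status"]): row["junk_key"] for row in junk_dimension
}

# The fact row now stores one key instead of two.
fact_row = {"sales_amount": 120.0, "junk_key": junk_key_lookup[("Female", "Married")]}
```

With three values per attribute the junk dimension holds nine rows, so the table stays small as long as the attributes are low-cardinality.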
Inferred Dimensions
While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.
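A minimal sketch of the inferred member approach, using in-memory dictionaries in place of real tables. The `is_inferred` flag and the function name `get_or_infer_customer_key` are assumptions added for illustration; the flag would let a later dimension load find and complete the placeholder row.

```python
# In-memory stand-in for a customer dimension, keyed by surrogate key.
customer_dim = {
    1: {"customer_code": "C001", "customer_name": "Cust_1", "is_inferred": False},
}
key_by_code = {"C001": 1}


def get_or_infer_customer_key(customer_code):
    """Return the surrogate key for a business key, creating a placeholder if needed."""
    if customer_code in key_by_code:
        return key_by_code[customer_code]
    new_key = max(customer_dim, default=0) + 1
    # Inferred member: only the keys are known, every other attribute stays None.
    customer_dim[new_key] = {
        "customer_code": customer_code,
        "customer_name": None,
        "is_inferred": True,
    }
    key_by_code[customer_code] = new_key
    return new_key


# A fact arrives for customer C002 before its dimension record has been loaded.
fact_row = {"customer_key": get_or_infer_customer_key("C002"), "amount": 50.0}
```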

Conformed Dimensions
A conformed dimension is a fixed, reusable dimension that is used in multiple places. It may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.
E.g.: The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

Degenerate Dimensions
A degenerate dimension is a dimension attribute that is stored in the fact table rather than in a separate dimension table. These are essentially dimension keys that have no other attributes. In a data warehouse, they are often used in drill-through queries to analyze the source of an aggregated number in a report, because the values can be traced back to transactions in the OLTP system.
E.g.: Invoice number, employee ID, or transaction code stored in the fact table.

Role Playing Dimensions
A role-playing dimension is one where the same dimension key, along with its associated attributes, can be joined to more than one foreign key in the fact table.
For example, a fact table may include foreign keys for both ship date and delivery date. The same date dimension attributes apply to each foreign key, so you can join the same dimension table to both. Here the date dimension plays multiple roles, mapping ship date as well as delivery date, hence the name role-playing dimension.
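The ship date and delivery date example could look roughly like this with Python's built-in `sqlite3` module. The table and column names (`dim_date`, `fact_orders`, `ship_date_key`, `delivery_date_key`) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
    CREATE TABLE fact_orders (
        order_id INTEGER,
        ship_date_key INTEGER REFERENCES dim_date (date_key),
        delivery_date_key INTEGER REFERENCES dim_date (date_key)
    );
    INSERT INTO dim_date VALUES (20150720, '2015-07-20', 'July');
    INSERT INTO dim_date VALUES (20150803, '2015-08-03', 'August');
    INSERT INTO fact_orders VALUES (1, 20150720, 20150803);
    """
)

# The same physical table is joined twice, once per role, through table aliases.
rows = conn.execute(
    """
    SELECT f.order_id, ship.month_name, delivery.month_name
    FROM fact_orders AS f
    JOIN dim_date AS ship ON ship.date_key = f.ship_date_key
    JOIN dim_date AS delivery ON delivery.date_key = f.delivery_date_key
    """
).fetchall()
print(rows)
```

Only one date table is stored; each alias in the query gives it a different role.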

Shrunken Dimensions
A shrunken dimension is a subset of another dimension. For example, the orders fact table may include a foreign key for product, but the target fact table may include a foreign key only for productcategory, which is in the product table, but much less granular. Creating a smaller dimension table, with productcategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the product dimension is snowflaked, there is probably already a separate table for productcategory, which can serve as the shrunken dimension.
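As a rough sketch of the idea, the shrunken dimension can be derived from the distinct productcategory values of the product dimension. All sample rows and the names `order_facts`, `target_facts` and `ordered_by_category` below are hypothetical.

```python
from collections import defaultdict

# Hypothetical product dimension, keyed by product surrogate key.
product_dim = {
    1: {"product_name": "Laptop", "productcategory": "Electronics"},
    2: {"product_name": "Phone", "productcategory": "Electronics"},
    3: {"product_name": "Desk", "productcategory": "Furniture"},
}

# Shrunken dimension: one entry per productcategory, which acts as its primary key.
category_dim = sorted({row["productcategory"] for row in product_dim.values()})

# Orders are recorded per product; targets are recorded per category.
order_facts = [
    {"product_key": 1, "quantity": 2},
    {"product_key": 2, "quantity": 1},
    {"product_key": 3, "quantity": 4},
]
target_facts = [
    {"productcategory": "Electronics", "target_quantity": 5},
    {"productcategory": "Furniture", "target_quantity": 3},
]

# Roll the orders up to category grain so both fact tables can be compared.
ordered_by_category = defaultdict(int)
for fact in order_facts:
    category = product_dim[fact["product_key"]]["productcategory"]
    ordered_by_category[category] += fact["quantity"]

for target in target_facts:
    category = target["productcategory"]
    print(category, ordered_by_category[category], target["target_quantity"])
```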

Static Dimensions
Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with status codes — or it can be generated by a procedure, such as a date or time dimension.
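A date dimension generated by a procedure might be sketched like this. The column names and the `YYYYMMDD` integer key are common conventions assumed here, not something prescribed above.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Generate one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20150722
                "full_date": current.isoformat(),
                "year": current.year,
                "month": current.month,
                "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            }
        )
        current += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2015, 7, 1), date(2015, 7, 31))
```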
What are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCD) are dimensions whose attributes change slowly over time, rather than on a regular, time-based schedule. A data warehouse needs to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should let users find the correct value of a dimension attribute for a given date. Examples of such dimensions include customer, geography, and employee.

There are several approaches to handling SCDs. The most popular are:

  • Type 0 - The passive method
  • Type 1 - Overwriting the old value
  • Type 2 - Creating a new additional record
  • Type 3 - Adding a new column
  • Type 4 - Using historical table
  • Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)

    Type 0 - The passive method. In this method no special action is taken when dimension data changes. Some dimension data may remain as it was when first inserted, while other data may be overwritten.

    Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten by the new one. This type is easy to maintain and is often used for data whose changes are caused by processing corrections (e.g. removing special characters or correcting spelling errors).

    Before the change:
    Customer_ID  Customer_Name  Customer_Type
    1            Cust_1         Corporate


    After the change:
    Customer_ID  Customer_Name  Customer_Type
    1            Cust_1         Retail
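A minimal sketch of a type 1 update, using a dictionary as a stand-in for the dimension table. The function name `apply_type1_change` is hypothetical.

```python
# Dimension table stand-in, keyed by Customer_ID.
customer_dim = {1: {"customer_name": "Cust_1", "customer_type": "Corporate"}}


def apply_type1_change(dimension, customer_id, attribute, new_value):
    """Overwrite the attribute in place; the previous value is lost."""
    dimension[customer_id][attribute] = new_value


apply_type1_change(customer_dim, 1, "customer_type", "Retail")
print(customer_dim[1])
```

After the call there is still exactly one row for the customer, and nothing records that the type was ever Corporate.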


    Type 2 - Creating a new additional record. In this method the full history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and the new row carry the natural key (or another durable identifier) as an attribute. This method also uses 'effective date' and 'current indicator' columns. For each natural key, only one record can have the current indicator set to 'Y'. Of the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to 9999-12-31. Changing the dimensional model of a type 2 dimension can be a very expensive database operation, so this type is not recommended for dimensions where a new attribute might be added in the future.

    Before the change:
    Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
    1            Cust_1         Corporate      22-07-2010  31-12-9999  Y


    After the change:
    Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
    1            Cust_1         Corporate      22-07-2010  17-05-2012  N
    2            Cust_1         Retail         18-05-2012  31-12-9999  Y
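The type 2 change shown in the tables could be sketched as follows. The example tables have no separate natural key column, so this sketch assumes `customer_name` serves as the natural key; `HIGH_DATE` and `apply_type2_change` are hypothetical names.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended end date marking the current row

customer_dim = [
    {"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Corporate",
     "start_date": date(2010, 7, 22), "end_date": HIGH_DATE, "current_flag": "Y"},
]


def apply_type2_change(dimension, customer_name, new_type, change_date):
    """Expire the current row and append a new row with a new surrogate key."""
    current = next(
        row for row in dimension
        if row["customer_name"] == customer_name and row["current_flag"] == "Y"
    )
    # The old row ends the day before the new one becomes effective.
    current["end_date"] = change_date - timedelta(days=1)
    current["current_flag"] = "N"
    dimension.append({
        "customer_id": max(row["customer_id"] for row in dimension) + 1,
        "customer_name": customer_name,
        "customer_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
        "current_flag": "Y",
    })


apply_type2_change(customer_dim, "Cust_1", "Retail", date(2012, 5, 18))
```

Fact rows loaded before the change keep pointing at surrogate key 1, so historical reports still show the customer as Corporate.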


    Type 3 - Adding a new column. In this type usually only the current and the previous value of the attribute are kept in the database. The new value is loaded into the 'current/new' column and the old one into the 'old/previous' column. Generally speaking, the history is limited to the number of columns created for storing historical data. This is the least commonly needed technique.

    Before the change:
    Customer_ID  Customer_Name  Current_Type  Previous_Type
    1            Cust_1         Corporate     Corporate


    After the change:
    Customer_ID  Customer_Name  Current_Type  Previous_Type
    1            Cust_1         Retail        Corporate
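A type 3 change can be sketched as a shift between two columns. The function name `apply_type3_change` is hypothetical.

```python
# Dimension table stand-in, keyed by Customer_ID.
customer_dim = {
    1: {"customer_name": "Cust_1", "current_type": "Corporate", "previous_type": "Corporate"},
}


def apply_type3_change(dimension, customer_id, new_type):
    """Move the current value into the 'previous' column, then store the new value."""
    row = dimension[customer_id]
    row["previous_type"] = row["current_type"]
    row["current_type"] = new_type


apply_type3_change(customer_dim, 1, "Retail")
print(customer_dim[1])
```

A second change would overwrite `previous_type` again, which is why the history is limited to the number of columns provided.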


    Type 4 - Using historical table. In this method a separate historical table tracks all historical changes to the attributes of each dimension. The 'main' dimension table keeps only the current data, e.g. customer and customer_history tables.

    Current table:
    Customer_ID  Customer_Name  Customer_Type
    1            Cust_1         Corporate


    Historical table:
    Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date
    1            Cust_1         Retail         01-01-2010  21-07-2010
    1            Cust_1         Other          22-07-2010  17-05-2012
    1            Cust_1         Corporate      18-05-2012  31-12-9999
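The following sketch replays the two changes that lead to the tables above, starting from a customer whose type is Retail. It assumes the history table also holds the currently valid row, as in the example; `HIGH_DATE` and `apply_type4_change` are hypothetical names.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended end date for the row still in effect

# 'Main' table with current data only, keyed by Customer_ID.
customer = {1: {"customer_name": "Cust_1", "customer_type": "Retail"}}
# Separate history table with one row per period.
customer_history = [
    {"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Retail",
     "start_date": date(2010, 1, 1), "end_date": HIGH_DATE},
]


def apply_type4_change(customer_id, new_type, change_date):
    """Overwrite the current table and record the change in the history table."""
    # Close the open history row for this customer.
    for row in customer_history:
        if row["customer_id"] == customer_id and row["end_date"] == HIGH_DATE:
            row["end_date"] = change_date - timedelta(days=1)
    # The main table keeps only the latest value.
    customer[customer_id]["customer_type"] = new_type
    customer_history.append({
        "customer_id": customer_id,
        "customer_name": customer[customer_id]["customer_name"],
        "customer_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
    })


apply_type4_change(1, "Other", date(2010, 7, 22))
apply_type4_change(1, "Corporate", date(2012, 5, 18))
```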


    Type 6 - Combine approaches of types 1,2,3 (1+2+3=6). In this type the dimension table has the following additional columns:

  • current_type - keeps the current value of the attribute. All history records for a given dimension member have the same current value.
  • historical_type - keeps the historical value of the attribute. The history records for a given dimension member can have different values.
  • start_date - keeps the start of the period during which the record was effective.
  • end_date - keeps the end of the period during which the record was effective.
  • current_flag - marks the most recent record.

    In this method an attribute change is captured by adding a new record, as in type 2. The current_type value is overwritten with the new one on all records, as in type 1. The history is stored in a separate column, historical_type, as in type 3.

    Customer_ID  Customer_Name  Current_Type  Historical_Type  Start_Date  End_Date    Current_Flag
    1            Cust_1         Corporate     Retail           01-01-2010  21-07-2010  N
    2            Cust_1         Corporate     Other            22-07-2010  17-05-2012  N
    3            Cust_1         Corporate     Corporate        18-05-2012  31-12-9999  Y
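The table above can be reproduced with a sketch like the one below, which starts from a single Retail record and applies two changes. It assumes `customer_name` identifies the dimension member; `HIGH_DATE` and `apply_type6_change` are hypothetical names.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended end date marking the current row

customer_dim = [
    {"customer_id": 1, "customer_name": "Cust_1", "current_type": "Retail",
     "historical_type": "Retail", "start_date": date(2010, 1, 1),
     "end_date": HIGH_DATE, "current_flag": "Y"},
]


def apply_type6_change(dimension, customer_name, new_type, change_date):
    """Apply one attribute change using the combined type 1 + 2 + 3 approach."""
    for row in dimension:
        if row["customer_name"] != customer_name:
            continue
        # Type 1 part: overwrite current_type on every record of this customer.
        row["current_type"] = new_type
        if row["current_flag"] == "Y":
            # Type 2 part: expire the record that was current until now.
            row["end_date"] = change_date - timedelta(days=1)
            row["current_flag"] = "N"
    # Type 2 part: add a new record with a new surrogate key.
    # Type 3 part: historical_type sits beside current_type as a separate column.
    dimension.append({
        "customer_id": max(row["customer_id"] for row in dimension) + 1,
        "customer_name": customer_name,
        "current_type": new_type,
        "historical_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
        "current_flag": "Y",
    })


apply_type6_change(customer_dim, "Cust_1", "Other", date(2010, 7, 22))
apply_type6_change(customer_dim, "Cust_1", "Corporate", date(2012, 5, 18))
```

Reports can then group facts either by `historical_type` (the value at the time of the fact) or by `current_type` (the value today) without an extra join.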
