Guide to Redshift Remodeling- Friends and Foes

In this series, we make an attempt to chronicle our experience and best practices with redshift having used it in ‘anger’ in many projects. In this part 1 of the series, we look for appropriate schema design for redshift, the various alternatives and the pros and cons of each.

Schema Design

Proper dimensional model is an absolute need for Redshift to perform well. Three different dimensional models work best with Redshift:

The Snowflake
The Star
Flat : To understand the differences between the three modeling schemes let’s get some terminology out of the way. There are two types of tables that we create in a dimensional model Facts and Dimensions. The fact tables are the measures, while the dimension tables contain descriptive attributes. So, the facts usually have numbers and we use some numeric operations like sum, count, average on them. On the contrary, the dimensions are used to filter or categorize the data. The number of unique, in a dimension, would always be an order of magnitude smaller in comparison to the number of facts. The difference between Star and Snowflake Schema manifests in the ‘normalization’, we do in the dimensions. In the Snowflake schema, the dimension tables are d iteratively whereas in Star there is only one level of normalization. Examples So, let’s visualize this with two tables Sales fact and Store dimension table. The Sales fact stores the amount of product sold from each store. Star Schema

Here store and product are dimension tables while sales are the fact table. Data in dimension tables do not change frequently and as mentioned above, contains the descriptive attributes. Here, the store table stores the outlet information while the product is the collection of all types of products produced by the company. Fact table sales, rather stores every transaction or sale of a product made by the individual store, so it is highly dynamic in nature and usually the heaviest one as well. Note all the columns in sales are of integer type, which makes it easy to do aggregations, indexing for queries to run faster. Also, fact tables must be distributed across different redshift clusters by either stores or products to gain huge improvements in query performance. Snowflake Schema

Do you see a problem with Star Schema? There is one, dimension table “store” is not normalized. That means if there are 1000 stores in India, country ‘India’ would be repeated 1000 times. Same is the case with city and state. This makes dimension tables unnecessary bulkier. While denormalizing extensively increases the number of joins to fetch the data, which adds to computing time and adds to query execution. Flat Schema There exists another model i.e. flat model, which is essentially another level of denormalization over the star schema. The state, city, country and store all get folded into the fact table. Here are the pros and cons of the three modeling choices.

The three approaches compared

Type	Normalization	Ease of Query	Storage size	Joins
Snowflake	High	Difficult	Low	Many
Star	Medium	Medium	Medium	One
Flat	Low	Easy	High	None

Recommendations

If you are using Redshift for data marts where query performance is critical the right choice is a snowflake schema. if you are using redshift as generic data warehouse it is preferred to build a star schema. To know more about such cutting-edge technology, interesting use cases and deep insights to dwell upon and transform your business growth, reach out to us. Till then, keep innovating!

Read about how to use Lottie on your website with easiness.

13 May 2024

Guide to Redshift Remodeling- Friends and Foes

Schema Design

Recommendations

Read about how to use Lottie on your website with easiness.

Unleashing the Power of Unstructured Data: BluePi’s Data Modernization Solutions for Personalized Banking

Breakthrough Data Architecture: 6 Reasons to Love Snowflake’s Unique Approach to Data Warehousing

Unlock The Power of Your Data: Data Lake vs. Data Warehouse for Smarter Business Decisions (2024)

Brutal DDoS Attack – Is it the Beginning of a Bleak Future?

How to automate copying an EC2 AMI from one region to another

Guide to Redshift Remodeling- Friends and Foes

Schema Design

Recommendations

Read about how to use Lottie on your website with easiness.

Unleashing the Power of Unstructured Data: BluePi’s Data Modernization Solutions for Personalized Banking

Breakthrough Data Architecture: 6 Reasons to Love Snowflake’s Unique Approach to Data Warehousing

Unlock The Power of Your Data: Data Lake vs. Data Warehouse for Smarter Business Decisions (2024)