The star and snowflake schema are logical storage designs commonly found in data marts and data warehouse architecture. While common database types use ER (Entity-Relationship) diagrams, the logical structure of warehouses uses dimensional models to conceptualize the storage system.
Read on to find out the differences, characteristics, and flaws of the star and snowflake schemas.
The two main elements of the dimensional model of the star and snowflake schema are:
1. Facts table. A table with the most considerable amount of data, also known as a cube.
2. Dimension tables. The derived data structure provides answers to ad hoc queries or dimensions, often called lookup tables.
Connecting chosen dimensions on a facts table forms the schema. Both the star and snowflake schemas make use of the dimensionality of data to model the storage system.
The main differences between the two schemas are:
Star Schema | Snowflake Schema | |
---|---|---|
Elements | Fact table Dimension tables | Fact table Dimension tables Subdimension tables |
Structure | Star-shaped | Snowflake shaped |
Dimensions | One table per dimension | Multiple tables for each dimension |
Model Direction | Top-down | Bottom-up |
Storage space | Uses more storage | Uses less space |
Normalization | Denormalized dimension tables | Normalized dimension tables |
Query Performance | Fast, fewer JOINs needed because of fewer foreign keys | Slow, more JOINs required because of more foreign keys |
Query Complexity | Simple and easier to understand | Complicated and more challenging to understand |
Data Redundancy | High | Low |
Use case | Dimension tables with several rows, typical with data marts | Dimension tables with multiple rows found with data warehouses |
Due to the complexity of the snowflake schema and the lower performances, the star schema is the preferred option whenever possible. One typical way to get around the problems in the snowflake schema is to decompose the dedicated storage into multiple smaller entities with a star schema.
A star schema is a logical structure for the development of data marts and simpler data warehouses. The simple model consists of dimension tables connected to a facts table in the center.
The facts table typically consists of:
The lookup tables represent descriptive information directly connected to the facts table.
For example, to model the sales of an ecommerce business, the facts table for purchases might contain the total price of the purchase. On the other hand, dimensional tables have descriptive information about the items, customer data, the time or location of purchase.
The star schema for the analysis of purchases in the example has four dimensions. The facts table connects to the dimensional tables through the concept of foreign and primary keys. Apart from the numerical data, the facts table therefore also consists of foreign keys to define relations between tables.
The main characteristics of the star schema are:
The disadvantages of using the star schema are:
The snowflake schema has a branched-out logical structure used in large data warehouses. From the center to the edges, entity information goes from general to more specific.
Apart from the dimensional model's common elements, the snowflake schema further decomposes dimensional tables into subdimensions.
The ecommerce sales analysis model from the previous example further branches ("snowflakes") into smaller categories and subcategories of interest.
The four dimensions decompose into subdimensions. The lookup tables further normalize through a series of connected objects.
The main features of the snowflake schema include:
The weaknesses of the snowflake schema are:
The star and snowflake schema are both logical designs of database storage systems. Both schemas show up in big data systems, where the amount of information is massive and hard to maintain.
For more information about big data platforms, check out our article about the Hadoop platform.