Snowflake Schema is the expansion of star schema in which each star explodes into other points. In a snowflake schema, that dimensional table is standardized into numerous lookup tables, each of which represents a level in the dimensional hierarchy. But in star schema, each dimension is given by single dimension table. In my previous article I have given the information about common table expressions with examples. In this article we will focus on Snowflake schema with industry examples.
For instance, the Time Dimension has two distinct hierarchies:
- Year– Month– Day
- Week – Day
In a snowflake schema, there will be four lookup tables: one for the year, one for the month, one for the week, and one for the day. Month is linked to Year, while Day is linked to Month. Week only has a connection to Day. The image to the right shows a sample snowflake schema that demonstrates the aforementioned relationships in the
Time Dimension.
What is importance of Snowflake schema ?
Snowflake is a great and vast technological firm. It focuses exclusively on data, making it simple to enable governed access to practically endless amounts of data as well as cutting-edge tools, applications, and services. With the help of the Data Cloud, you may work together locally and internationally to uncover novel insights, develop previously unanticipated business prospects, and recognize and know your customers at any time through smooth and pertinent interactions.
In addition to a portion of ANSI SQL:1999 and the SQL:2003 analytic extensions, Snowflake supports regular SQL. When common versions of a number of commands do not clash with one another, Snowflake schema also supports those variations.
Snowflake Architecture :
Snowflakes consist of following 3 layers :
- Query Processing Layer.
- Cloud Services Layer.
- Database Storage Layer
Database Storage Layer :
Storage for TABLE DATA and QUERY RESULT SETS is on a sizable logical hard drive. Previously restricted to AWS S3, it is now accessible through Google Cloud Storage (GCP) and Azure Blob Storage for Azure accounts. It’s a hybrid columnar where the same row of data is in the same micro-partition and data for the same column is clustered.
Query Processing Layer :
Each query is executed on a different virtual warehouse in this tier, which is the computing layer of the architecture. The Snowflake computing engines that drive its scalable computation capability are
virtual warehouses. In order to optimize future query replies, the compute layer locally
caches data from the storage layer, therefore each Virtual Warehouse has its own cache.
Cloud Services Layer :
Like others, the cloud service layer is also independently scalable. The brain of the snowflake is the service layer. It provides security for data, manages information, optimizes data transactions, and shares data. A cloud service is a stateless computing resource that uses highly accessible and practical metadata across various availability zones. SQL client interface is made possible by the service layer for data operations like
DDL and DML.
Characteristics of Snowflake Schema :
- Less storage space is needed using the snowflake schema.
- Table subdivision enables study at different depths of interest.
- It’s easy to add new subdimensions as well.
- The schema has lower redundancies and a higher level of data integrity as a
result of normalization.
TOOLS AND INTERFACE OF SNOWFLAKE
Tools :
Snowflake offers a platform that enhances the performance of business intelligence tools while letting users concentrate on BI initiatives. An effective BI solution is provided by the Snowflake Data Cloud, which offers a variety of BI tools. With traditional data warehouses, it is impossible to mine the best analytics.
Interface:
Users of the Snowflake UI can execute more queries, view previously executed queries, load a table with a constrained amount of data, and conduct DDL/DML operations on it. Through the user interface, you can modify your Snowflake user password as well as other options, including your email address.
DATABASE REPLICATION AND FAILOVER :
The Snowflake Data Replication procedure involves designating a database as a primary database and periodically copying all DDL actions carried out on that database to a secondary database in a separate availability zone, region, or cloud provider.
Failover to the backup database can be implemented in addition to data replication. This means that the secondary database can act as a backup to carry out database activities in the event of a calamity, such as a hardware failure, preventing any downtime.
Advantages of Snowflake Schema :
- It is very easy and simple to maintain because of no redundancy.
- The snowflake schema’s key benefit is the improvement in query performance
because less disc storage is used and fewer smaller lookup tables are joined. - Compared to traditional data warehouses it has better data performance.
- The cloud-oriented design of Snowflakes facilitates cross-cloud applications and
multi-cloud systems. - Direct creation of the reader account is possible through the user interface
Disadvantages of Snowflake Schema :
- Snowflakes works on only a few cloud platforms, like Google Cloud, AWS, and
Azure. - It is very difficult to learn since there are many complex queries.
- More implementation time.
- No Data Constraints.
- The smallest amount of information has been used to build many hierarchies that
can belong to the same dimension.
I hope you like this article. If you like this article or if you have any issues with the same kindly comment in comments section.