Snowflake is a database. Like all databases, Snowflake stores your data and gives you the ability to retrieve that data. So far, nothing to write home about.
What does stand out about Snowflake is a set of features that make it attractive to a wide range of users, from freelancers to small and large enterprises. In this blog post, I would like to introduce you to three remarkable Snowflake features:
- flexible and precise scalability;
- easy loading, analysis and joining of semi-structured data;
- secure data sharing.
The first remarkable Snowflake feature is its scalability. Organisations with peaks of usage during the day benefit greatly from the ability to increase and decrease their capacity as needed, and to pay per second of usage. In effect, scalability frees organisations from having to compromise on either cost or performance.
But how does this work? The first thing to know is that Snowflake separates compute from storage. This allows Snowflake to scale the compute layer (where queries are run) and the storage layer independently and seamlessly.
Let’s now have a look at the two ways Snowflake data warehouses can be scaled.
Scaling up: increasing the warehouse size
Some workloads or queries require more computational power than others. With Snowflake, it is possible to increase or decrease the size of a data warehouse to adapt it to different computing needs, and pay per second of usage. In Snowflake jargon, we refer to this as scaling up/scaling down.
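To make per-second billing a bit more concrete, here is a back-of-the-envelope sketch in Python. It assumes Snowflake's standard credit rates per warehouse size (1 credit per hour for X-Small, doubling with each size) and the 60-second minimum that applies each time a warehouse starts. The helper name `credits_used` is invented for this illustration, and the price of a credit depends on your edition and region, so treat the numbers as relative rather than as a quote.

```python
# Credits consumed per hour by a single-cluster warehouse of each size.
# Assumed standard rates; check the pricing that applies to your own account.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Each start of a warehouse is billed for at least one minute.
MINIMUM_BILLED_SECONDS = 60


def credits_used(size: str, running_seconds: int) -> float:
    """Estimate credits for one uninterrupted run of a warehouse (hypothetical helper)."""
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size.upper()] * billed_seconds / 3600


# A 15-minute job on a Large warehouse versus a full hour on an X-Small one.
print(credits_used("LARGE", 15 * 60))
print(credits_used("XSMALL", 60 * 60))
```

The point of the comparison is that a bigger warehouse costs more per second but may finish sooner, so the total bill depends on how long it actually runs.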
Scaling a data warehouse up or down requires no downtime: the size of the warehouse can be modified while a query is running. Queries that are already running finish on the existing resources, and the new size applies to queued and new queries. Thanks to this feature, warehouses can be scaled up for specific periods and then scaled down when the job is done, without compromising on either cost or performance.
Scaling out: increasing the number of clusters
Similarly, Snowflake offers multi-cluster data warehouses, which can be scaled in or out. A warehouse that has multiple clusters at its disposal can run queries concurrently and reduce or eliminate queueing time, according to the users’ preference.
To make a data warehouse autoscale, users set a minimum and a maximum number of clusters. The data warehouse then starts at the minimum, automatically scales out up to the maximum as the number of concurrent sessions or queries grows, and scales back in when demand drops.
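As a minimal sketch of what scaling up and down looks like in practice, the Python snippet below builds the `ALTER WAREHOUSE` statement you would send to Snowflake. It only produces the SQL text and does not connect to anything. The helper `resize_warehouse_sql` and the warehouse name `reporting_wh` are made up for the example, and the list of sizes is deliberately a subset.

```python
# A subset of the sizes Snowflake accepts; larger sizes exist as well.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def resize_warehouse_sql(warehouse: str, size: str) -> str:
    """Return an ALTER WAREHOUSE statement that changes the warehouse size.

    Hypothetical helper: it builds SQL text only and runs nothing.
    """
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}';"


# Scale up before a heavy job, then back down once it is done.
print(resize_warehouse_sql("reporting_wh", "large"))
print(resize_warehouse_sql("reporting_wh", "xsmall"))
```

You could run the resulting statements from a worksheet or from any client you already use to talk to Snowflake.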
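Setting the minimum and maximum could look roughly like the sketch below, again as Python that only builds the SQL text. The helper `autoscale_sql` and the warehouse name are my own inventions for illustration. `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` are the warehouse properties I am assuming you would set here.

```python
def autoscale_sql(warehouse: str, min_clusters: int, max_clusters: int) -> str:
    """Return an ALTER WAREHOUSE statement that sets the autoscaling range.

    Hypothetical helper: it builds SQL text only and runs nothing.
    """
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    return (
        f"ALTER WAREHOUSE {warehouse} SET "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters};"
    )


# Start with one cluster and allow up to four when many queries arrive at once.
print(autoscale_sql("reporting_wh", 1, 4))
```

Setting both values to the same number would keep the warehouse at a fixed number of clusters instead of letting it autoscale.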
The second feature I would like to bring to your attention is the way Snowflake handles semi-structured data. Snowflake makes it remarkably easy to load and analyse semi-structured data, as well as to join it with other data sources.
Whether your semi-structured data format is JSON, Avro, ORC, Parquet, or XML, you can load it into Snowflake as the VARIANT data type. The VARIANT data type preserves the nested structure of semi-structured data. This allows users to upload the data as it is, without having to transform it into a tabular format first.
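To give a feel for what "as it is" means, here is a small sketch with a made-up JSON record. The Python part walks a dotted path through the nested record, and the SQL string shows roughly how the same path could be queried and joined in Snowflake. The record, the helper `get_path`, and the tables `raw_orders` and `customers` are all assumptions for this example.

```python
import json

# A sample semi-structured record; the field names are invented for illustration.
order = {
    "order_id": 1001,
    "customer": {"id": 7, "name": "Ada"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
}


def get_path(document: dict, path: str):
    """Follow a dotted path through nested objects, in the spirit of v:customer.name."""
    value = document
    for key in path.split("."):
        value = value[key]
    return value


# The record is serialised and read back unchanged; nothing is flattened into columns.
raw_json = json.dumps(order)
print(get_path(json.loads(raw_json), "customer.name"))

# Roughly equivalent Snowflake SQL, assuming a table raw_orders with a single
# VARIANT column v and a regular table customers(id, segment). Both are hypothetical.
QUERY = """
SELECT v:order_id::NUMBER      AS order_id,
       v:customer.name::STRING AS customer_name,
       customers.segment
FROM raw_orders
JOIN customers ON customers.id = v:customer.id::NUMBER;
"""
print(QUERY)
```

The colon and dot notation reaches into the nested record, and the `::` casts turn the extracted values into regular SQL types so they can be joined with ordinary columns.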
Secure data sharing
Lastly, Snowflake users can share objects securely via secure data sharing. When sharing data within Snowflake, no data is actually copied. The data remains stored in the provider’s account, and it is simply made available to the consumer. As a result, sharing the data creates no additional costs for the provider and no storage costs for the consumer. Each account can create an unlimited number of shares.
Users can share tables and external tables, secure views and secure materialized views, as well as secure UDFs. All shared objects are read-only for the consumer: they cannot be changed or deleted, only queried. This ensures that only the data provider has control over the data that is being shared, while still allowing the consumer to run their own queries on it.
Further, secure data shares can be made accessible to anybody. Even if the target consumer does not currently have a Snowflake account, the data provider can create a Reader account for them. Reader accounts are created and managed by the provider’s account, and the data provider pays for the Reader’s usage. The provider can choose whether to monetize their shares or simply use this feature as a convenient and safe way to share data with third parties.
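On the provider side, sharing a single table could be sketched as the sequence of statements below. The Python helper only assembles the SQL text, and all names in it (`share_table_statements`, `sales_share`, `sales_db`, `orders`, `consumer_acct`) are hypothetical. Note that only `SELECT` is granted on the table, which matches the read-only nature of shared objects.

```python
def share_table_statements(
    share: str, database: str, schema: str, table: str, consumer_account: str
) -> list[str]:
    """Return the SQL statements a provider might run to share one table.

    Hypothetical helper: it builds SQL text only and runs nothing.
    """
    return [
        f"CREATE SHARE {share};",
        # The share needs usage on the containing database and schema.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        # Read access only: the consumer can query the table but not change it.
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};",
        # Make the share visible to the consumer's account.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};",
    ]


for statement in share_table_statements(
    "sales_share", "sales_db", "public", "orders", "consumer_acct"
):
    print(statement)
```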
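Creating a Reader account could look roughly like this. As before, the Python only builds the statement, and the helper name, account name and admin user are made up. I am assuming the `CREATE MANAGED ACCOUNT` command with `TYPE = READER` here, so check the exact options against the Snowflake documentation before relying on it.

```python
def create_reader_account_sql(account_name: str, admin_name: str, admin_password: str) -> str:
    """Return a CREATE MANAGED ACCOUNT statement for a Reader account.

    Hypothetical helper: it builds SQL text only and runs nothing.
    """
    # Double any single quotes so the password stays a valid SQL string literal.
    escaped_password = admin_password.replace("'", "''")
    return (
        f"CREATE MANAGED ACCOUNT {account_name} "
        f"ADMIN_NAME = {admin_name}, "
        f"ADMIN_PASSWORD = '{escaped_password}', "
        f"TYPE = READER;"
    )


# In real use the password would come from a secret store, not from source code.
print(create_reader_account_sql("partner_reader", "partner_admin", "<choose-a-strong-password>"))
```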