Snowflake Warehouses | Autoscaling & Concurrency

Snowflake is a cloud-based data platform that enables organisations to store, process and analyse their data using a scalable and flexible architecture. Its architecture consists of three layers: storage, compute and services. The storage layer is responsible for storing data, the compute layer processes queries and analyses, and the services layer handles tasks such as query optimisation, security and metadata management. This blog explores an important feature, Snowflake warehouses, which make up the compute layer of the architecture.

Snowflake’s warehouses are virtual compute clusters that process queries and perform analytical tasks on data stored in the platform’s storage layer. The warehouses can be scaled up or down in size and number of clusters to match the workload demands and the data processing requirements.

Snowflake Warehouses Explained

Warehouses are a crucial part of the Snowflake data platform because they allow organisations to achieve high-performance data processing and analytics. Snowflake offers various warehouse sizes to suit different business needs, ranging from X-Small to 4X-Large and beyond. Organisations can choose the appropriate warehouse size based on their data processing requirements and budget.

Moreover, warehouses enable Snowflake to offer unique features like autoscaling and concurrency. Autoscaling automatically adjusts the number of clusters in a warehouse to match the workload demand and optimise resource utilisation. Concurrency, on the other hand, refers to the ability of Snowflake warehouses to process multiple queries simultaneously, without sacrificing performance or accuracy. We’ll discuss autoscaling and concurrency in more detail later.

In traditional data warehouses, multiple users querying the same data set can lead to contention for resources and slow down query performance. Snowflake’s architecture eliminates this problem by distributing the workload across multiple virtual warehouses, allowing each user to access and analyse the data they need without being affected by the queries of others.

Warehouse Sizes

Warehouses can be configured as single-cluster (standard) or multi-cluster. In either case, you choose a size for the warehouse. Snowflake makes the sizing convenient by classifying warehouse sizes into t-shirt sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large and 4X-Large, with 5X-Large and 6X-Large also available for the largest workloads.

Choose the warehouse size that fits the needs of the use case. For a multi-cluster warehouse, you also set the minimum and maximum number of clusters, which determines how compute resources respond as query needs change. There are two modes in multi-cluster: auto-scale and maximised.

  1. Maximised: When the warehouse is started, Snowflake always starts all the clusters to ensure maximum resources are available while the warehouse is running.
  2. Auto-scale: Snowflake starts and stops clusters as needed to dynamically manage the workload on the warehouse.
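
The two modes differ only in how the cluster limits are set: a multi-cluster warehouse runs in maximised mode when its minimum and maximum cluster counts are equal, and in auto-scale mode when the minimum is lower than the maximum. As a minimal sketch in Python, the statement for either mode could be built as below. WAREHOUSE_SIZE, MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT are parameters of Snowflake's CREATE WAREHOUSE command; the helper functions and the warehouse name are hypothetical and only for illustration.

```python
def cluster_mode(min_clusters: int, max_clusters: int) -> str:
    """Name the mode implied by the cluster limits (illustrative helper)."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    if max_clusters == 1:
        return "single-cluster"
    # Equal limits: every cluster starts with the warehouse (maximised).
    # Different limits: clusters start and stop with demand (auto-scale).
    return "maximised" if min_clusters == max_clusters else "auto-scale"


def create_warehouse_sql(name: str, size: str,
                         min_clusters: int, max_clusters: int) -> str:
    """Build a CREATE WAREHOUSE statement as a string (hypothetical helper)."""
    cluster_mode(min_clusters, max_clusters)  # validates the limits
    return (
        f"CREATE WAREHOUSE {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters}"
    )


# "reporting_wh" is a made-up warehouse name.
print(cluster_mode(1, 8))   # auto-scale
print(cluster_mode(4, 4))   # maximised
print(create_warehouse_sql("reporting_wh", "MEDIUM", 1, 8))
```

The sketch only produces the SQL text; running it against an account would need a Snowflake client, which is left out here.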

Warehouses are a key component of the Snowflake data platform, allowing users to scale their compute resources on demand to meet changing query workloads. The flexibility and elasticity of virtual warehouses make them particularly well-suited for organisations that need to support dynamic workloads with varying query requirements.

Choosing the right size warehouse is important for optimising performance and cost. If a warehouse is too small, queries may run slower than expected, and if it’s too large, it may result in unnecessary costs. To choose the right size warehouse, customers should consider the size of their dataset, the complexity of their queries, and the number of concurrent users accessing the data.
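
To make the cost side of this trade-off concrete, here is a rough sketch in Python. It assumes the commonly published rate of one credit per hour for an X-Small warehouse, doubling with each size, and per-second billing with a 60-second minimum each time a warehouse starts; treat these figures as assumptions and check them against current Snowflake pricing. The function name and the runtimes in the example are hypothetical.

```python
SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]

# Assumed rate: 1 credit/hour for X-Small, doubling with each size.
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}


def credits_used(size: str, seconds: int, clusters: int = 1) -> float:
    """Estimate credits for one run of a warehouse (illustrative only)."""
    billed_seconds = max(seconds, 60)  # assumed 60-second minimum
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# Hypothetical workload: one hour on Small, or half that time on Medium.
print(credits_used("Small", 3600))   # 2.0
print(credits_used("Medium", 1800))  # 2.0
```

Under these assumptions, a larger warehouse costs no more if it finishes the work proportionally faster. It becomes wasteful when the extra compute does not shorten the runtime, which is why query complexity and data volume matter when picking a size.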

Warehouse Autoscaling

One of the benefits of Snowflake is the ability to scale warehouses up or down as needed, without having to move data or change application code. This allows customers to adjust their compute resources based on changing workloads or business needs, resulting in cost savings and improved performance.

Autoscaling in Snowflake refers to the automatic adjustment of the number of clusters in a multi-cluster warehouse based on demand. When a customer runs queries on Snowflake, the system monitors the workload and, when demand is high and queries begin to queue, starts additional clusters of the same size to provide more computing power, up to the configured maximum. Conversely, when demand decreases, Snowflake shuts clusters down again, down to the configured minimum, to save costs. Changing the size of the warehouse itself is a separate operation that the customer performs manually.

Key Benefits of Autoscaling in Snowflake

  • Cost savings: Because clusters are started and stopped automatically, customers pay only for the computing power they need, without having to overprovision or manually adjust the number of clusters. This can lead to significant cost savings over traditional data warehousing solutions.
  • Improved performance: Because the number of clusters follows the workload, queries spend less time queued during busy periods, leading to faster query processing times and improved overall system performance.

Autoscaling in Snowflake Demo

Consider the situation where multiple users are running complex queries on the same database.

The warehouse is set to autoscale with a minimum of one and a maximum of eight clusters. When the first user starts a query, one cluster becomes active. When a second user starts a query, two clusters become active. A third user starts a query, but two clusters can still handle the workload of three queries, so only two clusters are active. A fourth user starts a query and now three clusters are active. Once the queries have been processed, clusters that are no longer needed become inactive.
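
The sequence in the demo can be imitated with a toy model in Python. This is not Snowflake's actual scaling algorithm: it simply assumes each cluster has a fixed capacity of 10 load units and that the four queries need 8, 8, 3 and 8 units, all hypothetical numbers chosen so that the third query fits into the spare capacity of the two running clusters.

```python
def clusters_needed(total_load: int, capacity_per_cluster: int,
                    min_clusters: int, max_clusters: int) -> int:
    """Clusters required for the load, clamped to the configured limits."""
    needed = -(-total_load // capacity_per_cluster)  # ceiling division
    return max(min_clusters, min(max_clusters, needed))


def simulate(query_loads, capacity_per_cluster=10,
             min_clusters=1, max_clusters=8):
    """Return the active cluster count after each query starts."""
    active = []
    total_load = 0
    for load in query_loads:
        total_load += load
        active.append(clusters_needed(total_load, capacity_per_cluster,
                                      min_clusters, max_clusters))
    return active


# Hypothetical loads for the four users in the demo.
print(simulate([8, 8, 3, 8]))  # [1, 2, 2, 3]
```

The clamp to `max_clusters` reflects the configured limit of eight: however much load arrives, the model never reports more clusters than that, and it never drops below the minimum of one.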

As demonstrated, autoscaling in Snowflake dynamically adjusts the number of active clusters in a warehouse based on demand, leading to cost savings, improved performance and greater flexibility.

Warehouse Concurrency

In addition to autoscaling, Snowflake offers another key feature called concurrency, which allows multiple users to access and analyse data simultaneously with little impact on one another's performance. Concurrency is critical for organisations with many users who need to access the same data sets, as it enables teams to work together efficiently and effectively.

Concurrency works by separating compute resources from storage resources, so separate warehouses can read the same data without competing with each other for compute. Even if hundreds or thousands of users are querying the same data set, queries running on different warehouses do not slow each other down. Queries that share a warehouse run side by side until its resources are used up, at which point further queries are queued.

Below is a simple graphic of how concurrency works. In this situation, eight queries of the same complexity are running at the same time on different warehouse sizes.

Assuming each complex query takes up compute resources equal to one whole warehouse (WH) of size XS, a WH of size XS runs the first query while the remaining seven are queued.

A WH of size S can take twice the load of a WH of size XS, meaning two queries run concurrently while the remaining six are queued, and so on: each step up in size doubles the number of queries that can run at once.

A WH of size L can therefore run all eight queries concurrently with none queued. A WH of size XL can take twice the load of a WH of size L, so it could run as many as 16 such queries concurrently.
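
The doubling in this example can be written out as a short Python sketch. It follows the article's simplifying assumption that one query fully occupies an XS warehouse and that capacity doubles with each size; real warehouses do not divide their resources this neatly, and the function names are hypothetical.

```python
SIZES = ["XS", "S", "M", "L", "XL"]


def concurrent_slots(size: str) -> int:
    """Queries that fit at once: 1 for XS, doubling with each size."""
    return 2 ** SIZES.index(size)


def split_queries(size: str, queries: int):
    """Return (running, queued) for a number of identical queries."""
    running = min(queries, concurrent_slots(size))
    return running, queries - running


# Eight identical queries submitted to each warehouse size.
for size in SIZES:
    running, queued = split_queries(size, 8)
    print(f"{size}: {running} running, {queued} queued")
```

For eight queries this gives 1 running and 7 queued on XS, 2 and 6 on S, 4 and 4 on M, and all 8 running on both L and XL, with XL leaving half of its 16 slots unused.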

Key Benefits of Concurrency

The benefits of concurrency are numerous. Its key benefits are that it:

  • Eliminates contention for resources such as CPU, memory, and storage: Teams can collaborate on data analysis projects more effectively, as multiple users can work on the same data sets at the same time without performance issues.
  • Enables faster time to insights: Users can query data in real time without having to wait for other queries to complete. This can be especially valuable in fast-paced business environments where decisions need to be made quickly.

Overall, the combination of autoscaling and concurrency in Snowflake’s data platform provides customers with a powerful and flexible solution for handling large volumes of data and complex analytics workloads, all while maintaining high levels of performance and cost-efficiency.

Summary

Snowflake’s warehouse concept is a crucial component of the platform, allowing users to scale their compute resources to meet changing query workloads. The flexibility and elasticity of virtual warehouses make them ideal for organisations that need to support dynamic workloads with varying query requirements. Autoscaling and concurrency are two unique features that make Snowflake an ideal solution for organisations that require cost-effective and high-performing data processing and analytics.