Decision Framework to choose between Redshift, BigQuery, and Snowflake (1)

4 min read

By Xavier de Boisredon

As a quick intro, choosing your data warehouse provider can be overwhelming. You can either choose blindly by trusting your data engineering friend who "knows his sh*t" or spend weeks benchmarking the features of those services. I am here to provide a third alternative: a series of articles to compare performance on specific topics.

In this series, you'll find a fair feature comparison between Redshift, BigQuery and Snowflake. I will try as much as possible to provide unbiased comparisons. You won't find any recommendations to choose a service more than another, nor performance comparisons (which are marginal anyway)

You'll find all the articles of the series and the topics tackled below:

Let’s get started!

Snowflake supports a high-level data architecture — Snowflake Architecture

Query Processing (or Compute layer)

The compute layer of your workload manages requests from external systems, controlling access, and ensuring requests are appropriately authorized. It contains the runtime environment that your business logic will be deployed and executed by.

💙 BigQuery

Runs on distributed compute. Runs on Dremel, a large multi-tenant cluster that executes Standard SQL queries. BQ is orchestrated via Borg, Google’s precursor to Kubernetes. Compute and storage talk to each other through the petabit Jupiter network.

❤️ Redshift

Proprietary fork of ParAccel (which was partly forked from Postgres) running on AWS virtual machines. Its proximity to Postgres shouldn't fool you, they don't have much in common.

💚 Snowflake

Proprietary compute engine with intelligent predicate pushdown + smart caching running on commodity virtual machines (AWS, GCP or Azure) depending on your cloud choice. Hybrid columnar system inspired by C-Store, MonetDB among others.

Storage layer

All 3 databases have implementations of hot / warm / cold storage. This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.

💙 BigQuery

Proprietary, stored on the Colossus filesystem using ColumnIO as a storage format. Complete separation of distributed compute and storage.

The concept of “platforms” is a key principle of technical infrastructure at Google, and Colossus is the unified global platform for much of storage, from Spanner to Bigtable, from internal object stores to Cloud Storage, and even Compute Engine Persistent Disks. Colossus helps address encryption, low-level security primitives, global scale, durability, performance, hot-spotting, cross-data center replication, maintenance, upgrades, and so on.

❤️ Redshift

Proprietary but is generally SSD (dc1, dc2) / HDD (ds1, ds2) or mixed including S3 based (for RA3) using a proprietary columnar format. RA3 separates compute and storage, whilst all other node types colocalize your compute and storage together.

💚 Snowflake

The database storage layer resides in a scalable cloud storage service, such as Amazon S3, which ensures data replication, scaling, and availability without any management by customers. Snowflake optimizes and stores data in a columnar format within the storage layer, organized into databases as specified by the user.

Compression

Data compression is a reduction in the number of bits needed to represent data. Compressing data can save storage capacity, speed up file transfer, and decrease costs for storage hardware and network bandwidth.

💙 BigQuery

Proprietary compression managed by the ColumnIO columnar format. BigQuery compresses data under the hood for you on an ongoing basis. Yet, your queries are still billed as if you are scanning uncompressed bytes.Mosha Pasumansky in April of 2016 described Capacitor, BigQuery’s next generation columnar storage format, which replaced ColumnIO (on which Parquet and ORC are based).

❤️ Redshift

Redshift achieves transparent compression by implementing open algorithms e.g., LZO, ZStandard. They released their own proprietary compression algorithm (AZ64).

You can select the columns you would like to be compressed and how. You can generate recommendations to select the best compression algorithm with the query ANALYZE COMPRESSION.

💚 Snowflake

Snowflake also provides its own compression layer that is opaque to the user. Unlike BigQuery you aren’t billed for bytes scanned but it does mean the query planner can take advantage of the compression and table statistics to scan less data (and reduce compute cost). There’s little you can do to tune this externally – however that’s a pro and not a con.

Deployment

All of these databases follow cloud-only deployments. If you have to run on-premise you’re going to be a little stuck for the moment unless you plan to migrate that data into the cloud.

💙 BigQuery

Cloud only – within Google Cloud Platform. Each dataset can be distributed across a region (US or EU) or you pick a specific region from a list.

❤️ Redshift

Cloud only – within Amazon Web services. You can pick from pretty much every AWS region in a list of 25 regions at a cluster level.

💚 Snowflake

Cloud only – within Amazon Web Services / Google Cloud Platform / Azure depending on your preferred flavor of cloud at an account level.

Hope you enjoyed the reading of those comparison/benchmarking point regarding:

Query Processing
Storage layer
Compression
Deployment

If you want to find more benchmarking points, please find them in the following links :

‍

Castor is building the documentation layer on top of your data warehouse. We help any data people within your company to get all the context around data assets.

‍

Subscribe to the newsletter

New Release

Table of Contents

Why Look for Atlan Alternative?

Resources

Xavier de Boisredon

April 20, 2020

The ultimate data stack

Stay up-to-date with the latest best practices in data visualization, analytics, and ETL with CastorDoc's insightful guide.

Learn more

Arpit Choudhury

November 16, 2021

Cloud Data Warehousing: Past, Present, and Future

Explore the evolution of cloud data warehousing and its impact on modern data management.

Learn more

Louise de Leyritz

August 25, 2023

Data Catalog Tools List & Evolution

CastorDoc evaluates data catalog solutions for mid-market & enterprise companies, assisting you in selecting the right tool for your data management needs.

Learn more

Get in Touch to Learn More

See Why Users Love Coalesce Catalog

Fantastic tool for data discovery and documentation

“[I like] The easy to use interface and the speed of finding the relevant assets that you're looking for in your database. I also really enjoy the score given to each table, [which] lets you prioritize the results of your queries by how often certain data is used.” - Michal P., Head of Data

Let’s get started!

Query Processing (or Compute layer)

Storage layer

Compression

Deployment

‍

Subscribe to the newsletter

You might also like

Get in Touch to Learn More