As a quick intro, choosing your data warehouse provider can be overwhelming. You can either choose blindly by trusting your data engineering friend who "knows his sh*t" or spend weeks benchmarking the features of those services. I am here to provide a third alternative: a series of articles to compare performance on specific topics.
In this series, you'll find a fair feature comparison between Redshift, BigQuery and Snowflake. I will try as much as possible to provide unbiased comparisons. You won't find any recommendations to choose a service more than another, nor performance comparisons (which are marginal anyway)
You'll find all the articles of the series and the topics tackled below:
Let’s get started!
Data warehouses' first advantage is how easily one access the data easily. Originally, databases have relied on ODBC / JDBC. We’re starting to see a lot of investment in providing neat user interfaces as well as additional APIs. You'll find below the various ways to access data by data warehouse provider.
ODBC / JDBC access via Simba drivers
BigQuery CLI (command-line tool)
BigQuery Jobs API
BigQuery connections API
BigQuery Storage API
ODBC / JDBC via AWS provided drivers
Redshift UI in the AWS console for some node types: ra3.*, ds2.8xlarge, dc2.large, dc2.8xlarge, dc1.8xlarge only
Asynchronously via Data Access API
Access via the AWS CLI
ODBC / JDBC access via drivers
Snowsight (some features are in-preview)
Access via Spark plugin
Access via Kafka
Python / Node.js / Go / .NET drivers for specific languages
SnowSQL (command-line tool)
BigQuery automatically encrypts all data before it is written to disk. The data is automatically decrypted when read by an authorized user. By default, Google manages the key-encryption keys used to protect your data. You can also use customer-managed encryption keys, and encrypt individual values within a table.
You can enable encryption when you launch your cluster, or you can modify an unencrypted cluster to use AWS Key Management Service (AWS KMS) encryption. To do so, you can use either an AWS-managed key or a customer-managed key (CMK).
Protecting customer data is one of Snowflake’s highest priorities. Snowflake encrypts all customer data by default, using the latest security standards, at no additional cost. End-to-end encryption (E2EE) is a form of communication in which no one but end users can read the data (in transit, at rest). Certain features (for instance, periodic rekeying, customer managed keys) are available for premium plans only.
Support for third-party tools (visualization, data modeling)
Support for third-party tools is pretty much similar across the three data warehouses.
Support won't be always the same regarding the tools and data warehouse provider but you'll be able to connect all the most common visualization tools and data modeling software.
Amazon has QuickSight where Google has Looker/DataStudio. Snowflake is revamping its UI to provide some basic visualizations comparable to what Metabase offers.
Data Modeling and Scheduling
Google Cloud recently acquired Dataform as ELT tool. Its offering is comparable to what dbt, Matillion or Airbyte offer. All tools connect really well to all data warehouse infrastructure. For the scheduling part, Airflow, the Airbnb open-source python scheduler will work perfectly with all infrastructure. Google Cloud has its own managed Airflow system.
None of them have a great data discovery/catalog tool well integrated to their infrastructure. Snowflake recently revamp its UI to improve discovery but didn't hack it yet. That's why we built Castor.
Hope you enjoyed the reading of those comparison/benchmarking point regarding:
Support for third-party tools
If you want to find more benchmarking points, please find them in the following links :
Castor is building the documentation layer on top of your data warehouse. We help any data people within your company to get all the context around data assets.