Decision Framework to choose between Redshift, BigQuery, and Snowflake (3)

As a quick intro, choosing your data warehouse provider can be overwhelming. You can either choose blindly by trusting your data engineering friend who "knows his sh*t" or spend weeks benchmarking the features of those services. I am here to provide a third alternative: a series of articles to compare performance on specific topics.



In this series, you'll find a fair feature comparison between Redshift, BigQuery and Snowflake. I will try as much as possible to provide unbiased comparisons. You won't find any recommendations to choose a service more than another, nor performance comparisons (which are marginal anyway)


You'll find all the articles of the series and the topics tackled below:


Let’s get started!


Data access


Data warehouses' first advantage is how easily one access the data easily. Originally, databases have relied on ODBC / JDBC. We’re starting to see a lot of investment in providing neat user interfaces as well as additional APIs. You'll find below the various ways to access data by data warehouse provider.


💙 BigQuery

  • ODBC / JDBC access via Simba drivers

  • BigQuery UI

  • BigQuery CLI (command-line tool)

  • BigQuery Jobs API

  • BigQuery connections API

  • BigQuery Storage API


❤️ Redshift

  • ODBC / JDBC via AWS provided drivers

  • Redshift UI in the AWS console for some node types: ra3.*, ds2.8xlarge, dc2.large, dc2.8xlarge, dc1.8xlarge only

  • Asynchronously via Data Access API

  • Access via the AWS CLI

  • psql CLI


💚 Snowflake

  • ODBC / JDBC access via drivers

  • Snowsight (some features are in-preview)

  • Access via Spark plugin

  • Access via Kafka

  • Python / Node.js / Go / .NET drivers for specific languages

  • SnowSQL (command-line tool)


Encryption


💙 BigQuery


BigQuery automatically encrypts all data before it is written to disk. The data is automatically decrypted when read by an authorized user. By default, Google manages the key-encryption keys used to protect your data. You can also use customer-managed encryption keys, and encrypt individual values within a table.


❤️ Redshift


You can enable encryption when you launch your cluster, or you can modify an unencrypted cluster to use AWS Key Management Service (AWS KMS) encryption. To do so, you can use either an AWS-managed key or a customer-managed key (CMK).


💚 Snowflake


Protecting customer data is one of Snowflake’s highest priorities. Snowflake encrypts all customer data by default, using the latest security standards, at no additional cost. End-to-end encryption (E2EE) is a form of communication in which no one but end users can read the data (in transit, at rest). Certain features (for instance, periodic rekeying, customer managed keys) are available for premium plans only.


Support for third-party tools (visualization, data modeling)


Support for third-party tools is pretty much similar across the three data warehouses.


Support won't be always the same regarding the tools and data warehouse provider but you'll be able to connect all the most common visualization tools and data modeling software.


Visualization


Amazon has QuickSight where Google has Looker/DataStudio. Snowflake is revamping its UI to provide some basic visualizations comparable to what Metabase offers.


Data Modeling and Scheduling


Google Cloud recently acquired Dataform as ELT tool. Its offering is comparable to what dbt, Matillion or Airbyte offer. All tools connect really well to all data warehouse infrastructure. For the scheduling part, Airflow, the Airbnb open-source python scheduler will work perfectly with all infrastructure. Google Cloud has its own managed Airflow system.


Data Discovery


None of them have a great data discovery/catalog tool well integrated to their infrastructure. Snowflake recently revamp its UI to improve discovery but didn't hack it yet. That's why we built Castor.


Hope you enjoyed the reading of those comparison/benchmarking point regarding:

  • Data Access

  • Encryption

  • Support for third-party tools


If you want to find more benchmarking points, please find them in the following links :


Castor is building the documentation layer on top of your data warehouse. We help any data people within your company to get all the context around data assets.


castor_logo_1.png

Your data has never been so clear and friendly

  • Medium
  • LinkedIn
  • Twitter

© 2020 Castor. All registered.

LegalCareers