In early June, I had the opportunity to join the 1,800+ attendees of Snowflake’s inaugural conference in San Francisco, which featured over 130 Snowflake-related sessions over the course of four days. As both a Snowflake customer and a Certified Snowflake Implementation Partner, I have seen firsthand how Snowflake’s cloud-built data warehouse-as-a-service allows organizations to break through the data barriers that impede them from becoming truly data-driven.
With 80 of the Summit sessions dedicated to showcasing customer success stories, I was able to see how successful Snowflake implementations have allowed customers to think about their data differently. It is no longer something to be “managed” and has become foundational to their business. In fact, it often changes their business. They create new products based on data to generate new sources of revenue and even new business models.
Snowflake has already proven to be a truly transformational technology, but the “relentless innovation” isn’t slowing down. Snowflake on Google Cloud, Cross-Cloud & Cross-Region Data Replication, Snowflake Organizations, External Tables, Data Pipelines, and Data Exchange are just a few of the new product announcements made during the Summit.
Snowflake has innovated around four main themes: Core Data Warehouse, Data Pipelines, Global Snowflake, and Secure Data Sharing. Below, I’ll walk through the major announcements in each area in more detail.
Snowflake on Google Cloud Platform
Snowflake officially announced its strategic partnership with Google Cloud Platform: “Snowflake on Google Cloud” launches in preview in Fall 2019, with general availability scheduled for early 2020. Large organizations require flexibility in their cloud strategy, and Snowflake’s cloud-built data warehouse enables seamless and secure data integration throughout organizations and across platforms.
With this new offering, Snowflake customers can choose the cloud vendor(s) that are right for their business: AWS, Azure, or Google Cloud. It also makes it easier to use the Google ecosystem of applications and tools in conjunction with Snowflake. Customers will be able to create a fully integrated data platform connecting applications running on GCP, as well as run across all three major public clouds for business continuity, disaster recovery, and geographical load balancing.
Snowflake Database Replication and Failover
Global Snowflake is a core theme behind Snowflake’s product strategy for becoming the customers’ global cloud data solution across regions and cloud providers. Snowflake Database Replication enables customers to replicate databases and keep them synchronized across multiple accounts in different regions and/or cloud providers.
Failover allows users to automatically recover multiple databases in a secondary region after a failure in the primary region that results in full or partial loss of Snowflake service availability. Changes can be synchronized to a different region or cloud provider, ensuring data durability and availability at all times. Snowflake Database Replication and Failover occurs in real time and recovery time does not depend on data size.
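As a sketch of how this looks in practice, replication and failover are driven by a handful of SQL commands; the account and database names below are hypothetical:

```sql
-- On the primary account: allow a secondary account to replicate
-- (and fail over to) this database. Names are hypothetical.
ALTER DATABASE sales_db ENABLE REPLICATION TO ACCOUNTS myorg.account_west;
ALTER DATABASE sales_db ENABLE FAILOVER TO ACCOUNTS myorg.account_west;

-- On the secondary account: create a local replica and refresh it.
CREATE DATABASE sales_db AS REPLICA OF myorg.account_east.sales_db;
ALTER DATABASE sales_db REFRESH;

-- During an outage in the primary region, promote the replica to primary.
ALTER DATABASE sales_db PRIMARY;
```

Refreshes can be scheduled (for example, with a task) so the secondary stays closely synchronized with the primary.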
Global Account Management
With the potential for Snowflake replication and failover across multiple geographically disparate regions within each of the major public clouds (AWS, Azure, Google Cloud), a single Snowflake customer could have many Snowflake instances. With Global Account Management, customers can view, create, and manage their own accounts across all Snowflake deployments and see usage across all Snowflake instances in a single, consolidated place.
Core Data Warehouse
Materialized Views are out of preview and are now generally available, along with new SQL features: CONNECT BY, recursive CTEs, and COLLATION support. Until now, these features were only offered by traditional on-premises data warehouse solutions. This opens up additional migration opportunities from legacy on-premises data warehouse products, and continues to blur the boundary between operational databases (OLTP) and analytical data platforms (OLAP).
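For example, recursive CTEs make it straightforward to traverse hierarchical data such as an org chart. This is a sketch against a hypothetical employees table:

```sql
-- Walk a management hierarchy top-down with a recursive CTE.
WITH RECURSIVE reports AS (
    SELECT employee_id, manager_id, title, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL               -- anchor: the top of the tree
    UNION ALL
    SELECT e.employee_id, e.manager_id, e.title, r.depth + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.employee_id
)
SELECT * FROM reports ORDER BY depth;

-- The Oracle-style CONNECT BY syntax expresses the same traversal:
-- SELECT employee_id, title FROM employees
--   START WITH manager_id IS NULL
--   CONNECT BY manager_id = PRIOR employee_id;
```

Support for both styles eases migrations from legacy warehouses where CONNECT BY queries are common.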
Next Generation Worksheets
Snowflake has acquired Numeracy, a unique and compelling SQL query editor that has support for key additional functionality such as SQL autocomplete, query and worksheet sharing, in-worksheet visualizations, and lightning-fast catalog browsing and search. The features of the Numeracy UI are being ported into the Snowflake UI to create a brand new Worksheet experience that more closely resembles the self-service SQL analytics offered by newer Cloud BI tools like Mode.
Data Pipelines
Data pipelines are important for real-time analytics, helping organizations make faster, data-driven decisions. Snowpipe, Auto-Ingest, Streams & Tasks, and the Snowflake Connector for Kafka provide continuous, automated, and cost-effective services to load all of a customer’s data efficiently and without manual effort. With Snowflake, customers can query data directly where it resides in their data lake on AWS S3 or Azure Blob Storage, allowing them to maintain the data lake as the single source of truth. With Hive Metastore integration specifically, users can automatically keep their Snowflake schema in sync with the data lake, eliminating the manual work of keeping data sets aligned.
Data Pipelines: Auto-Ingest
AWS and Azure provide notification mechanisms that fire whenever an object is created in storage. Auto-Ingest layers these notifications over the ingest service, so that Snowflake can automatically detect new files created under a stage and ingest them into the appropriate tables. This is important because it reduces latency for queries by ingesting and transforming data as it arrives.
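A minimal sketch of an auto-ingesting pipe on an S3 stage; the bucket, stage, and table names are hypothetical:

```sql
-- An external stage over the bucket where files land.
CREATE STAGE raw_events_stage
  URL = 's3://my-bucket/events/';

-- A pipe that copies each new file into a table as soon as the
-- bucket's event notification arrives.
CREATE PIPE events_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO raw_events
  FROM @raw_events_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- SHOW PIPES returns the notification channel (an SQS ARN on AWS)
-- to which the bucket's object-created events should be routed.
SHOW PIPES;
```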
Continuous data pipelines automate the processes involved in loading data into Snowflake tables and then transforming and optimizing the data for further analysis. Snowflake provides the following features to enable continuous data pipelines:
- Continuous data loading: Options for continuous data loading include Snowpipe, along with support for third-party data integration tools.
- Change data tracking: In a continuous data pipeline, streams record when staging tables and any downstream tables are populated with data from business applications using continuous data loading and are ready for further processing using SQL statements.
- Recurring tasks: A Snowflake task object defines a recurring schedule for executing a SQL statement, including statements that call stored procedures.
Data Pipelines: Streams and Tasks
The Streams and Tasks feature is fundamental to building end-to-end data pipelines and orchestration in Snowflake. While customers can use Snowpipe or their ELT provider of choice, that approach is limited to loading data into Snowflake. Streams and Tasks provides a task scheduling mechanism so customers no longer have to resort to external jobs for their most common scheduling needs for Snowflake SQL jobs. The feature also lets customers connect their staging tables and downstream target tables with regularly scheduled logic that picks up new data from the staging table and transforms it into the shape required for the target table.
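A sketch of that staging-to-target pattern: a stream tracks changes on a staging table, and a task periodically moves the new rows downstream. Table, warehouse, and column names are hypothetical:

```sql
-- Track inserts/updates/deletes on the staging table.
CREATE STREAM orders_stream ON TABLE orders_staging;

-- Every 5 minutes, if the stream has data, load it into the target.
CREATE TASK load_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_final
  SELECT order_id, amount, updated_at
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK load_orders RESUME;
```

Reading from the stream inside the task advances its offset, so each run processes only rows that arrived since the previous run.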
Data Pipelines: Apache Kafka Connector
Apache Kafka is a platform for building pipelines that handle continuous streams of records; common use cases include replacing message brokers, collecting web usage data, and aggregating metrics. The Snowflake Connector for Kafka reads data from one or more Kafka topics and reliably loads those records into Snowflake tables for storage and analysis.
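As a rough sketch, the connector is configured like any other Kafka Connect sink; the connection values below are hypothetical placeholders:

```properties
# Hypothetical Kafka Connect sink configuration for the
# Snowflake connector -- substitute your own account values.
name=snowflake_sink
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
topics=web_events
snowflake.url.name=myaccount.snowflakecomputing.com:443
snowflake.user.name=kafka_loader
snowflake.private.key=<private-key>
snowflake.database.name=raw
snowflake.schema.name=events
```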
Data Lake: External Tables, Credential-less External Stages, and Hive Metastore Integration
- External tables reference data files in a cloud storage data lake (e.g., AWS S3, Google Cloud Storage, or Microsoft Azure Blob Storage). External tables store file-level metadata about the data files, such as the file path, a version identifier, and partitioning information. This enables querying data stored in files in a data lake as if it were inside a database.
- Hive Metastore integration enables integrating a Hive metastore with Snowflake using external tables. The Hive connector in Snowflake listens to metastore events and transmits them to Snowflake to keep the external tables synchronized with the Hive metastore. This allows users to manage their tables in Hive while querying them from Snowflake.
- Credential-less external stages provide an option where customers do not have to pass secret keys or access tokens for storage accounts. They can be created on cloud storage accounts from GCP, Azure and AWS clouds. Additionally, admins of customer accounts can restrict the usage of external stages for certain cloud storage locations, thus preventing data exfiltration.
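Putting the first of these together, here is a sketch of querying Parquet files in place through an external table; the stage, bucket, and field names are hypothetical:

```sql
-- An external stage over the data lake location.
CREATE STAGE lake_stage
  URL = 's3://my-data-lake/clickstream/';

-- An external table over the files; AUTO_REFRESH keeps its metadata
-- in sync as new files arrive.
CREATE EXTERNAL TABLE clickstream_ext
  WITH LOCATION = @lake_stage
  AUTO_REFRESH = TRUE
  FILE_FORMAT = (TYPE = PARQUET);

-- Each row exposes the file contents as a VARIANT column named VALUE,
-- queryable as if the data lived inside Snowflake.
SELECT value:page_url::string AS page_url, COUNT(*) AS hits
FROM clickstream_ext
GROUP BY 1;
```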
Snowflake Data Exchange
The Snowflake Data Exchange is a free-to-join marketplace that enables Snowflake users to connect with data providers to seamlessly discover, access, and generate insights from each other’s data. Unlike traditional data transfer done through APIs or by extracting data to cloud storage, the Snowflake Data Exchange improves the control and security of exchanging data.
Snowflake customers will be able to easily access the Data Exchange from their Snowflake account and search a data catalog to discover and securely access real-time data that they can join with their existing data sets in Snowflake. Customers of the Data Exchange will not incur any data storage fees, as the data remains securely stored in the provider’s Snowflake account.
Data providers can share live, public, or private data sets in a fully-governed way and promote their data services to 1,500+ Snowflake customers to create new revenue streams. Providers also get insights into the types of data being accessed and used by their consumers.
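Under the hood, this builds on Snowflake’s Secure Data Sharing primitives. A sketch of the provider/consumer flow, with hypothetical share, database, and account names:

```sql
-- Provider side: create a share and grant access to specific objects.
CREATE SHARE weather_share;
GRANT USAGE ON DATABASE weather_db TO SHARE weather_share;
GRANT USAGE ON SCHEMA weather_db.public TO SHARE weather_share;
GRANT SELECT ON TABLE weather_db.public.daily_obs TO SHARE weather_share;
ALTER SHARE weather_share ADD ACCOUNTS = consumer_account;

-- Consumer side: mount the share as a read-only database.
-- No data is copied; queries read the provider's live data.
CREATE DATABASE weather FROM SHARE provider_account.weather_share;
```

Because no data is copied or moved, the consumer always sees the provider’s live data and incurs no storage cost for it.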
Curious to learn more about how Snowflake can help your business become more data-driven? Contact us.