Introducing Snowflake
Snowflake is a cloud based data warehouse built on top of the Amazon Web Services (AWS) cloud infrastructure and is a true SaaS offering (Software as a service (Saas) – It is a software distribution model in which a third-party provider hosts applications and makes them available to customers over the Internet).There is no hardware or a software (virtual or physical) for you to select, install, configure, or manage . All ongoing maintenance, management and tuning is handled by Snowflake.
Snowflake is a full relational database management system (RDBMS) built for the cloud. Snowflake uses ACID compliant and support standard SQL. All this description makes snowflake sounds like a database. Now you must be wondering, why snowflake is called a Data warehouse and not a Database? Let’s take a closer look just to be sure.
What is a database?
A database is a collection of information organized to be easily accessed, managed and updated. Here are some of the defining characteristics of database:
- Designed for rapid storage and retrieval of small sets of current data records within the enterprise.
- Data is organized in tables and columns, allowing users access via structured query language (SQL).
- Handles quick, real-time activity such as entering a products name, recording a sale and recording all accounting activity of that sale.
- Works well for basic operational reporting of a limited number of records. Analytic reporting is relegated to simple, static reports.
What is a data warehouse?
Some of the defining characteristics of a data warehouse are:
- A database designed to store and process large volume of current and historical data collected from multiple sources inside and outside the enterprise for deep analysis.
- Organizes data into tables and columns, and allows users access via SQL.
- Optimized for loading, integrating and analyzing very large amounts of data.
- Designed to support descriptive, diagnostic, predictive and prescriptive analytic workloads.
Snowflake includes the characteristics of a database and also embodies all of the defining characteristics of a data warehouse.
Snowflake specifically designed for data warehousing and high speed analytic processing. Snowflake was built for the cloud to optimize loading, processing and query performance for very large volume of data. Therefore Snowflake is a Data warehouse.
Why Snowflake?
For data driven organizations, many of the existing database systems and platforms have become too cumbersome to set up and maintain. Let’s have a look at this scenario.
You are busy feeding data into your system memory and suddenly you see a warning message, “insufficient system memory” or “insufficient storage”.
The drawbacks of traditional database systems and other available data warehouses are:
- Scalability limitations.
- They are costly and requires license and management cost.
- Backup costs, disaster recovery and security are all the responsibility of a customer.
With Snowflake, there is no trade-off between scalability and productivity.
Snowflake handles both structured and semi-structured data. Additionally, Snowflake’s automatic maintenance and database administration means huge savings over these products.
Snowflake supports NoSQL databases (Cassandra and mongoDB), semi-structured data ( JSON, XML) . As Snowflake loads semi-structured data, metadata is extracted and encrypted and made available for querying just like your structured data. Yes, you can query your JSON data with SQL and you can join it to other structured data. You can perform all your favourite SQL functions on structured and semi-structured data in the same query.
Snowflake should be considered as a replacement for open-source projects like Hadoop. Snowflake is cloud based and has a simpler environment than Hadoop with more automated management. Snowflake lets you focus more on using your data rather than managing the system your data lives in.
Snowflake provides high level performance, mind-blowing levels of concurrency and the utmost simplicity. Snowflake’s unique architecture allows you to start small and grow large with a few simple clicks.
Snowflake’s awesome features are:
- Dynamic caching,
- Zero-copy data cloning
- Diverse data and application integration, continuous data protection and security.
Snowflake clearly needs to be considered as a solution for any business working with “big data.”
Snowflake Architecture
These 3 main components make up the Snowflake data warehouse.
The 3 main components are :
Database Storage — The actual file system in Snowflake is backed by S3 in Snowflake’s account. All data is encrypted, compressed, and distributed to optimize performance. The Data is geo-redundant in Amazon S3 and provides excellent data durability and availability.
Query Processing — Snowflake provides the ability to create “Virtual Warehouses”. Virtual Warehouses used to load data, or run queries and are capable of doing both of these tasks concurrently. These Virtual Warehouses can be scaled up or down on demand and can be paused when not in use to reduce the spend on compute.
Cloud Services — Coordinates and handles all other services in Snowflake including sessions, authentication, SQL compilation, encryption, etc.
By design, each one of these 3 layers can be independently scaled and are redundant.
Connecting to Snowflake
Snowflake makes connecting to databases very easy .One method is to use any of the supported ODBC drivers for Snowflake. Additionally, SnowSQL CLI can be leveraged or use the web based worksheet within your Snowflake account.
Snowflake defines a virtual warehouse (just another term for data warehouse. Virtual warehouses often collect data from a wide variety of sources) as a cluster of compute resources. This warehouse provides all the required resources, such as CPU, memory, and temporary storage, to perform operations in a Snowflake session.
The first step after logging into a Snowflake account is to create a virtual warehouse. You can create two warehouses, the first for ETL (The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading. Note that ETL refers to a broad process, and not three well-defined steps) and the second for queries.
How easy is Snowflake?
In contrast to traditional database systems Snowflake requires zero management from the end user. There are no knobs to turn, indexes to tune, partitions to build – Snowflake handles all of this for you automatically. This allows users to simply load their data and run queries.
Snowflake has an impressive dynamic caching system: All query results are stored for 24 hours (90 days for enterprise customers) or until the underlying data changes. If a new query matches an entry in the results cache, then the matching result is returned immediately without using compute.
Snowflake can easily hold all your data or you can say data of any flavour you like. All the different types of data that your business generates can be stored in one instantly accessible place.
Scalability at its peak
For all users and applications, snowflake is available at any scale. Snowflake brilliantly separates storage, compute and metadata management, trillions of rows can be sliced up with ease by concurrent users. Storage, compute and metadata service can be scaled up and down independently and immediately and as necessary.
Snowflake Pricing
Snowflake’s pricing is simple: you only pay for what you use. Snowflake is fully elastic, gone are the days of worrying about shifting data in and out of cold storage. Data is stored at a price that is comparable to cloud storage systems like AWS S3. You can store petabytes of data, pay for just the compressed volume of that data and then pay for the compute portion of the system when you need it. If you don’t need the data for a week, you only pay for the storage.