SnowFlake Interview Questions and Answers

 

This post intends to cover the answers to the questions which a Snowflake developer can face at an Interview i.e.  Developer Interview Questions and Answers on Snowflake


1. How Snowflake History Page works

- History page actually fetches data from one of the view from Information_schema i.e. Query_History based on the filters applied.

2. Where can we find details of the queries executed in Snowflake?
- It's the History Tab

3. Can we query Query view in Snowflake?
- You can certainly query directly Query_History and get the desired details or you can make use of the dedicated Information Schema table functions(QUERY_HISTORY_BY_* ) associated with it.

4. Can I have multiple schedules in a single Task?
- No. It is not supported. 

To handle such scenarios, you can create multiple Task with different schedules.

5. I created a task with schedule to run every 15 mins, but it didn't run?
- It is because when you create a task it is not enabled just the task object gets added and it is in suspended state.

So in order to bring it in effect you have to enable it and that we do by using Alter command and set task to resume

                     ALTER TASK taskname RESUME;

6. What are the limitations of tasks in snaowflake?
- Snowflake tasks have the following limitations:

Tasks cannot be run on demand. They can only be scheduled using a cron expression or triggered by another task.

Tasks cannot call UDFs containing Java or Python code, or stored procedures written in Scala. Nor can you call such a UDF.

Tasks cannot use primary and foreign key constraints, indexes, or partitions.

If you don't carefully monitor your storage size and usage, the task can have hidden costs. 


7. What is the maximum number of tasks which can be managed in the task module?
- The maximum number of tasks that can be managed by the Tasks module depends on:

A task can have up to 100 parent tasks and 100 child tasks.

A directed acyclic graph (DAG) of tasks is limited to a maximum of 1000 total tasks (including the root task).

The TASK_HISTORY function returns up to 10,000 rows of task history determined by the RESULT_LIMIT argument value. 

8.How is snowflake different from other data warehouse?
- Snowflake is a cloud-based data warehouse that offers many advantages over other data warehouse solutions. Key Snowflake features include:

It is a Software-as-a-Service platform that can be deployed on various cloud providers such as AWS, Azure, and Google Cloud Platform.

This allows users to create and scale multiple virtual warehouses that can run independent workloads on the same data without contention or performance degradation.

It features a patented architecture that separates the storage and compute layers, allowing users to pay only for the resources they use and customize as needed.

It supports structured and semi-structured data, as well as SQL and other languages ​​for data analysis and manipulation. 

9.What do you mean by zero-copy cloning in Snowflake?
- Zero Copy cloning is a Snowflake feature that allows users to create a copy of a table, schema, or database without duplicating the underlying data. 

Captures a snapshot of the data in the source object and provides it to the clone object.

There is no additional cost for cloning as the same tier of storage is shared between source and clone.

Makes the clone writable and independent from the source and this prevents changes made to one object from being reflected in the other. 

10. What is the difference between fail-safe and Time Travel in Snowflake?
- Time Travel and Fail-safe are two features of Snowflake that provide continuous data protection and availability,  the difference between them is:

Time Travel allows users to query, clone, and restore historical data in tables, schemas, and databases for up to 90 days. Users can access Time Travel through SQL commands or the web interface.

Fail-safe provides a 7-day period during which historical data may be recoverable by Snowflake in the event of a system failure or other disaster. Users cannot access Fail-safe directly, they need to contact Snowflake support for data recovery.

11.How many types of cache are there in Snowflake?
- Snowflake has three types of caches.

Metadata cache:

This cache contains information and statistics about various objects such as tables, views, and hosted files. The cache is always up to date and never empty. It resides in Snowflake's service layer.

Query result cache:

This cache stores the results of all queries executed in the last 24 hours. These results are available in all virtual warehouses unless the underlying data has changed or the query contains functions that must be evaluated at runtime. It also exists in Snowflake's service layer.

Local cache or warehouse cache:

This cache stores data used in SQL queries on Snowflake's compute tier. It is associated with a specific instance of a virtual warehouse and is deleted when the warehouse is paused. It can speed up queries that repeatedly access the same data. 

12. How many roles are there in Snowflake?
- A Snowflake role is an entity that can grant and revoke privileges on securable objects such as databases, tables, and schemas. Users are assigned roles that enable them to perform actions required for the organization's business functions.

Snowflake has two types of roles.

System-defined roles:

These are predefined roles provided by Snowflake for common administrative and operational tasks. These include ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC, etc. System-defined roles cannot be modified or deleted.

Custom role:

These are custom roles that you can create and customize for your specific needs and use cases. Custom roles can inherit permissions from other roles, including system-defined roles. Custom roles can be created, modified, and deleted by users with appropriate privileges. 

13. Why is Snowflake so fast?
- Snowflake is fast because it leverages the power and scalability of the cloud to provide powerful data warehousing solutions. Some of the reasons Snowflake is fast are:

It features a unique architecture that separates the storage and compute layers, allowing users to scale independently and pay only for the resources they use.

Supports multiple virtual warehouses that can run independent workloads concurrently against the same data without contention or performance degradation.

There are different levels of caching that improve query performance by reusing data and results from previous queries.

You'll see some performance improvements that optimize query execution, speeding up queries, removing joins, optimizing searches, and selecting top-K order.

Transparent and flexible pricing allows users to optimize the costs and benefits of their workloads. 

14.What is the SQL language in Snowflake?
- SQL stands for Structured Query Language and is a standardized format for querying and database management. 

Snowflake supports ANSI SQL, the most widely used standard version of SQL. This means Snowflake SQL supports all common operations i.e. select, insert, update, and delete data.

Snowflake also supports SQL extensions such as:

Snowflake Scripting is a SQL-like language that enables stored procedures, conditional statements, loops, and error handling.

Snowpark is a framework that enables data transformation and processing using Python, Java, or Scala. 

The Snowflake API is an interface that enables programmatic access to Snowflake functions and data. 

15. What are the ways to interact with Snowflake?
- There are many ways to interact with Snowflake, depending on your needs and preferences. 

The most common methods are:

SnowSQL is a command line interface (CLI) tool that you can use to execute SQL statements and perform various tasks in Snowflake.

Snowsight is a web-based user interface (UI) that provides a modern and intuitive way to explore, query, visualize, and share data in Snowflake.

Snowflake Connector for Python. A library that allows you to connect to Snowflake and perform data manipulation using Python code.

Snowflake ODBC driver. Software that allows ODBC-based applications such as R, Excel, and Tableau to connect to Snowflake and access data.

The Snowflake API is an interface that allows you to programmatically manipulate Snowflake functions and data using a variety of languages ​​and frameworks. 

16.What is unique about Snowflake?
- Snowflake is a cloud-based data platform that provides a variety of features and services for storing, processing, and analyzing data. 

Unique aspects of Snowflake include:

With support for multiple cloud platforms and regions, customers can choose their preferred cloud provider and location.

It features a multi-cluster shared data architecture that separates compute and storage resources, enabling scalability, performance, and parallelism for diverse workloads.

Provides continuous data protection and regulatory compliance for data security and governance.

Enables data sharing and collaboration between organizations and ecosystems through the Data Cloud and Snowflake Marketplace. 

17.What are the different types of files supported in Snowflake?
- Snowflake supports various file types for loading and unloading data. Supported file formats are:

CSV is a structured format that uses delimiters (such as commas and tabs) to separate values ​​in each row.

JSON is a semi-structured format for representing objects and arrays using key-value pairs.

AVRO is a semi-structured format that uses binary encodings and schema to store data.

ORC is a semi-structured format that uses columnar storage and a data storage schema.

PARQUET is a semi-structured format that uses columnar storage and a schema for storing data.

XML is a semi-structured format that uses tags and attributes to represent elements and values. 

18.What are the three Snowflake stage types?
- A Snowflake stage is a storage location used to load data from files into Snowflake tables and unload data from tables to files. 

There are three types of Snowflake tiers.

An internal stage that stores data files internally within Snowflake. Internal phases can be permanent or temporary.

An external stage pointing to a data file stored somewhere outside of Snowflake such as Amazon S3, Google Cloud Storage, or Microsoft Azure.

A directory table that stores a catalog of files hosted in cloud storage. Directory tables can be used in both internal and external stages.

19. Is Snowflake built for OLAP or OLTP?
- Snowflake was developed for OLAP (Online Analytical Processing) database systems, systems that perform multidimensional analysis of large amounts of data.

Snowflake supports a variety of features and services for data storage, processing, and analysis, including:Data warehouse, data lake, data sharing, data exchange, data cloud.

Depending on your use case, you can also use Snowflake for online transaction processing (OLTP) purposes. OLTP is a system that performs high volume transactions in real time or near real time. 

Snowflake supports some OLTP features such as ACID transactions, concurrency control, row-level locking, and stored procedures. However, Snowflake is not optimized for OLTP workloads and may not offer the same performance and scalability as dedicated OLTP systems. 

20. What security terms does Snowflake use?
Snowflake uses various security terms to describe its capabilities and practices that protect the confidentiality, integrity, availability, and security of its Services and Customer Data.

Some of the security terms used in Snowflake are:

Encryption uses a private key to transform data into an unreadable form. Snowflake uses industry-standard algorithms and protocols to provide end-to-end encryption for data in transit and at rest.

Network security is a set of measures designed to prevent unauthorized access to or from a network. Snowflake supports network policies, private connections, firewall rules, and proxy settings to secure network access to Snowflake.

Authentication is the process of confirming the identity of a user or system. Snowflake supports various authentication methods including passwords, federated authentication, multi-factor authentication, key pair authentication, and OAuth.

Authorization is the process of granting or denying access to resources based on predefined rules. Snowflake supports various authorization mechanisms such as roles, privileges, grants, ownership, and secure views.

Data masking is the process of hiding sensitive data or replacing it with fictitious data. Snowflake supports dynamic data masking, applying masking policies at query time based on user role. 

Post a Comment

If you have any suggestions or questions or want to share something then please drop a comment

Previous Post Next Post