Wednesday, September 1, 2021

How to fetch history data from Snowflake without using UI | Programmatically get the queries executed on Snowflake

Introduction


Whenever we need to run some queries in Snowflake web UI , we use worksheet(query editor) where we select the warehouse(compute) , database and the schema (optional). And when we run the query, the output is displayed in the results tab below.


snowflake results tab


UI is mostly used when we are doing development or testing out something, but once we are through with it and deployed all the objects, scripts/queries are run/executed without involving UI.

Irrespective of how the query was executed, after execution of each query (succeeded or failed), it get's an identifier for it i.e. unique query ID. This ID can be used to get all the details about that particular query.

Where can we find details of the query? It's the History Tab


Snowflake History Page

 

History page is an interface provided by Snowflake UI to see the details of all the queries executed till date (historical listing of queries)  like result of the query, time it took to execute(duration), number of rows returned, the caller of the query(client info) , the query text etc.

Columns in Snowflake history page


Note: History has details of all queries executed in the last 14 days only, anything older than that is purged.

available filters in history table



As can be seen in above image, there is a provision to filter the history results based on criteria. 

You can have multiple filter criteria added, just click on plus sign and define the filter. In below example - you want to see only those queries which are specific to particular user, and status is failed.

adding multiple filters


Note : Applying filter doesn't change the returned columns, columns are fixed.


How Snowflake History Page works


Whenever you create a Database in Snowflake, snowflake internally(automatically) creates a Information_Schema 

i. Which has some views(18), to store the metadata about the schemas, tables and databases you have created in Snowflake and
ii. functions to get details from Information_schema views.


Note: Every database in Snowflake has dedicated Information schema created against it  automatically (and it's read only, nothing can be edited/deleted there)

History page actually fetches data from one of the view from Information_schema i.e. Query_History based on the filters applied.


Fetch history data from Snowflake without using UI 


If you see, the History tab is quite useful for basic monitoring and viewing the query history using the UI but what if we are interested in getting more details (not all columns from Query_History view are shown in History tab).

You can certainly query directly Query_History and get the desired details or you can make use of the dedicated Information Schema table functions(QUERY_HISTORY_BY_* ) associated with it.

So based on dimensions (time range, session, user, warehouse, etc.) you can select the function

  • QUERY_HISTORY_BY_SESSION
  • QUERY_HISTORY_BY_USER
  • QUERY_HISTORY_BY_WAREHOUSE 


See Functions for more  details.

e.g., Say if we are only interested in getting Query_Text, Error_Code, Error_Description 


SELECT QUERY_TEXT, ERROR_MESSAGE,ERROR_CODE
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 10))


Like wise based on your need you can create queries, you can also query on another views or join multiple views

e.g. say if we want details of Task which got executed - so here we can query on Task_History and Query_History 


select  t.RUN_ID AS TASK_RUN_ID, t.QUERY_ID, t.STATE AS TASK_STATE, t.ERROR_CODE AS TASK_ERROR_CODE, t.ERROR_MESSAGE AS TASK_ERROR_MESSAGE ,t.QUERY_START_TIME AS TASK_START_TIME,t.COMPLETED_TIME AS TASK_COMPLETED_TIME  ,q.EXECUTION_STATUS AS QUERY_STATUS, q.ERROR_CODE AS QUERY_ERROR_CODE, q.ERROR_MESSAGE AS QUERY_ERROR_MESSAGE,q.SESSION_ID AS Session_ID ,q.START_TIME AS QUERY_START_TIME, q.END_TIME AS QUERY_END_TIME, q.ROWS_PRODUCED,q.EXECUTION_TIME AS QUERY_EXECUTION_TIME ,q.WAREHOUSE_NAME, q.DATABASE_NAME, q.SCHEMA_NAME, q.QUERY_TYPE, q.TOTAL_ELAPSED_TIME AS QUERY_TOTAL_TIME 
from table(information_schema.task_history(RESULT_LIMIT=>10000, scheduled_time_range_start=>dateadd('hour',-1,current_timestamp())))t 
LEFT JOIN table(information_schema.QUERY_HISTORY_BY_WAREHOUSE(RESULT_LIMIT =>10000, WAREHOUSE_NAME => 'WH_DEV')) q ON q.QUERY_ID = t.QUERY_ID
WHERE t.QUERY_ID IS NOT NULL AND t.STATE != 'EXECUTING'




So here we are fetching details of all tasks which has executed and ran on WH_DEV warehouse.



Conclusion


We tried to get an understanding on how the queries which are executed on Snowflake can be monitored. How to get details about them and utilize them as per need.


If there is any approach you are aware of, feel free to share in comments. 




No comments:

Post a Comment

If you have any suggestions or questions or want to share something then please drop a comment