Friday, September 24, 2021

Cannot find the object "xxx" because it does not exist or you do not have permissions.


Problem Statement


The requirement is to load data into SQL with two conditions to fulfill:
1. If the table doesn't already exist, create it and then load the data into it.
2. Every time before loading data, the existing data in the table has to be removed/cleaned.



Solution


The Copy Data activity has an Auto create table feature (available if your destination (sink) is a database) which creates the table automatically if it doesn't already exist.

You also have the option to execute a script before copying the data; for that you need to provide the query against Pre-copy script. As the requirement is to clean the table before loading data, the following Truncate query is used

   
             @{concat('Truncate Table ',item().TableName,'_Parquet')}



Below is the pipeline with Copy data activity

auto create table and pre copy script

I created a pipeline, clicked on Validate, and no errors were found.


Error/Issue



However, when I triggered the pipeline for testing, I got the below error

Cannot find the object "xxx" because it does not exist or you do not have permissions.


                   


Why it happened



ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Cannot find the object "Customer_Parquet" because it does not exist or you do not have permissions.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Cannot find the object "Customer_Parquet" because it does not exist or you do not have permissions.,Source=.Net SqlClient Data Provider,SqlErrorNumber=4701,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=4701,State=1,Message=Cannot find the object "Customer_Parquet" because it does not exist or you do not have permissions.,},],'


The error message points to a couple of possible reasons:

    1. Not enough permission

    2. The table does not exist

For the first possible reason, I checked the user permissions (for the account used in the Linked service) - all was good, and Test connection on the linked service was also successful.

The second reason was the real cause here: there was no table.

But we had selected the Auto create table option, so why didn't it work?

Because the run never reached that step; the error was thrown by the step before it - the one configured via the Pre-copy script option, where we are truncating the table.

As this was the first run, the table did not yet exist, so when the truncate query was executed the object (table) was not found.
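
For example, for a config entry whose TableName is Customer, the Pre-copy script expression shown above resolves to the statement below, which fails with SQL error 4701 when the table is not there yet:

             Truncate Table Customer_Parquet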


What to do


There are a couple of ways to handle such scenarios:

1. Have all the tables created already (remove Auto create table) and use only the Pre-copy script to clean the table before loading data.

But here the requirement was to create tables automatically, so the below approach can be used instead.

2. For the first pipeline run, remove the Pre-copy script. Once the pipeline has run, the tables will have been created; now add the Truncate query back in.
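
A further variation (a minimal, untested sketch, assuming the sink is SQL Server / Azure SQL Database and the same TableName + '_Parquet' naming as above) is to make the Pre-copy script itself tolerant of a missing table, so that even the very first run does not fail:

             @{concat('IF OBJECT_ID(''', item().TableName, '_Parquet'', ''U'') IS NOT NULL TRUNCATE TABLE ', item().TableName, '_Parquet')}

For a TableName of Customer this resolves to IF OBJECT_ID('Customer_Parquet', 'U') IS NOT NULL TRUNCATE TABLE Customer_Parquet, which simply does nothing when the table has not been created yet.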


If you have a better way or suggestion to handle such a scenario, do comment !!!





Learn More about some more Azure Data Factory errors


Thursday, September 23, 2021

The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'.


Error/Issue:


For the last post, Getting started with ADF - Loading data in SQL Tables from multiple parquet files dynamically, I created a pipeline and clicked on Validate, and no errors were found.


pipeline validated



However, when I triggered the pipeline for testing, I got the below error

The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'.


                       


Why it happened


The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Object'.


It was the ForEach activity which gave us the error, and as the error says, an Object was given as its input.

incorrect input to for each loop

Wait... we want the output of the Lookup activity to be provided as input to ForEach so that we can iterate through it.

But why is it not accepting it?

How ForEach Loop works in Azure Data Factory



It is one of the control flow activities, used in scenarios where you have to iterate through some collection, list, array, etc.

You perform or carry out the same action/task in each iteration, using some value from the current item.

In ForEach, whatever input (array/list/collection) you provide is referenced using the term items, and each record within it as item.

Thus it expects its input parameter to be an array or a string.

And what I had provided - @activity('ConfigLookup').output

Let's see what @activity('ConfigLookup').output contains

output of lookup activity



Output is a JSON object containing the result of the activity plus all the metadata about the activity run, whereas value is a JSON array inside the output containing the actual result of the lookup activity.



What to do


Firstly, make a note: even a validated pipeline can give you surprises 😏.

So, over here the correct input to be provided to ForEach is

                @activity('ConfigLookup').output.value


That's it... the pipeline executed happily after that.




Learn More about some more Azure Data Factory errors



    Wednesday, September 22, 2021

    Getting started with ADF - Loading data in SQL Tables from multiple parquet files dynamically

    Introduction


    In the last post - Getting started with ADF - Creating and Loading data in parquet file from SQL Tables dynamically - we saw how to create parquet files from SQL tables dynamically.

    In this post we will see the reverse scenario, i.e. loading data into SQL tables from multiple parquet files dynamically.

    We will also make use of the Auto create table option to create the table automatically if it doesn't exist in the destination SQL database.


    Steps in Loading data in SQL Table from parquet file 


    If you are a beginner, I would ask you to go through - Getting Started with Azure Data Factory

    Let's first create a pipeline which will load data from a single parquet file into a particular table.
     

    Create source linked service


    Creating a linked service means creating a connection. Click on the Manage tab in the left pane.

    On the Linked services page, select +New to create a new linked service.

    Search for storage and select Azure Data Lake Storage Gen2 for the source here, i.e. the place where the parquet files will be kept

    linked service for ADLS gen2 with parquet format

    Give it a meaningful name,

    select AutoResolveIntegrationRuntime as the integration runtime.

    Four options are available as the authentication method; we select Account key.

    inputs to new Linked Service
    You can choose From Azure subscription and select from the list of available storage accounts; here we are manually giving the URL of the storage account and the respective storage account key.

    Note: Alternatively, the recommended way is to make use of Key Vault.


    Create Linked Service for destination 


    Now create another linked service; here the destination is SQL database tables, so create a connection string to this particular database.

    Linked Service for SQL Database



    For that, you provide the server address, database name and the credentials, and finally click on Test connection to confirm all is OK.

    Using these linked services, ADF will connect to the underlying services at runtime.


    Next is to tell ADF what form of data to expect.

    Two datasets are to be created: one defining the structure of the data coming from the parquet files (input) and another for the SQL table which will be created and loaded with data (output).

    Create Dataset - Parquet Files


    Select Author tab from the left pane --> select the + (plus) button  and then select Dataset.

    Search for storage and select ADLS Gen2, choose Parquet as the format, give it a name and select the linked service

    create parquet dataset

    Provide the file path and save it.


    Create Dataset - SQL 


    Search for SQL and select SQL Server, provide the name and select the linked service - the one created for connecting to SQL


    Create SQL dataset



    Provide the Table name and save it.

    Create Pipeline to load data in SQL Table from Parquet File 


    Select the + (plus) button, and then select Pipeline.

    Drag and drop the Copy data activity from the Activities --> Move & Transform section.

    In the General panel under Properties, specify Copy data from Parquet to SQL for Name. 

    Provide the details of source (i.e. ADLS gen 2 folder location) and sink (i.e. which Table to insert data into).

    copy data parquet file to SQL Table



    That's it, the basic pipeline is ready.

    So this pipeline will read data from the inputfiles/invoice/Invoice.parquet file and insert it into the dbo.Invoice_Parquet table.

    This is great for a single file, but what if there are multiple files from which data is to be loaded into SQL tables?

    You could say we can use the same pipeline - just replace the folder path and file names in the source settings and the table name in the sink settings. Yes, that will work, but manual intervention will be required.

    And what if there are hundreds or thousands of different objects' parquet files?


    How can data be loaded dynamically into SQL tables from parquet files using an Azure Data Factory pipeline?


    In order to load data into those tables dynamically, we will take the help of a configuration table where we will store the required details.

    This configuration can be referred to at runtime by the pipeline with the help of a Lookup activity; the output of this activity can be used to dynamically provide the settings values (through the parameters which we will add to the datasets), so no hardcoding is required in the pipeline.

    So the same pipeline can be used for every requirement where data is to be loaded into a SQL table; just an entry in the configuration table is required.

    Let's see below, how to do it...

    Steps in creating pipeline - Load data in SQL Table dynamically from multiple different parquet files 


    We will reuse the linked services and datasets created above, with some modifications.

    Modify Source and Destination Linked Service


    No changes required


    Modify Source Dataset


    We will remove the hardcoded folder name and file path (no static data).

    And add parameters to the dataset:

                        container
                        folder
                        filepath

    This will help us achieve dynamic selection of the parquet files.


    modified parquet dataset




    In the Connection tab, add the following against File path

    @dataset().container/@dataset().folder/@dataset().filepath
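
    For instance, assuming the container is inputfiles and taking the sample file used in the basic pipeline above (folder = invoice, filepath = Invoice.parquet), this expression resolves to:

    inputfiles/invoice/Invoice.parquet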


    Modify Destination Dataset


    We will remove the hardcoded table name (no static data).

    And add a parameter called TableName; in the Connection tab provide the following against Table -
                                 
                                    @dataset().TableName



    Adding parameter in SQL Dataset


    We will make use of this parameter; it will help us achieve dynamic selection of the table.



    Configuration Table 


    Using this table we will keep some basic config information like the file path of the parquet file, the table name, a flag to decide whether it is to be processed or not, etc. (more columns can be added as per need).

    Configuration table for ADF pipeline


    This table will be referred to at runtime and, based on the results from it, further processing will be done.

    In future:

    1. If more objects are to be added - we make an entry in the config table.
    2. If some objects are not to be processed - we set the TobeProcessed flag to false.

    Thus we can set up the pipeline once and provide the config details to it dynamically, without making any further change to the pipeline.
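
    The exact schema of the config table is not shown in this series, so the sketch below is an assumption (column names are taken from how they are used in these posts - Object_Name, TableName, FilePath and TobeProcessed; the types are illustrative):

        CREATE TABLE dbo.config
        (
            Object_Name   VARCHAR(100),  -- logical name of the object, e.g. Invoice
            TableName     VARCHAR(100),  -- SQL table name (the _Parquet suffix is added in the pipeline)
            FilePath      VARCHAR(500),  -- folder of the parquet file(s) in ADLS
            TobeProcessed BIT            -- 1 = process, 0 = skip
        );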

    Designing the Pipeline


    The purpose of the pipeline is to read the parquet files and load the data into the respective SQL tables.

    i.   Lookup Activity

    config lookup activity

    The first step, where we get the details of the location of the parquet files and which tables are to be loaded.

    • Place a Lookup activity and provide a name in the General tab.
    • Under the Settings tab, select the dataset as DS_SQLServer, provide the table name as config (the one created in the above step), select Query against Use query and provide the below against Query
                        select * from config where TobeProcessed = 1

    Here we are basically fetching the details of only those objects in which we are interested (the ones having the TobeProcessed flag set to true).

    So based on the number of objects returned, we need to perform that many copy operations (one for each), so in the next step add a ForEach activity.


    ii.  For each activity

    ForEach activity

    ForEach works on an array as its input. Here it is termed Items; provide the following against it

                     @activity('ConfigLookup').output.value

    Basically we are giving it the output of the ConfigLookup activity, so it will iterate through it. And in each iteration we need to copy the data from the parquet files into the respective SQL table, and for that we add a Copy data activity inside the ForEach.

    iii. CopyData Activity

    • Select the Copy data activity and give it a meaningful name.
    • On Source tab


     i. Select the dataset as DS_Parquet; as soon as you do, you are asked to provide values against the parameters.


    Here we are using Wildcard file path, so that all the .parquet files present in the provided path are processed.

                    copy activity parquet to sql
    • On Sink tab
              i. Select the dataset as DS_SQLServer; as soon as you do, you are asked to provide a value against the parameter TableName. Provide the below
                           @concat(item().TableName,'_Parquet')

             ii. In the Table option select Auto create table - this tells the ADF engine to check whether the table already exists in the destination database.
    If no table is found, it automatically creates the table and then loads the data into it (a conceptual sketch follows below).
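
    To make this concrete, the sink table name for a config row with TableName Invoice resolves to Invoice_Parquet, and on the first load the behaviour is roughly like the sketch below. This is only an illustration, not ADF's actual implementation - ADF derives the real column list and types from the source schema mapping, and the columns shown here are made up:

        IF OBJECT_ID('dbo.Invoice_Parquet', 'U') IS NULL
            CREATE TABLE dbo.Invoice_Parquet
            (
                -- placeholder columns; ADF infers these from the parquet file's schema
                InvoiceId INT,
                Amount    DECIMAL(18, 2)
            );
        -- the copied rows are then inserted into the (new or existing) table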



    Where did item come into the picture?

    Here item is the current object/record of the ForEach items array.

    In the previous step we assigned the output of the Lookup activity to ForEach's Items; thus, if the Lookup activity returned 3 records, then the ForEach items will have 3 records, and the current one is referred to as item.

    Thus you provide the value from the current iteration of the ForEach loop, which ultimately comes from the config table.


    That's it, pipeline is ready to use.


    Conclusion


    We walked through the steps of creating an Azure Data Factory pipeline to load data into SQL from a parquet file.

    And in a scenario where there is a need to insert data into many tables from multiple parquet files, the same pipeline can be utilized with the help of a configuration table.

    Thus the pipeline remains untouched, and whatever details are to be added or removed are changed in the configuration table.

    It is a design pattern which is very commonly used to make pipelines more dynamic, avoid hardcoding and reduce tight coupling.


    If you have a better idea or any suggestion/question, do post it in a comment !!



    Thursday, September 16, 2021

    Getting started with ADF - Creating and Loading data in parquet file from SQL Tables dynamically

    Introduction



    There are many file formats supported by Azure Data factory like
    • Avro format
    • Binary format
    • Delimited text format
    • Excel format
    • JSON format
    • ORC format
    • Parquet format
    • XML format
    Each file format has pros and cons, and depending upon the requirement and the features offered by the file format, we decide to go with a particular format.

    One of the most used formats in data engineering is the parquet file, and here we will see how to create a parquet file from the data in a SQL table, and multiple parquet files from SQL tables dynamically.


    What is Parquet file format



    Parquet is a structured, column-oriented (also called columnar storage), compressed, binary file format.


    It is open source, and offers great data compression (reducing the storage requirement) and better performance (less disk I/O, as only the required columns are read).

    It contains metadata about the data it holds (stored at the end of the file).


    Note: A binary file is a computer-readable form of storing data; it is not a text file, rather it stores data in the form of bytes.

    Create parquet file from SQL Table data


    If you are beginner then would ask you to go through - Getting Started with Azure Data Factory
     
    To do this, we simply make use of the Copy data activity in a pipeline and provide the details of the source (i.e. which table to refer to) and sink (i.e. the ADLS Gen2 folder location).

    Copy data SQL Table To Parquet

    Details about creating the linked services and datasets are provided below, in the steps for creating the pipeline.

    This is great for a single table, but what if there are multiple tables from which parquet files are to be created?

    You could say we can use the same pipeline - just replace the table name. Yes, that will work, but manual intervention will be required.

    And what if there are hundreds or thousands of tables?


    How can parquet files be created dynamically using an Azure Data Factory pipeline?


    In order to create parquet files dynamically, we will take the help of a configuration table where we will store the required details.

    This configuration can be referred to at runtime by the pipeline with the help of a Lookup activity; the output of this activity can be used to dynamically provide the settings values, so no hardcoding is required in the pipeline.

    So the same pipeline can be used for every requirement where a parquet file is to be created; just an entry in the configuration table is required.

    Let's see below, how to do it...

    Steps in creating pipeline - Create parquet file from SQL Table data dynamically


    Source and Destination connection - Linked Service


    To create a connection, click on the Manage tab in the left pane.

    On the Linked services page, select +New to create a new linked service. Provide a name and select the integration runtime.

    Here the source is SQL database tables, so create a connection string to this particular database.

    Linked Service for SQL Database



    For that, you provide the server address, database name and the credentials, and finally click on Test connection to confirm all is OK.

    Note: Alternatively, the recommended way is to make use of Key Vault.

    Now create another linked service for the destination, i.e. for Azure Data Lake Storage.

    Linked Service for Azure Data lake storage


    Using these linked services, ADF will connect to the underlying services at runtime. Next is to tell ADF what form of data to expect.


    Define the structure of the data - Datasets


    Two datasets are to be created: one defining the structure of the data coming from the SQL table (input) and another for the parquet file which will be created (output).

    Select Author tab from the left pane --> select the + (plus) button  and then select Dataset.

    New SQL Dataset

    Search for SQL and select SQL Server, provide the name and select the linked service - the one created for connecting to SQL.

    Add a parameter called TableName and in the Connection tab provide the following against Table - @dataset().TableName

    Adding parameter in SQL Dataset


    We will make use of this parameter; it will help us achieve dynamic selection of the table.


    ADF parquet dataset


    Now search for storage and select ADLS Gen2, choose Parquet as the format, give it a name and select the linked service.
    Parameters in parquet dataset

    Next we will add parameters to the dataset:

    container
    folder
    filepath

    This will help us achieve dynamic creation of the parquet files.

    In the Connection tab, add the following against File path

    @dataset().container/@dataset().folder/@dataset().filepath




    Configuration Table 


    Using this table we will keep some basic config information like the file path of the parquet file, the table name, a flag to decide whether it is to be processed or not, etc. (more columns can be added as per need).

    Configuration table for ADF pipeline


    This table will be referred to at runtime and, based on the results from it, further processing will be done.
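
    For example (purely illustrative values, assuming the columns used later in this post - Object_Name, TableName, FilePath and TobeProcessed), entries in the config table could look like:

        INSERT INTO dbo.config (Object_Name, TableName, FilePath, TobeProcessed)
        VALUES ('Invoice',  'Invoice',  'invoice',  1),
               ('Customer', 'Customer', 'customer', 1);

    Rows with TobeProcessed = 0 would simply be skipped by the lookup query used below.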


    Designing the Pipeline


    The purpose of the pipeline is to get data from the SQL tables and create parquet files on ADLS.

    i.   Lookup Activity

    config lookup activity

    The first step, where we get the details of which tables to get the data from and create parquet files out of them.

    • Place a Lookup activity and provide a name in the General tab.
    • Under the Settings tab, select the dataset as DS_SQLServer, provide the table name as config (the one created in the above step), select Query against Use query and provide the below against Query
                        select * from config where TobeProcessed = 1

    Here we are basically fetching the details of only those objects in which we are interested (the ones having the TobeProcessed flag set to true).

    So based on the number of objects returned, we need to perform that many copy operations (one for each), so in the next step add a ForEach activity.


    ii.  For each activity

    ForEach activity

    ForEach works on an array as its input. Here it is termed Items; provide the following against it

                     @activity('ConfigLookup').output.value

    Basically we are giving it the output of the ConfigLookup activity, so it will iterate through it. And in each iteration we need to perform the act of creating a parquet file, and for that we add a Copy data activity inside the ForEach.

    iii. CopyData Activity

    CopyData activity - SQL table to parquet



    • Select the Copy data activity and give it a meaningful name.
    • On Source tab
    i. Select the dataset as DS_SQLServer; as soon as you do, you are asked to provide a value against the parameter TableName. Provide the below

                                                  @item().TableName      


    ii. Select Query against Use query.

          iii. Against Query, provide the query; here we want all the data from the current table
              
                               @concat('select * from ',item().TableName)
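
    For instance, for a config row with TableName = Invoice (the sample table used in this series), this expression resolves to the plain query:

                               select * from Invoice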


    A question might come to your mind: where did item come into the picture?


    Here item is the current object/record of the ForEach items array.

    In the previous step we assigned the output of the Lookup activity to ForEach's Items; thus, if the Lookup activity returned 3 records, then the ForEach items will have 3 records, and the current one is referred to as item.

    Thus you provide the value from the current iteration of the ForEach loop, which ultimately comes from the config table.
                                
    • On Sink tab
               i. Select the dataset as DS_Parquet; as soon as you do, you are asked to provide values against the parameters, as below


    container (string)
    folder (string)   : @concat('/',item().FilePath)
    filepath (string) : @concat(item().Object_Name,'.parquet')
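
    Taking the illustrative Invoice row from the config table example above (FilePath = invoice, Object_Name = Invoice), the folder and filepath parameters would resolve to:

        folder   : /invoice
        filepath : Invoice.parquet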
    That's it, pipeline is ready to use.


    Conclusion


    We got a brief introduction to the parquet file format and how parquet files can be created using an Azure Data Factory pipeline. And in a scenario where there is a need to create multiple parquet files, the same pipeline can be leveraged with the help of a configuration table.

    Thus the pipeline remains untouched, and whatever addition or removal is to be done is done in the configuration table.

    It is a design pattern which is very commonly used to make pipelines more dynamic, avoid hardcoding and reduce tight coupling.


    If you have a better idea or any suggestion/question, do post it in a comment !!