23. Copy data from multiple files into multiple tables | mapping table SQL | bulk
Summary
TL;DR: This tutorial video guides viewers through copying data from multiple files into multiple SQL tables using Azure Data Factory. It addresses limitations such as the inability to manually map columns or use the upsert option. The video demonstrates creating datasets for the source files and destination tables, parameterizing table names, and using a lookup activity to read a file mapping master table. It also covers setting up a ForEach loop for dynamic file processing and concludes with a debug test to confirm successful data transfer, encouraging viewers to use the provided resources for further practice.
Takeaways
- The video demonstrates how to copy data from multiple files into multiple SQL tables using Azure Data Factory.
- Manual mapping is not possible, since the same copy activity is reused across different schemas.
- The source file must have the same schema as its destination table to avoid issues.
- The 'upsert' option cannot be used in this copy-activity scenario.
- Files and directories in the data lake are read dynamically, with the folder structure organized into separate folders for the Alpha, Beta, and Gamma tables.
- An 'active' flag in the mapping table controls which tables are processed on a given day.
- A single dataset for dynamic files and a parameterized SQL table name allow dynamic file-to-table mapping.
- A ForEach loop iterates over the items in the lookup's value array, dynamically copying data based on the source-to-destination mapping.
- The source and destination datasets are parameterized to handle dynamic table names and file paths for scalability.
- The copy activity runs once per table (Alpha, Beta, and Gamma) and copies the respective rows into the SQL tables, confirming successful data transfer.
Q & A
What is the main goal of the video?
-The main goal of the video is to demonstrate how to copy data from multiple source files into multiple SQL tables using Azure Data Factory, specifically focusing on handling different schemas for the destination tables.
What limitation is highlighted at the beginning of the video?
-The presenter highlights that manual mapping cannot be done when copying data of different schemas with the same copy activity. In addition, an extra column cannot be added within the copy activity in this scenario.
How does the presenter suggest handling scenarios where data should not be copied into a particular table?
-The presenter suggests using an 'active' flag in the mapping table. If the active flag is set to 0, the corresponding table will not be processed, allowing flexibility for excluding tables on certain days without having to delete them.
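A minimal sketch of such a mapping table, with names and types assumed since the video doesn't show its exact DDL:

    CREATE TABLE dbo.FileMappingMaster (
        TableName VARCHAR(50),   -- destination SQL table (Alpha, Beta, Gamma)
        FileName  VARCHAR(100),  -- source file name prefix (date suffix excluded)
        FilePath  VARCHAR(200),  -- folder path inside the source container
        Active    BIT            -- 1 = process this row, 0 = skip it
    );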
What is the structure of the source files and where are they stored?
-The source files are stored in a Data Lake in a structured folder system. Each table has its own folder, and under each folder, the corresponding source files are located. There is a daily folder structure, and the file paths are mapped dynamically.
What kind of data set does the presenter create for the destination SQL tables?
-The presenter creates a single data set for all the destination tables (Alpha, Beta, and Gamma) and parameterizes the table name so that it can dynamically change during execution.
Why is a wildcard option used for the source files in the copy activity?
-A wildcard option is used because the file names have dynamic suffixes, such as dates. The wildcard allows the system to pick up files with a specific prefix and any variable suffix.
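For instance (file names are illustrative), a prefix combined with a star matches any dated file:

    Alpha*   ->   Alpha_20240501.csv, Alpha_20240502.csv, ...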
How is the dynamic content handled for the table names in the SQL destination?
-The table name in the SQL destination is parameterized and passed dynamically using the current item in the for-each loop. The dynamic expression picks the table name column out of the lookup item and supplies it to the dataset parameter during the copy.
Why does the presenter advise against importing schemas for the destination tables?
-The presenter advises against importing schemas because the destination tables have different schemas, and importing a fixed schema could cause conflicts when copying data to different tables.
What task is used to read the file mapping master table, and how is it configured?
-A 'lookup' task is used to read the file mapping master table. The task is configured to return all rows with the 'active' flag set to 1, ensuring that only the relevant data is processed.
How does the presenter test the process after setting up the pipeline?
-The presenter tests the pipeline using the 'debug' feature in Azure Data Factory. They check the output to verify that the copy activity runs three times (once for each table) and confirm the number of rows copied for each table.
Outlines
Introduction to Copying Data into SQL Tables
The video begins by introducing the process of copying data from multiple files into multiple SQL tables using the same copy activity. The speaker highlights limitations such as the inability to perform manual mapping or use the upsert option due to the uniformity in schema required for the source files and destination tables. The video then transitions into demonstrating the destination tables with varying schemas, emphasizing the need for matching source file schemas with their respective destination tables. A mapping table is introduced to correlate source files with their corresponding destination tables, including an 'active' flag to control data processing for specific tables on certain days.
Setting Up Data Sets and Linked Services
The speaker proceeds to guide viewers on setting up data sets for the source files and destination tables within Azure Data Factory. A data set is created for a 'file mapping master' table to read from, and another for the source files in the data lake, noting the dynamic nature of file and folder paths. For the destination, a single data set is parameterized to handle different table names dynamically. The video also covers the creation of linked services to connect the data lake and SQL Server to the data factory. The speaker ensures to mention the exclusion of schema import for the source files and the use of a parameterized table name for the destination.
Implementing a Pipeline for Data Copying
The video then delves into the creation of a new pipeline, starting with a lookup task to read from the file mapping master table. This task is crucial as it determines the source and destination details for the data copying process. The speaker uses a SQL query to filter rows based on the 'active' flag, ensuring only relevant data is processed. A 'For Each' loop is introduced to iterate over the items in the 'value array' obtained from the lookup task, setting the stage for adding a copy activity within this loop for each item.
Configuring Copy Activity and Testing the Pipeline
Inside the 'For Each' loop, a copy activity is added to handle the data transfer from the source container to the SQL dataset. The speaker configures the source path and file name using dynamic content and a wildcard, ensuring that the correct files are selected based on the date suffix. The destination is set by parameterizing the table name, allowing for flexibility in copying data to different tables. The speaker emphasizes not importing any schema due to the varying schemas of the source files. After configuring the pipeline, a debug run is performed to test the flow, which successfully copies data to the respective tables as intended.
Keywords
Copy Activity
Schema
Data Lake
Azure Data Factory
Linked Service
Pipeline
Wildcard
Parameterization
ForEach Loop
Active Flag
Highlights
Introduction to copying data from multiple files into multiple SQL tables using the same copy activity.
Limitation of manual mapping due to the use of the same copy activity for different schemas.
Inability to add additional columns within the copy activity for different schemas.
Exclusion of the upsert option in the copy activity for different schemas.
Demonstration of copying data into three tables with different schemas.
Requirement for source files to have the same schema as the destination tables.
Explanation of a mapping table that lists the source files and their corresponding destination tables.
Use of an 'active' flag to control the copying of data to specific tables.
Organization of source files in the data lake with daily folders and specific paths for each table.
Creation of a linked service in Azure Data Factory for the data lake and SQL Server.
Creation of a data set for the file mapping master table in SQL Server.
Creation of a data set for the source container in the data lake without specifying file names.
Parameterization of the table name in the destination data set for dynamic table names.
Use of a for each loop to iterate over items in the value array from the lookup activity.
Setting up a copy activity within the for each loop for each table.
Use of wildcard and dynamic content for file paths and names in the copy activity.
Concatenation function to create file name patterns for the copy activity.
Execution of the pipeline and verification of the copy activity's success.
Access to SQL scripts, Excel files, and ARM templates for practice through a community link.
Transcripts
Hello everyone. In this video we are going to see how to copy data from multiple files into multiple SQL tables. If you are new to our channel, hit subscribe; your subscription will motivate me to produce more videos in better quality.

Before we proceed, I just want to highlight a small limitation: you cannot do manual mapping, because we are using the same copy activity to copy data of different schemas. And if you remember, in an earlier video we saw how we can add an additional column within the copy activity; you cannot do that here. Just keep the same column names between your source file and your destination. Another limitation is that you cannot use the upsert option that is available in the copy activity.
Let me show the destination tables. These are the three tables we are going to copy our data into, and they have different schemas: two of the tables share the same schema and the third has a different one. I just want to show that we can copy into tables of different schemas with the same copy activity; just make sure the source file has the same schema as its destination table.

These are the various source files. I have created a mapping table that lists, for each table, what the source file is and in which directory it sits. I have already shown the schema for these three tables, Alpha, Beta, and Gamma; forget about the last one, which we are going to ignore. Similarly, I have provided the file names in these columns. Let me show the files from my local machine.
These are the three files that are going to be our source; I will be uploading them to the data lake. These are the three tables, these are the three files, and for each of them these are the paths in the data lake.

I have also created one more column, 'active', which says whether a row needs to be considered or not. For example, if I keep active as 0 for row 2, then the data for the Beta table should not be copied; we should only consider the tables whose active status is 1. I included this active flag because scenarios like this come up often: on a particular day we may not want to process data into a particular table, while on another day it may be required. Instead of deleting the row from the table, it is better to have a flag that you can update based on your requirement.

Now I will show how these paths are maintained in the data lake. Inside the data lake I have only one container, for source. Inside that I have created a 'daily' folder, and under that we have separate folders: Alpha, Beta, and Inbound. Inside Alpha we have the files for Alpha, and similarly for Beta. For Gamma, though, I created one more folder, Inbound, and only under that do we have a separate folder for Gamma containing its source file. You can ignore the fourth file, which is inactive.
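Based on the description, the container layout looks roughly like this (file names are illustrative):

    source/                          -- the only container
        daily/
            Alpha/    Alpha_<date>.csv
            Beta/     Beta_<date>.csv
            Inbound/
                Gamma/    Gamma_<date>.csv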
Now let's jump to Azure Data Factory. Under Manage, I have already created a linked service to my data lake as well as to the SQL Server; if you are not familiar with linked services, watch my introduction video about ADF.

We need to create a dataset for the file mapping master table, since we are going to read from it. Create a new dataset and search for SQL. Mine is not Azure SQL; it is my personal server, so I am selecting SQL Server. Let me provide a name for the dataset, and from the linked service drop-down select the SQL linked service. It will load whatever tables are available inside the SQL Server; from those I pick the file mapping master table. I also want to import the schema of the table, so just click OK. Now, if you go to Schema, you will find the schema of this particular table there.
Now let's create a dataset for the data lake as well. We need a dataset for the container alone, because the files and folders are going to be dynamic. Click on New dataset and search for Data Lake, click Continue, and since my input files are in CSV format I select that. Let me provide a name for the dataset, and from the drop-down select the data lake linked service. Browse to select the container; I am selecting the container alone and not any folders inside it, because those folders we are going to read dynamically from the mapping table. The first row is a header in all our source files, meaning the column names are in the header of each file, so I check 'First row as header'. I am not going to import any schema, so I select None.
Now we need a dataset for our destination tables, Alpha, Beta, and Gamma. Instead of creating a separate dataset for each table, we are going to have a single dataset and parameterize the table name. Search for SQL Server again, select it, and click Continue. Let me provide a name for the dataset, and from the linked service drop-down I select the SQL linked service. I am not going to import any schema, and I am not even going to select a table name, because those are going to be completely dynamic.

Here we need to parameterize the table name. Go to Parameters and click New, and provide the parameter name as 'table name' (while recording I named it 'file name' by mistake, but please use 'table name' when you do it). Then click on Connection. For the table, instead of picking from the drop-down, we are going to pass it dynamically: click Edit, and you will get two boxes. The first one is for the schema and the second one is the table name. The schema is dbo, the default schema in SQL Server, so I provide dbo. The table name is going to be dynamic, so click Add dynamic content, select the parameter we created, and click OK.
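In dynamic-content terms, the table name box of the dataset ends up referencing the dataset parameter, along these lines (assuming the parameter is named TableName):

    @dataset().TableName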
Let's review what we have done so far: we created a dataset for the file mapping master table and imported its schema; for the source files we didn't import any schema or create any parameter, we just selected the source container; and for the destination we created a parameter and passed it as the table name. Now let's publish.

Next, let's create a new pipeline; from there we will start implementing. Just click on New pipeline.
The first task is to read from the file mapping master table; only then will we know what the source and the destination are. Look for the Lookup task and drag and drop it. If you wish to change the name of the task you can do so here; I am just renaming it. After that, click on Settings and select the source dataset, which is the file mapping master dataset we created. Uncheck 'First row only', because we want all the rows from the table. But do we need to read every row? No; we only want the rows whose active flag is 1, so I type the SQL query for that. If we execute it, we get only the three rows whose active column is 1. Now click on Query; you can use a stored procedure as well, but for the time being I am going with a query, and I paste the query to be executed.
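The query itself isn't readable in the transcript, but from the description it is a simple filter on the active flag, along the lines of (table and column names assumed):

    SELECT TableName, FileName, FilePath
    FROM dbo.FileMappingMaster
    WHERE Active = 1;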
Now let's run this task alone; just run Debug. It completed, and if you check the output, let me copy it to Notepad and paste it. You can see several pieces of information in the output, like a count and a value, but all we need are the items inside the 'value' array (the square brackets indicate an array). The first item represents the Alpha table, the second Beta, and the third Gamma; those are the items in the value array that we need.
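The lookup output has roughly this shape (values are illustrative; the column names come from the assumed mapping table):

    {
      "count": 3,
      "value": [
        { "TableName": "Alpha", "FileName": "Alpha", "FilePath": "daily/Alpha" },
        { "TableName": "Beta",  "FileName": "Beta",  "FilePath": "daily/Beta" },
        { "TableName": "Gamma", "FileName": "Gamma", "FilePath": "daily/Inbound/Gamma" }
      ]
    }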
So we are going to use a ForEach loop, because we want to loop over each item inside that value array. Drag and drop it, then under Settings click on Items and click Add dynamic content. Here we need to select the value array from the lookup output; you can see it listed as the lookup's value array, so just click on it. Now each item inside the value array will be iterated over. Next, click the edit icon to add an activity inside the ForEach; inside it we are going to add a copy activity.
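Assuming the lookup activity is named 'Lookup1', the Items box of the ForEach ends up with an expression like:

    @activity('Lookup1').output.value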
Drag and drop the copy activity, and under Source select the dataset you created for the source container. If we had provided the file name in the dataset, we could use the 'File path in dataset' option, but we didn't specify any file name, so we need to go with the wildcard option. In the wildcard settings the container name is already there, but the path still has to be provided, and the path, as I told you earlier, comes from the mapping table. We are already reading the output of the lookup activity and iterating over it in this loop, so the value is available. Let me add dynamic content here: ForEach already has the current item, but we need to specify which column to read, so I put a dot after it. If I open the lookup output, we get the whole highlighted item, but all we need is the path column alone, so I copy that column name, go back to Azure Data Factory, and paste it. Whatever value comes in that path column will be applied here, which means it becomes the directory.
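So the wildcard folder path box holds an expression like this (assuming the mapping table's path column is named FilePath):

    @item().FilePath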
In the next text box we need to specify the file name. We don't have the complete file name; all we have is the file name prefix, and this is the value. Let me show it in the data lake: if I go to daily/Alpha, you can see we only have the prefix part and not the rest, because the rest changes every day. That is why the mapping table stores only the prefix and ignores the suffix.

So click on the text box, click Add dynamic content, and, as we did earlier, select the current item of the ForEach. Here we need to provide the column name so that we can access the file name prefix: copy it, and add a dot followed by the column name. That value will be applied here, but we only have the prefix. If I leave it as it is, ADF won't be able to read the file, because our file has a suffix part as well, the date part. We need to tell Azure Data Factory to pick up a file that has this prefix and some suffix after it. To do that, cut this part (we will paste it back later) and go to Functions; look for concat. It is usually under string functions, so if you can't find it by searching, go to the string functions category and the first item will be concat.
Paste concat in, then paste back the item we cut, and remove the extra '@'. For a nested expression there should be only one '@': always keep the first one and remove any unwanted '@' signs in between. What concat does is join the strings we provide, separated by commas, so let me add a comma followed by a star in single quotes. What will this expression give us? In the first loop iteration it produces a value like the Alpha file prefix followed by a star. The star represents the suffix part; the suffix can be a date or something else, and in our scenario it is the date. We pass this expression to the wildcard file name, so it will pick up whichever files have the file name prefix followed by the date part, or whatever the suffix may be. Coming back to Azure Data Factory: this concat expression will certainly pick up the files that carry a suffix as well. Click OK. We have now given the path, and the file name as a wildcard.
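Put together, the wildcard file name expression looks something like this (again assuming a FileName prefix column in the mapping table):

    @concat(item().FileName, '*')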
Now it's time to move to the Sink. Under Sink, select the destination dataset from the drop-down, which is our SQL dataset. Since we parameterized the table name in that dataset, the parameter shows up here. I add dynamic content, click on the ForEach current item, and what we need now is the table name value, so I copy that column name and paste it. Whatever value arrives here will be passed into the dataset's parameter. One more thing: if you go to Mapping, please do not import any schema, because the schema is different for each file and the tables are different. Don't import it; just click Publish.
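So the sink's table name parameter receives an expression like (assuming a TableName column in the mapping table):

    @item().TableName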
Let me come out of the loop and run Debug to test the flow. It completed, and if you mouse over the copy activity it shows a total run count of 3; it ran three times, which you can also see down below. For each of the tables it ran once, and if you click on the output you can see how many rows were copied.
And that's it; you can cross-verify in your tables as well. The SQL scripts and the Excel file used for this video have been uploaded to the community; just join for free with your email ID, and under Library you will be able to access these resources for practice. I have uploaded a zip file there containing the SQL scripts, the Excel file, and the ARM template for this particular video, identified by the video number; just download it. As of now this is free, and I'll provide the link to join the community in the video description. Please do join.

Thank you for watching this video. Please hit subscribe and follow me on LinkedIn to stay connected.