23. Copy data from multiple files into multiple tables | mapping table SQL | bulk
Summary
TL;DR: This tutorial video shows how to copy data from multiple source files into multiple SQL tables using Azure Data Factory. It first calls out the limitations of this pattern, such as the inability to do manual column mapping or to use the upsert option. The video then demonstrates creating datasets for the source files and destination tables, parameterizing the table name, and using a lookup activity to read a file-to-table mapping master table. It also covers setting up a for-each loop for dynamic file processing and concludes with a debug run to confirm the data transfer, encouraging viewers to use the provided resources for further practice.
Takeaways
- 📋 The video demonstrates how to copy data from multiple files into multiple SQL tables using Azure Data Factory.
- 📂 Manual column mapping cannot be used, because the same copy activity serves destination tables with different schemas.
- 🔄 Ensure that the source file has the same schema as the destination table to avoid issues.
- 🛑 The 'upsert' option cannot be used in this copy activity scenario.
- 📅 Files and directories in the Data Lake are read dynamically; there is one folder per table (Alpha, Beta, and Gamma) with a daily folder structure beneath it.
- 🚩 An 'active' flag in the mapping table allows control over which tables are processed on specific days.
- 🛠 A single dataset with a dynamic file path and a single SQL dataset with a parameterized table name are created, allowing files and tables to be mapped dynamically.
- 🔄 A 'for each' loop iterates over the items in the lookup's value array, dynamically copying data according to the source-to-destination mapping (see the pipeline sketch after this list).
- 🔗 The source and destination datasets are parameterized to handle dynamic table names and file paths for scalability.
- ✅ The copy activity runs for each table (Alpha, Beta, and Gamma) and copies the respective rows into the SQL tables, confirming successful data transfer.
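The overall pipeline shape described in these takeaways (a lookup over the mapping table feeding a for-each that runs the copy activity) can be sketched as Azure Data Factory pipeline JSON. This is a minimal, hypothetical skeleton: the pipeline, activity, and dataset names (PL_CopyFilesToTables, LookupMapping, DS_Sql_Mapping, DS_Lake_Dynamic, DS_Sql_Dynamic) and the mapping-table name and columns are illustrative assumptions, not names taken from the video.

```json
{
  "name": "PL_CopyFilesToTables",
  "properties": {
    "activities": [
      {
        "name": "LookupMapping",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT SourceFolder, SourceFilePrefix, DestinationTable FROM dbo.FileTableMapping WHERE Active = 1"
          },
          "dataset": { "referenceName": "DS_Sql_Mapping", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachMapping",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupMapping", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupMapping').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "CopyFileToTable",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "AzureSqlSink" }
              },
              "inputs": [ { "referenceName": "DS_Lake_Dynamic", "type": "DatasetReference" } ],
              "outputs": [ { "referenceName": "DS_Sql_Dynamic", "type": "DatasetReference" } ]
            }
          ]
        }
      }
    ]
  }
}
```

The wildcard source settings and the sink table-name parameter on the copy activity are filled in as shown in the Q & A below.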
Q & A
What is the main goal of the video?
-The main goal of the video is to demonstrate how to copy data from multiple source files into multiple SQL tables using Azure Data Factory, specifically focusing on handling different schemas for the destination tables.
What limitation is highlighted at the beginning of the video?
-The presenter highlights that manual mapping cannot be done when copying data from different schemas with the same copy activity, and that an extra column cannot be added within the copy activity for this process.
How does the presenter suggest handling scenarios where data should not be copied into a particular table?
-The presenter suggests using an 'active' flag in the mapping table. If the active flag is set to 0, the corresponding table will not be processed, allowing flexibility for excluding tables on certain days without having to delete them.
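A minimal sketch of such a mapping table is shown below; the video does not spell out the exact schema, so the table and column names here are assumptions for illustration.

```sql
-- Hypothetical file-to-table mapping (control) table; names are illustrative.
CREATE TABLE dbo.FileTableMapping
(
    SourceFolder     VARCHAR(100) NOT NULL,  -- folder in the data lake, e.g. Alpha
    SourceFilePrefix VARCHAR(100) NOT NULL,  -- fixed file-name prefix; the date suffix is matched by a wildcard
    DestinationTable VARCHAR(100) NOT NULL,  -- target SQL table
    Active           BIT          NOT NULL   -- 1 = process, 0 = skip without deleting the row
);

-- Illustrative rows: Gamma is temporarily excluded by setting Active = 0.
INSERT INTO dbo.FileTableMapping (SourceFolder, SourceFilePrefix, DestinationTable, Active)
VALUES ('Alpha', 'Alpha_', 'Alpha', 1),
       ('Beta',  'Beta_',  'Beta',  1),
       ('Gamma', 'Gamma_', 'Gamma', 0);
```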
What is the structure of the source files and where are they stored?
-The source files are stored in a Data Lake in a structured folder system. Each table has its own folder, and under each folder, the corresponding source files are located. There is a daily folder structure, and the file paths are mapped dynamically.
What kind of dataset does the presenter create for the destination SQL tables?
-The presenter creates a single dataset for all the destination tables (Alpha, Beta, and Gamma) and parameterizes the table name so that it can change dynamically during execution.
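A rough sketch of what such a parameterized Azure SQL dataset could look like in JSON; the dataset name, linked service name, and the fixed dbo schema are assumptions for illustration, and the video may use a slightly different dataset format.

```json
{
  "name": "DS_Sql_Dynamic",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": { "referenceName": "LS_AzureSql", "type": "LinkedServiceReference" },
    "parameters": { "TableName": { "type": "string" } },
    "typeProperties": {
      "schema": "dbo",
      "table": { "value": "@dataset().TableName", "type": "Expression" }
    }
  }
}
```

The source dataset for the data lake files can be parameterized in the same way, with the wildcard handling described in the next answer picking up the dated file names.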
Why is a wildcard option used for the source files in the copy activity?
-A wildcard option is used because the file names have dynamic suffixes, such as dates. The wildcard allows the system to pick up files with a specific prefix and any variable suffix.
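In the copy activity's source settings this typically looks like the fragment below. It is a sketch: AzureBlobFSReadSettings assumes the files sit in ADLS Gen2, the files are assumed to be CSV, and SourceFolder / SourceFilePrefix are the hypothetical mapping-table columns used earlier.

```json
"source": {
  "type": "DelimitedTextSource",
  "storeSettings": {
    "type": "AzureBlobFSReadSettings",
    "recursive": true,
    "wildcardFolderPath": { "value": "@item().SourceFolder", "type": "Expression" },
    "wildcardFileName": { "value": "@concat(item().SourceFilePrefix, '*.csv')", "type": "Expression" }
  },
  "formatSettings": { "type": "DelimitedTextReadSettings" }
}
```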
How is the dynamic content handled for the table names in the SQL destination?
-The table name in the SQL destination is parameterized and passed dynamically using the current item of the for-each loop. The expression pulls the specific table name out of the lookup activity's output and uses it during the copy process.
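Concretely, the sink dataset reference on the copy activity can pass the current item's table name into the parameter defined on the hypothetical DS_Sql_Dynamic dataset; DestinationTable is the assumed mapping-table column.

```json
"outputs": [
  {
    "referenceName": "DS_Sql_Dynamic",
    "type": "DatasetReference",
    "parameters": {
      "TableName": { "value": "@item().DestinationTable", "type": "Expression" }
    }
  }
]
```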
Why does the presenter advise against importing schemas for the destination tables?
-The presenter advises against importing schemas because the destination tables have different schemas, and importing a fixed schema could cause conflicts when copying data to different tables.
What task is used to read the file mapping master table, and how is it configured?
-A 'lookup' task is used to read the file mapping master table. The task is configured to return all rows with the 'active' flag set to 1, ensuring that only the relevant data is processed.
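Assuming the hypothetical dbo.FileTableMapping table sketched earlier, the lookup query can be as simple as the following, with 'First row only' left unchecked so the activity returns every matching row:

```sql
-- Return every active mapping row for today's run.
SELECT SourceFolder, SourceFilePrefix, DestinationTable
FROM   dbo.FileTableMapping
WHERE  Active = 1;
```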
How does the presenter test the process after setting up the pipeline?
-The presenter tests the pipeline using the 'debug' feature in Azure Data Factory. They check the output to verify that the copy activity runs three times (once for each table) and confirm the number of rows copied for each table.