1. Introduction to Azure Data Factory

WafaStudies

8m 27s · 1,205 words · ~7 min read
YouTube auto captions

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

[0:00]Hi friends, welcome to the Azure Data Factory video series. In this series, we are going to cover all the important concepts and also try to do the practical labs on Azure Data Factory. This is part one. In this video, we are going to see the introduction to Azure Data Factory. So the main agenda is to know what Data Factory is, why it is required, and to discuss an example to get an idea about Data Factory, okay.

So what is Azure Data Factory? It is Azure's cloud ETL service for scale-out serverless data integration and data transformation. You can also lift and shift your existing SSIS packages to Azure Data Factory and run them with full compatibility in ADF, okay. ADF stands for Azure Data Factory. So it is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Don't worry if these definitions look very complicated to understand. Now I'm going to explain with an example, so that you will get a clear idea, okay.

So Azure Data Factory is an Azure service, and as we said, it is like an ETL cloud service, right? The business intelligence guys, the BI guys, know what ETL means. E means extraction, T means transformation, L means load. So in ETL pipelines, what people do is extract data from different types of data sources. It might be text files or Excel sheets or SQL databases or Oracle. They extract the data and then they write some logic to change the data, I mean to change the look of the data. They perform some transformations using some scripts or whatever it may be, okay. Once the data gets transformed into some meaningful form, they load the data into some other data store. So read from the source, do some transformation, and load the data into some destination. This is called an ETL pipeline, okay.

So using SSIS packages, right, SQL Server Integration Services, people used to build these ETL pipelines. And once the ETL pipeline completes, on top of that destination data, people used to use reporting tools like Power BI to generate some reports, okay. So this is what was happening previously. Now this ETL activity can be done using the cloud service called Data Factory.

So let's take one example to understand this, okay. Let's say you are the owner of a game, a very big game that is very popular across the globe. Whenever people play this game, it generates so many logs, and these logs you are storing on Azure Data Lake. Azure Data Lake Storage is for storing raw data, and mostly people use it for big data. So what is big data? Big data is nothing but a huge amount of data, much of it unstructured. When I say unstructured data, it's not in a tabular format or something, it's raw data in text files or CSV files or Excel sheets or whatever. So it's like files, raw data, okay.

So the logs you are storing in Azure Data Lake Storage, and you also have the customers' information and the marketing campaign information, everything, in your on-premises storage. So for this particular game, you have logs on the cloud, on Azure Data Lake Storage, that is unstructured data, and you have customer data on premises. Now what you want to do is refer to this customer data and analyze these logs on Azure Data Lake Storage. You want to read these logs and transform them into some meaningful form.

[4:36]And once the data is transformed, you want to store it in Azure SQL DB or somewhere, okay. So that on top of that particular storage, you can use Power BI kind of tools to generate some meaningful reports, so I can monitor my trends. Which features are people using more? Which mode are people using more? Which ads are people clicking more? I want to see all these trends on the Power BI reports finally, so that I can take some decisions, like what kind of things I should provide to attract more users. So I need these business insights to take the business decisions. Those insights are what we finally want to achieve, right?

So here, if you observe, these two steps help you to extract the data, then you write some scripts to transform the data, and then you load that data into Azure SQL DB or somewhere, okay. And on top of the DB, whatever BI tool you use, that is fine. So this entire ETL activity, whatever I discussed in this example, you can achieve very easily with Azure Data Factory. That is why, if you see the definition now, it will make sense: it is an ETL and data integration service.
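
The game example above can be sketched in plain Python. This is only an illustration of the extract, transform, and load steps, not Azure Data Factory code: the log rows, the `CUSTOMERS` lookup, the `feature_usage` table, and all function names are hypothetical, and an in-memory SQLite database stands in for Azure SQL DB.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical raw game logs, standing in for files in a data lake.
RAW_LOGS = """player_id,event,feature
1,click,leaderboard
2,click,store
1,click,store
3,click,leaderboard
2,click,store
"""

# Hypothetical customer records, standing in for the on-premises data.
CUSTOMERS = {1: "IN", 2: "US", 3: "IN"}


def extract(raw_text):
    """Extract: parse the raw CSV log text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows, customers):
    """Transform: look up each player's country and count feature usage per country."""
    counts = Counter()
    for row in rows:
        country = customers.get(int(row["player_id"]), "unknown")
        counts[(country, row["feature"])] += 1
    # Sort so the output order is deterministic.
    return sorted((country, feature, uses) for (country, feature), uses in counts.items())


def load(records, conn):
    """Load: write the aggregated rows into a SQL table a BI tool could query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_usage "
        "(country TEXT, feature TEXT, uses INTEGER)"
    )
    conn.executemany("INSERT INTO feature_usage VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(extract(RAW_LOGS), CUSTOMERS), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the destination database
    run_etl(conn)
    for row in conn.execute(
        "SELECT country, feature, uses FROM feature_usage ORDER BY country, feature"
    ):
        print(row)
```

Keeping each step as its own function mirrors the way the transcript separates reading from the sources, reshaping the data, and writing to the destination that the reports are built on.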

[6:21]And it will help you to create data-driven workflows and orchestrate the data movement. So data is moving from these sources into SQL, right?

[6:36]So there is data movement, and we are integrating these sources, the data lake storage and the on-prem data, and we are also moving data into SQL DB. So it is a data integration and data movement tool, okay. And it will help you to create this entire process, right? In ADF they call these pipelines. Creating the pipeline is a one-time activity. You just need to create the pipeline, and then you can schedule it so that it runs every day, okay. So that is the reason we are saying it is a data integration and data transformation tool which helps you to create the workflow. This is a workflow, right? So I hope this gives an idea of what Data Factory is at a high level. Let's move on.

So we discussed why Data Factory is needed, right? Big data requires a service that can orchestrate and operationalize the processes that refine these enormous stores of raw data into actionable business insights. Azure Data Factory is such a managed cloud service, built for extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects, okay.

I hope this gives you an idea. Please stay tuned. I will be creating all the videos on ADF with all the concepts and all the practical labs, okay. Thank you guys. Please subscribe to my channel and please click the bell icon to get a notification whenever I add videos.
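
The "create the pipeline once, then schedule it to run every day" idea from the transcript can be sketched as a toy model in plain Python. This is not the Azure Data Factory SDK or its pipeline format: the `Pipeline` class, the `daily_run_times` helper, and the two step functions are all hypothetical names made up for illustration.

```python
from datetime import datetime, timedelta


class Pipeline:
    """A toy pipeline: a named, ordered list of steps (plain functions)."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, data):
        # Pass the output of each step to the next one, in order.
        for step in self.steps:
            data = step(data)
        return data


def daily_run_times(start, days):
    """Return the run times of a once-a-day schedule beginning at `start`."""
    return [start + timedelta(days=i) for i in range(days)]


# Hypothetical steps for the game example.
def read_logs(_):
    """Pretend to read the features players clicked from the raw logs."""
    return ["store", "leaderboard", "store"]


def count_features(features):
    """Count how often each feature appears."""
    counts = {}
    for feature in features:
        counts[feature] = counts.get(feature, 0) + 1
    return counts


# Define the pipeline once...
pipeline = Pipeline("game_log_pipeline", [read_logs, count_features])

if __name__ == "__main__":
    # ...then run it at each scheduled time (simulated here, no real waiting).
    for run_time in daily_run_times(datetime(2024, 1, 1, 2, 0), 3):
        print(run_time.isoformat(), pipeline.run(None))
```

The point of the sketch is the separation between defining the workflow and deciding when it runs: the same pipeline object is reused for every scheduled run.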
