[0:00]Hi friends, welcome to the Azure Data Factory video series. This is part two. In this video, we are going to discuss a few top-level concepts in Azure Data Factory that you should know. You can think of them as the components of Azure Data Factory: whenever you create a data factory, you make these components work together to get your requirement done. The main ones are pipelines, activities, datasets, linked services, and triggers. You should have at least a rough idea of each before you start, which is why I chose this topic for the second video. So let's move on.
So, what is a pipeline? First, in Azure Data Factory you can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a unit of work. For example, let's assume that inside your data factory you want to perform several tasks in sequential order. You need to group them, and that group is what we call a pipeline in Azure Data Factory. Pipelines are the main component, and they are the ultimate goal for most people: you create a pipeline and run it, so that it performs all the activities you defined inside it.
For example, say your requirement is to ingest data from Azure Blob Storage and then, on top of that data, run a Hive query on an HDInsight cluster to partition the data. That is a simple requirement. What you have to do is use one activity that ingests the data from Blob Storage, and then another activity that runs the Hive query on the HDInsight cluster. Don't worry about the Hive query, HDInsight, and Blob Storage for now; we are going to see all of them hands-on in upcoming videos. Let me quickly show you. Let me open my Azure portal.
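For anyone following along in text, here is a minimal Python sketch of the idea that a pipeline is just an ordered group of activities. It is only an illustration: the `Activity` and `Pipeline` classes and the names `IngestAndPartition`, `CopyFromBlob`, and `RunHiveQuery` are made up for this example and are not part of Azure Data Factory or its SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Activity:
    """One processing step (hypothetical stand-in for an ADF activity)."""
    name: str
    run: Callable[[], str]


@dataclass
class Pipeline:
    """A logical grouping of activities that performs one unit of work."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self) -> List[str]:
        # Run the activities in the order they were added and collect results.
        return [f"{activity.name}: {activity.run()}" for activity in self.activities]


# The two steps from the example: ingest from Blob Storage, then run a Hive query.
# The lambdas only return a label; no real storage or cluster is contacted.
pipeline = Pipeline(
    name="IngestAndPartition",
    activities=[
        Activity("CopyFromBlob", lambda: "copied"),
        Activity("RunHiveQuery", lambda: "partitioned"),
    ],
)

results = pipeline.execute()
print(results)
```

The point of the sketch is only the shape: the pipeline owns the order, and each activity does one thing.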
[2:54]Okay, once you open the portal, you will see something like this. Let's create a new data factory. Click the plus button, search for the term "Data Factory", select it, and click Create. It asks you to enter a few details. For the name, let's say "Data Factory demo Mahir". The version is fine. Free Trial, this is my subscription. Then you need to select a resource group. I already have this resource group created, test environment, so let me select that. This option is to enable your data factory with a Git repository; for now, leave it, we don't need it. Let's click the Create button. You can see the deployment has started. It takes a minute or two. Okay, great, the deployment is complete. You can click Go to resource to go there directly.
Once you open your data factory, this is how it looks: the data factory name and the overview page. You need to click the Author and Monitor tile to actually open Data Factory and see the components, so let's click it. Author and Monitor is where you do your development. Once it opens, you can see the Home button here. This page gives you quick-start wizards to create a pipeline, create a data flow, or copy data. It also gives you documentation and a few videos on Data Factory created by Microsoft, plus tutorials you can navigate to. Now let's go to the Author tab. This is where you actually perform your edits and development. Once you open it, you can see a few components here: pipelines, datasets, data flows, connections, and triggers.
So, these are the pipelines I was talking about. You create a pipeline to group the activities you want to perform. Let's quickly click New pipeline. Okay, this is pipeline one. In our example, I said you want a copy activity to ingest the data from Blob Storage, and then you want to run a Hive query. Here is how you do it. There is an activity called Copy data, which copies data from one store to another. You need to configure its options; we are not going into them in detail now. Once the data is copied, you run the Hive query on top of it to perform some data transformation. After that, let's assume you want to run an Azure function. Then you drag and drop the Azure Function activity here and configure which function it is and so on, so that the pipeline runs that function. Like this, you can perform a complete unit of work by using individual activities and connecting them in sequential order. That is what a pipeline is all about. Let's go back to the presentation.
The second thing is: what is an activity? By this time you might already have the idea. An activity is a single, individual processing step inside a pipeline. For example, Copy data is one activity, which moves data from a source to a destination. The Hive script is another activity. The Azure function is another activity; here we are configuring an Azure function to run inside our pipeline. So an activity is an individual task that you want to run inside your pipeline. Let's move on. The next thing is what linked services and datasets are.
If you look here under Connections, you will find something called linked services. When you click this plus button, you create a linked service for some type of resource. A linked service is essentially a connection string for one of your resources: it might be Azure Storage, it might be an Azure function, it might be any type of source. If you want to access any type of resource, you need a connection string, right?
[8:37]The representation of that connection string is the linked service. That is what a linked service is. Why do you need linked services? So that your data factory holds that connection string. If you use an activity inside your pipeline that needs to interact with that external resource, ADF takes the connection string from the linked service and uses it to interact with the external resource and perform the activity.
[10:19]Then what is a dataset? A dataset is a representation of a data structure within a data store. In simple terms, it is just a reference point to the data you want to use inside your activities. Let me take an example; let me close this. If you go to our pipeline here, take this copy activity. It copies data from a source to a destination. Let's assume you want to copy data from Azure Storage, specifically from one of its blob containers, and paste that data into a container in another Azure Storage account. The linked service in this case represents the connection string for the Azure Storage account. The dataset in this case represents the information about the blob containers or folders. So you can say datasets represent the exact locations: where you want to point to the data. Whether you want to read it or write it doesn't matter; the dataset is the exact location and the structure representation of the data. And the linked service is the connection string for the storage, that's it. You need to get this difference between linked services and datasets right. As I said, a linked service is the connection string for the storage, and a dataset is the reference point to the location where you want to perform the data read or write. You can think of it like this. Don't worry if you are not getting these details fully in this video. We are going to do all of this hands-on, and at that point you will get the whole idea. So, the next thing is triggers.
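Before moving on, the linked service versus dataset difference can be pictured with a small Python sketch. This is an assumption-laden illustration, not the ADF SDK: the classes `LinkedService` and `Dataset`, the names `SourceStorage` and `SinkStorage`, the container and folder names, and the placeholder connection strings are all invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedService:
    """How to reach a store: essentially a named connection string (hypothetical model)."""
    name: str
    connection_string: str


@dataclass(frozen=True)
class Dataset:
    """Where the data lives inside a store: a reference point, not the data itself."""
    name: str
    linked_service: LinkedService
    container: str
    folder: str

    def location(self) -> str:
        # The dataset pins down the exact location; the linked service says which store.
        return f"{self.linked_service.name}/{self.container}/{self.folder}"


# Two storage accounts, each reached through its own linked service.
# The connection strings are placeholders, not real credentials.
source_storage = LinkedService("SourceStorage", "<source-connection-string>")
sink_storage = LinkedService("SinkStorage", "<sink-connection-string>")

# One dataset per location the copy activity reads from or writes to.
source = Dataset("SourceBlobs", source_storage, container="input", folder="raw")
sink = Dataset("SinkBlobs", sink_storage, container="output", folder="partitioned")


def describe_copy(src: Dataset, dst: Dataset) -> str:
    """Describe what a copy activity would do with these two datasets."""
    return f"copy {src.location()} -> {dst.location()}"


summary = describe_copy(source, sink)
print(summary)
```

Notice that the copy step only ever sees datasets; the connection details stay inside the linked services, which is the separation described above.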
So, why do we need a trigger, and what is a trigger inside Data Factory? Once you design your pipeline, you want to schedule it. For example: every day, at a particular time, I want to run this pipeline. If that is your requirement, you need to create a trigger for it, and that trigger will run the pipeline every day at 9:00 AM. So a trigger determines when your pipeline runs, when it executes. That is all a trigger is about: defining when my pipeline should run. There are different types of triggers available. We'll discuss all of them in great detail in upcoming videos. I hope you got a high-level idea of all these concepts, which are very important for ADF. Thank you, friends. Please subscribe to my channel and press the bell icon to get notifications whenever I add videos. Thank you.
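As a text companion to the trigger explanation above, here is a rough Python sketch of what "run every day at 9:00 AM" means in terms of computing the next run time. It uses only the standard library; the function name `next_daily_run` is made up for this example, and real ADF triggers are configured in the service rather than written like this.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time) -> datetime:
    """Return the next moment a daily schedule at `at` should fire, strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


nine_am = time(9, 0)
before = next_daily_run(datetime(2024, 1, 1, 8, 0), nine_am)
after = next_daily_run(datetime(2024, 1, 1, 10, 0), nine_am)
print(before, after)
```

The trigger decides only the "when"; what actually runs is still defined by the pipeline.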



