
AZURE DATA ENGINEER INTERVIEW Secrets You Need to Get Hired | Mock Interview | EP - 07

Azurelib Academy

24m 16s | 4,621 words | ~24 min read

This transcript was extracted from YouTube's auto-generated caption track.


[0:01] Why do you have both an on-prem database and an Azure SQL database? So, from on-prem, we are taking the data from an Oracle database. What are the steps to be taken when you develop these on-prem-to-cloud pipelines? Yeah, so we can transfer the data from on-prem to the cloud; we can create a copy activity where we copy the data from on-prem to the cloud. How would the metadata activity be useful here? Why would somebody use a metadata activity, and how would it be useful here? Now, the metadata activity will actually give not the entire value, but the column names, how things are laid out inside the file.

Interviewer: Hello everyone, we have Devshree with us, and we're going to do a mock interview. If you're watching this series, please subscribe, and remember to watch till the end, because in the first section we're going to take an interview, and towards the end we're going to have full feedback on what she has done wrong, what she has done right, and what could have been much better. Okay, so let's start with the mock interview. Welcome, Devshree. Please start by telling me something about yourself.

Devshree: Hi, I'm Devshree. I'm working as a data engineer with three years of experience in building scalable data pipelines using Azure services. At my current company, I've worked on end-to-end ETL development using Azure Data Factory, Databricks, and ADLS to improve pricing analytics. One key project involved building a PySpark-based pricing engine that integrated multi-source data and reduced manual processing by 70%. So yeah, I enjoy transforming raw data into actionable insights.

Interviewer: Okay, so I'm trying to scrutinize that. You said it is going to improve by 80%, that's what you said, right?

Devshree: Reduced the manual processing, like loading the data to ADLS and all that.

Interviewer: By some percentage you said it, right?

Devshree: Yes, by 70%.

Interviewer: 70%. How do you calculate that?

Devshree: So, earlier, the users used to upload the data manually: we have different pricing according to the products, and they used to manually load the data to ADLS. Now it's automated.

[2:15] Devshree: So, once they load a file, which can contain different products, the pipeline checks which type it is and, according to that, loads it into the specific folder for that type in a different ADLS location. That's how the manual processing has been reduced: I'm loading the data from the one section to the ADLS section according to type.

Interviewer: Understood. And what technology is involved in this project?

Devshree: We have used the Azure data services: Azure Data Factory as the orchestration tool, Databricks for the transformation, and ADLS for storing the data.

Interviewer: What kind of transformation are you doing? I mean, can it not be done in ADF? Why do you need Databricks?

Devshree: In ADF, we might have to use a data flow, which is an expensive job. That is why we are using Azure Databricks, and we also have a more complex pricing engine that runs there. In Azure Databricks, we use different PySpark functionalities, and we also have user-defined functions to calculate the pricing.
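The routing described above (check a file's product type, then write it to the folder for that type) could look roughly like the following. This is a minimal sketch, not the project's code: the folder names, the `TYPE_TO_FOLDER` mapping, and the rule that the product type is the file-name prefix before the first underscore are all assumptions made for illustration, since the interview does not say how the type is detected.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from product type to an ADLS folder; the real
# folder layout is not given in the interview.
TYPE_TO_FOLDER = {
    "electronics": "pricing/electronics",
    "grocery": "pricing/grocery",
}
DEFAULT_FOLDER = "pricing/unclassified"


def product_type_of(file_name: str) -> str:
    """Assumed rule: the product type is the text before the first underscore."""
    return file_name.split("_", 1)[0].lower()


def target_path(file_name: str) -> str:
    """Return the folder-relative path the file should be written to."""
    folder = TYPE_TO_FOLDER.get(product_type_of(file_name), DEFAULT_FOLDER)
    return str(PurePosixPath(folder) / file_name)
```

Sending unknown types to a default folder, rather than raising an error, is one possible design choice; it keeps the load automated while leaving unexpected files easy to spot.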

[3:34] Devshree: So that is why we are using Databricks.

Interviewer: Okay, great. Now, how many pipelines do you have in your project?

Devshree: We have more than 50 pipelines.

Interviewer: Okay, how many pipelines have you made?

Devshree: 10.

Interviewer: Okay. Can you explain any one of them?

Devshree: Yeah. In one, I'm just copying the data: we have multi-source data, and I'm copying the data from one of the on-prem DBs to the ADLS location.

[4:06] Devshree: So, I have that one pipeline, and in another pipeline I'm storing the data from the Azure SQL database to the ADLS location. There is another pipeline where I'm integrating a Databricks notebook for the transformation and then storing the data after the transformation to the ADLS location.

Interviewer: One pipeline that makes sense to me is on-prem to the cloud, okay. Second, you said it is from Azure SQL DB to the cloud.

Devshree: To ADLS.

Interviewer: To ADLS, yeah. Why do you have both an on-prem database and an Azure SQL database?

Devshree: From on-prem, we are taking the data from an Oracle database. That belongs to a different application, whose team manages it and puts the values into their Oracle database. So we take their data from on-prem and store it in ADLS for the business analytics team to work on.

Interviewer: Okay, and in the cloud database, what are you storing?

Devshree: That is from another application where we are getting the data. It's the different prices of the different product codes, and from there we take the data into ADLS, where the other business team working on the pricing engine picks it up.
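The three pipelines mentioned here (Oracle to ADLS, Azure SQL to ADLS, and a Databricks notebook step) can be pictured as ADF pipeline definitions. The sketch below builds them as plain Python dictionaries in the general shape of ADF's JSON. Every pipeline, dataset, linked service, and notebook name is hypothetical, and the Parquet sink is an assumption, since the interview does not state the file format written to ADLS.

```python
import json


def copy_activity(name, source_dataset, sink_dataset, source_type, sink_type):
    """Build a Copy activity definition as a plain dict."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


def notebook_activity(name, linked_service, notebook_path):
    """Build a Databricks notebook activity definition as a plain dict."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
    }


def pipeline(name, activities):
    """Wrap a list of activities in a pipeline definition."""
    return {"name": name, "properties": {"activities": activities}}


# All names below are hypothetical placeholders.
pipelines = [
    pipeline("OracleToAdls", [copy_activity(
        "CopyOraclePrices", "OraclePricesDs", "AdlsRawDs",
        "OracleSource", "ParquetSink")]),
    pipeline("AzureSqlToAdls", [copy_activity(
        "CopySqlPrices", "AzureSqlPricesDs", "AdlsRawDs",
        "AzureSqlSource", "ParquetSink")]),
    pipeline("TransformPricing", [notebook_activity(
        "RunPricingNotebook", "DatabricksLs", "/pricing/engine")]),
]

if __name__ == "__main__":
    print(json.dumps(pipelines[0], indent=2))
```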

[5:33] Interviewer: Understood. You have developed an on-prem-to-cloud pipeline. What are the steps to be taken when you develop these on-prem-to-cloud pipelines?

Devshree: Yeah, so we can transfer the data from on-prem to the cloud. We can create a copy activity where we copy the data from on-prem to the cloud. So yeah, that would be the one.

Interviewer: Can you tell me in depth what else can be done?

Devshree: When we are moving data from on-prem to cloud, we need to create a self-hosted integration runtime, that is, the IR. So we create that self-hosted IR to copy the data from on-prem to the cloud. And, sorry.

Interviewer: Imagine we don't create a self-hosted IR. Is there any way I can copy without using the self-hosted IR?

Devshree: No, because on-prem would work as a private endpoint, right? To connect to that private endpoint we need a connection, so that is why we have the self-hosted IR.

Interviewer: Okay, now on the second side, imagine that I have a file system: a folder on my server, and that is on-prem. Imagine you have one folder on your Windows or Linux machine on-prem, and from there you have to copy the data and push it into the cloud. Is that possible through ADF? If yes, say yes; if not, say no; and if yes, then how?

Devshree: Yes, we can do that. We need to install the integration runtime on our system. After installing that, we also need to create a linked service on the ADF side, where the integration runtime will be the self-hosted one that we are linking. So we make a connection between the integration runtime we have installed on our system and ADF, and from there we can, after we are able to.
Interviewer: You created two self-hosted, I mean, two integration runtimes, you're saying? Do we need to create two or not?

Devshree: No, no, one integration tool, which we need to install on-prem; suppose it's on my machine. I install that software on my machine to make a connection, so that will be the on-prem part. And then, for the ADF part also, I need a connection which will actually pick the data from there. That is why I need a linked service over there, where I'm.

Interviewer: But you said integration runtime, one integration runtime in ADF as well. What do you mean by that?

Devshree: No, no, I meant that in ADF, to connect to this on-prem, I need a linked service, like a connection.

Interviewer: So, no integration runtime in ADF?

Devshree: No, that integration runtime is in the linked service itself. See, sometimes we have the auto-resolve integration runtime when we are creating any kind of linked service. But when we come to on-prem, we, uh.

Interviewer: But I assume that there are two integration runtimes needed. Is it really?

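The point being untangled in this exchange is that a single self-hosted integration runtime is enough: it is defined once in the data factory, its software is installed and registered on the on-prem machine, and the linked service for the on-prem source refers to it by name. The sketch below shows that relationship as Python dictionaries shaped like ADF's JSON, where `connectVia` is the property that ties a linked service to an integration runtime. The IR name, linked service name, host path, and user are hypothetical, and the password is a placeholder rather than a real secret.

```python
import json

# Hypothetical name for the one self-hosted IR defined in the data factory.
SELF_HOSTED_IR = "OnPremSelfHostedIR"


def self_hosted_ir(name):
    """Definition of the self-hosted IR resource on the ADF side."""
    return {"name": name, "properties": {"type": "SelfHosted"}}


def file_system_linked_service(name, host, user_id, ir_name):
    """Linked service for an on-prem folder, routed through the given IR."""
    return {
        "name": name,
        "properties": {
            "type": "FileServer",
            "typeProperties": {
                "host": host,
                "userId": user_id,
                # Placeholder only; a real setup would not inline the secret.
                "password": {"type": "SecureString", "value": "<placeholder>"},
            },
            # This reference is what binds the linked service to the IR.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


ir = self_hosted_ir(SELF_HOSTED_IR)
linked_service = file_system_linked_service(
    "OnPremFolderLs", "\\\\fileserver\\pricing", "svc_adf", SELF_HOSTED_IR
)

if __name__ == "__main__":
    print(json.dumps(linked_service, indent=2))
```

A linked service created without `connectVia` falls back to the default auto-resolve Azure integration runtime, which is the case Devshree alludes to when she mentions auto-resolve.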
