Thumbnail for How to Scrape Websites Using OpenAI and make.com (No Code!) by Yashica Jain

How to Scrape Websites Using OpenAI and make.com (No Code!)

Yashica Jain

26m 57s4,482 words~23 min read
YouTube auto captions
Transcript source

YouTube auto captions

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.

Timestamped outline
Pull quotes
[0:00]Just imagine that you are living in 2018 and you've got a list of websites, okay?
[0:00]And then your job is to extract some data from those websites and put it into a Google sheet.
[0:13]And if you have hundreds of website on your sheet, then it would take you hours and hours of work.
[0:13]And obviously, if you're not a programmer, you cannot code it out to do it for you.
Use this transcript
Related transcript hubs

[0:00]Just imagine that you are living in 2018 and you've got a list of websites, okay? And then your job is to extract some data from those websites and put it into a Google sheet.

[0:13]And if you have hundreds of website on your sheet, then it would take you hours and hours of work. And obviously, if you're not a programmer, you cannot code it out to do it for you.

[0:21]But thankfully, you're living in 2024 and now all of this process can be automated in less than a few minutes.

[0:29]You spend 30 minutes designing this simple automation and this automation will scrape all of the details for you.

[0:37]It is going to use the power of AI to add in any descriptions or any other field that you want.

[0:44]Apart from that, it does not matter how many websites you have, it could be a list of thousands of websites and this simple automation is going to take care of all of it.

[0:53]And in this video, I'm going to show you how you can set up this simple automation so that you can sleep while this automation scrapes the websites for you.

[1:02]You just need to set up this simple logic and you can tweak what all fields you want to scrape later on.

[1:08]So, if you're ready, let's just open a new tab in your computer and let's keep going. So hello everybody, my name is Yashika Jain.

[1:15]I'm the founder of Automate AI. At Automate AI, we build automated systems for businesses that use the power of AI.

[1:21]These systems are generally built out so that all of the manual tasks that human beings are spending their precious time on can be automated so that human beings can either sleep, eat, travel or do some other work that is of importance.

[1:38]Well, anyways, I guess I talk a lot, so let's get started. So a quick disclaimer, you don't have to write a single line of code, okay?

[1:45]You could be a complete beginner, just follow this video and you will have this powerful automation running then and there. So, let's get started.

[1:51]So for the people who are new to this, I must tell you that the app or the platform that we're going to use in this video is going to be make.com.

[2:00]So make.com is a no code automation platform that allows you to build all of these cool automations to automate your manual tasks.

[2:08]And if you don't have an account with them, consider using the link in the description below to create an account as you're going to get some extra perks for using my link.

[2:17]If you already do, then just log in. After you log in, you're going to land on a dashboard something like this and all you need to do is click on this purple button to create your very first automation.

[2:28]Well, before we start building our very first automation, I am going to show you how this um system actually works, so that um, you know, you're just as excited to build it and you stick to the very end.

[2:41]Okay? So as you can see, this is the Google sheet and all of the fields are empty right now. And now with magic, all of them will be filtered out.

[2:51]Okay? So I'm just going to take five or six records for the time being, so these are six records um so that, you know, we save time.

[2:59]And um this, these six records are going to go through the automation. All of these fields would be populated. Okay? So yeah, let's go ahead and run the automation.

[3:10]The automation is running. If you stick to the very end, I'm also going to show you how you can um schedule this automation so that you don't have to come into this platform every time and turn it on.

[3:25]You're, you're sleeping and this automation would do its work. Okay? So yeah, I'm going to talk about that.

[3:31]Well, I'm just going to pause the video for now and once all the six records are done, I'm just going to put it on.

[3:37]So we're already at two records, maybe we're just going to come here until then to see if it's working all good or not. So this was the link that I had. You can also go to their website if you want.

[3:47]And the business name is this. This is the description that Open AI came up with. Oh, we already have the third record as well. Well, anyways, um this is the description, okay?

[3:57]After that, this is the contact information, so you can contact them at this email, then these are their social media links and then this is a field for USP.

[4:09]Now, USP is unique selling um point, I guess. Yeah, that's the correct full form, I think. Yeah. So a unique selling point is something that is unique to that particular business and that's the reason why that particular business is running. Okay?

[4:22]So for this particular company, this is their unique selling point, okay? So they kind of donate 5% to nonprofit, so that is one of their unique selling point.

[4:32]Anyways, next up is ICP, so ICP is ideal client profile. So we are looking at who are the people that this particular business sells to.

[4:41]So this business sells to coffee enthusiasts and businesses seeking premium, freshly roasted coffee basically. Okay? And then once the record is processed, it, it changes the status to yes from no so that we don't take that in the next run.

[4:57]And similarly, it is doing that with all of the other links as well. Now, as I said, you are free to change all of the fields.

[5:04]If you want something else in place of description, you can put it. If you want something else in place of USP, you can put it. It's AI, right?

[5:14]You just give it instructions and it's going to do that for you. So I hope by now you're convinced that this is magic and this is something very, very powerful that literally everybody needs.

[5:24]Um, if that's the case, then let's get started and I would request you to stick till the very end.

[5:29]Okay, so this was the automation that I had already set up, but again I'm going to build it from scratch so that you can follow along.

[5:37]But before that, let's just quickly go through it and see what it's really doing. So, whenever the automation triggers, it looks for records in our specified Google sheet.

[5:48]And it checks for the records where the status is no. Okay?

[5:54]So it's going to look for the records where the status is no. If it says no, then it's going to pick up the record. If it says yes, then it's not going to pick up the record.

[6:03]After that, we basically um kind of scrape the content of the website. Okay?

[6:09]We do some parsing here, don't worry about it, I'll teach you. And then the scraped content goes to um an Open AI's chat completion model where Open AI analyzes the entire website's content and then gives you all of this information that you can see on screen.

[6:26]Okay? And then it's updated back into the Google sheet. So, let's get started.

[6:31]So I hope your screen looks something like this and the very first module that you would want to keep in your automation is going to be the search rows by Google sheet um module.

[6:43]Okay? Then it's going to ask you to connect your Google Sheets account with make.com so that it can access the spreadsheet.

[6:50]So kindly do that. After you've connected your Google Sheets account, you're going to land on an interface something like this.

[6:59]The first thing you would want is to create a Google sheet, right? I already have it, but you can just create a brand new Google sheet, just name it something and then select that particular Google sheet right here. Okay?

[7:12]Then select the sheet name, in our case, it's going to be sheet one. Then, uh, we're going to put filters, okay?

[7:19]So if you remember, I said that it only takes the records where the status is no, right? Or it's not yes, okay? So it could be no or empty.

[7:31]So we'll just going to put that filter here. So we're going to say if this particular field equals to no, then pick it up. Or if this particular field does not equal to yes, then pick it up.

[7:50]Okay? So I hope this makes sense to you. So if the field is equal to no, then pick it up. Or if the field is not equal to yes, right? It could be empty, then pick it up.

[8:00]Yeah. And the rest of the settings remain the same. We don't have to play with that. We're going to click on okay and save.

[8:09]So I would request you to just have a place for links in your Google sheet and another place for this one.

[8:20]That says processed. You can also change the name to status if you want, but it's up to you. Okay? Then it's your call if you want to keep all of these fields or not.

[8:29]Um, but yeah, if you want to follow along, then just have these fields. I'm just going to quickly name them for you. So, the first one is business name, then we have description, then contact information, then social media links, then USP, and finally ICP. Okay?

[8:44]So yeah. And then what we're going to do is um, since I'm building it again with you, I'm just going to delete all of this data.

[8:52]Okay, so here, delete. And I will change the status of processed to no for the top three fields, okay?

[9:06]Because while building, we will have to test a couple of things, right? So for that, I'm just going to select the top three fields. Okay?

[9:13]So I hope you have the very first module ready. What you're going to do is you're going to double click on it and click run this module once.

[9:21]And let's see if it picked up those three records where I had set the status as no. So okay, this is the first one, this is the second and this is the third one.

[9:33]Yes, it works pretty fine. Also, don't forget to name your automation. So we could say something like website scraping, YouTube, okay?

[9:44]Yeah. And then you can see this small button that says save. So just click on that so that you're saving your work as you go.

[9:52]The next step in our automation is going to be scraping the website content, okay? And um, you might, in some other YouTube videos you might see um that they use other platforms or some some um code to do that.

[10:07]But I just have a very simple method of doing it. What I generally do is I I search for HTTP here, okay? And then under the HTTP module, I select the make a request module.

[10:20]Okay? Then I simply map the URL of the website here. Right here. This remains the same and for the body type, I select raw and then um HTML, this one, if you scroll, you're going to find HTML, and then toggle this as on.

[10:40]Now, what this would do is this is going to make a request to that particular website's URL, and it's going to retrieve the HTML code of the website.

[10:49]If you're not a programmer, if you don't understand any of it, no worries, just know that we are just fetching the um text content of the website basically.

[11:00]Yeah, and don't forget to turn this as yes, click on okay and save. Okay, so let's test and make sure our module is working all fine.

[11:08]So I'm just going to copy one of the websites. I'm going to right click and then run this go to run this module only once.

[11:18]I'm going to paste the website and click on okay. So we've got the output, click on the one bubble to see it. Then under the output section, if you go to data, you will be able to see the HTML data of the website.

[11:30]Now, I know this makes no sense to human I, but don't worry. AI will be able to interpret it. Okay?

[11:37]So if you noticed, currently our um result is in HTML format, so it's HTML code, but Open AI or your AI module basically would need text.

[11:49]So for that, we're just going to convert it to text. So search for text parser here and then you're going to see a module that says HTML to text.

[12:00]And then simply map the data here. Now, what this is going to do is it's going to take the HTML, convert it to text so that our next AI module, module can access it, okay?

[12:15]So just do this and save. So by now, we have scraped the website. We have created the website's text content into text from HTML.

[12:21]And now comes the most exciting part, that is selling up the Open AI module to kind of analyze this and get you all of these different fields out of it, okay?

[12:33]So, let's go plus, let's go Open AI and create a completion. So on your screen, it's going to ask you to create a connection.

[12:43]So just make sure to create Open AI's API keys and add it into make.com so that it can access your account.

[12:50]If you don't know how to do it, I have another tutorial on how you can connect it. So I will just link it in the I button here. Just go there, um create an API key and then connect it to make.com and you will um see an interface something like this.

[13:07]Before starting, please change the maximum completion tokens to nothing. Basically, just keep it empty. Um, that's what I meant to say.

[13:16]And then for the model, you could select GPT4O, okay? Because I think it's good at extracting data from given content.

[13:28]But if you want to save on costs, then you could also go with GPT4O mini. It's up to you. I'm just going to go with GPT4O.

[13:34]Now, it's time for us to give it the prompts. So, um, just as we give prompts to chat GPT, right, we just going to do the same thing, but right here.

[13:43]So we're going to go here. We're going to select the first role as system. Again, if you don't know the difference between the three, I have again covered it in that video, but I'm still going to give you a quick summary.

[13:57]So, the system instructions are kind of the underlying wires or you can say the underlying context on which the AI module operates on, okay?

[14:10]So you tell it what it's supposed to do, how it's supposed to do, how it is supposed to answer and then in the user prompt, you can basically just give the variable information.

[14:19]Like in our case, we could pass in the website's content in the user section. But in the system section, we can just give it all of the context. Okay?

[14:30]So yeah. Since I already have the prompt with me, I'm just going to copy paste it from here. Yeah. So this is the prompt.

[14:40]I will make sure to share it with you guys. So look for it in the description below. You will find all the resources right there.

[14:47]So, the prompt says that you are a highly advanced language model. Analyze the provided raw HTML or scraped text from a website.

[14:55]Extract and organize key business information into a flat JSON format. Avoid nested structures. By the way, if you don't know what a JSON format is, don't worry, it's not that crucial.

[15:09]You just follow along and you will understand it just in a moment. Okay.

[15:16]After that, we're telling it the required output fields. So we're telling it that you need to give us the business business name. We're telling it what it really means.

[15:25]Then the description, we're telling it what it means, then the contact information, again, what to include, then the social media links, then the USP, then the ICP and so on and so forth.

[15:40]Okay? So, if you remember in the beginning of the video, I said that you can change all of these parameters. If you want to have something, have it. If you don't want to have something, delete it.

[15:48]So yeah, this is the very place where you do that. If you want to keep USP, keep it. If you don't want it, just delete it. If you want to have something else, add it here. Okay?

[15:58]Then we finally give it instructions as to how it should output its output basically. And we're telling it to give it a JSON structure, so this is how the JSON looks like.

[16:11]So it's going to first give the business name, it's going to put it here, then it's going to give the description, it's going to put it here, then the contact information here, so on and so forth. So this is what JSON is. Okay? So I hope you understand it now.

[16:25]Then we are also giving it an example, okay? So that it it knows that how it is supposed to and it does not hallucinate.

[16:36]So, I've just given it a very random example, so I just took a business. I gave a description, contact information, all of that, okay?

[16:48]And yeah. Then finally, at the end, we need to give the scraped data. And if you remember, the scraped data is this one.

[16:58]So HTML to text, this is going to be the scraped data. After that, please click on this toggle button and change the response format to JSON, right? And click on yes here.

[17:11]So basically, we're getting the JSON from Open AI, so we're just going to put it as JSON here. We would also like to change the temperature. So I'm just going to keep it to 0.5, okay?

[17:21]You can read the description here about what temperature means. I've also covered it in my other tutorial, if you're interested.

[17:29]We're not going to change any of the settings and we're simply going to click on okay and save. You have done most of the work. Trust me.

[17:38]By the way, you're going to find the prompt in the description, so don't worry about that. Just use that, make um make changes to it and start using it.

[17:46]Lastly, the very last job is to simply update the Google sheet with our updated results. So, we're going to come here. We're going to select enter manually. We're going to map the spreadsheet ID from the very first module.

[18:01]We're going to map the sheet name from the first module. We're going to map the row number from the first module.

[18:10]Okay. And then we're going to map all of the fields. So business name, description, contact information, social media links, USP, ICP, and then finally processed as yes.

[18:22]Okay. You're going to click on okay and save. Okay, let's run it again to make sure all of our settings are good. Let's make sure it's running good.

[18:31]And uh yeah, that's it. It should be good. So, let's see. Uh, Open AI. Uh, the output is this one. We have the business name, we have the description.

[18:41]We have the contact information, social media links. We have the USP, we have the ICP and it is marked as yes. This is good. It works perfectly fine.

[19:43]Okay. Let's come back here. Now, to be honest, the automation is completed. You can just change whatever you want to change regarding the fields right in the prompt, okay?

[20:26]But since you're watching my video, I want to make you a pro at this. So I'm also going to tell you how you can add error handling to your automation so that if sometime it, you know, hits a snag, it does not stop, okay?

[20:41]So let's do that. So the very first error handler would be to set up a filter here to make sure that we are getting something from the Google sheets module, okay?

[20:53]So we want the total number of bundles or you can say the total number of outputs to be not equal to zero, okay? Only if they're not equal to zero, this automation should continue.

[21:03]If it's zero and um there were no, you know, links to process, then this automation should stop here. Because there are going to be instances where all of your links have been processed and you don't have any new links, okay?

[21:20]So in that case, why do we want to run the automation, right? It's just going to get you an error. So stop it on the first module itself by saying that the total number of bundles should not be equal to zero. It could be one, it could be 100, but not zero.

[21:36]The next point would be to set up a filter here saying that the data exists. A lot of times um there might be a snag and you might not get the data, right?

[21:49]So we don't want that because in that case, you're going to get errors from all of these, okay? So we would just rather stop the automation if we get, you know, an error or we don't get the scraped code from the website.

[22:02]Okay? So just put data exists. Okay? Another thing would be to add an error handler to this module. This module might get you errors.

[22:15]So maybe let's say there was a website that could not be scraped. Maybe that website did not exist, okay? So in that case, it would give you an error and the automation would stop, but we don't want that.

[22:24]We simply wanted to ignore that particular link if it does not exist or if it's giving you an error, and move to the next one.

[22:33]As simple as that. In that case, it might just skip that particular row and keep moving with the next one. If that makes sense. So that is the second error handler that you would want.

[22:47]And that's it. You just auto-align all of these modules and click on save. Now, your automation is bulletproof.

[22:55]Even if you get errors from any of the websites, it's not going to stop your entire operation. It is going to work just as flawless.

[23:03]And trust me when I said it, I have set up so many web scraping automations for my clients and all of these error handling and everything I've learned it all from my personal experience, from all the errors that I got um while working.

[23:17]But since you guys are watching my video, I don't want you guys to suffer and that's why please have these three error handling modules in your automation.

[23:27]Well, the job is done. One last thing. If you're processing a lot of records, then consider adding a sleep module here, so go to tools, go to sleep.

[23:38]And we would want the automation to sleep for 30 seconds after each processing. The reason why is that Open AI gives you an error when you're doing a lot of processing um, you know, quickly.

[23:52]So we might just want the automation to sleep 30 seconds, then carry on with the next record, sleep for 30 seconds, carry on with the next record.

[24:01]That's up to you. Only if you have a lot of websites or a lot of links to scrape. If they are less, then then you don't need it. But otherwise, it's always good to have a sleep module here.

[24:12]And save. Now it is more than ready to get into the real world. Now, remember I talked about the scheduling thing?

[24:21]So let's schedule the automation. Let's say that you um would want the automation to run every single day at 5:00 p.m.

[24:31]Look for any new links in the Google sheet and scrape then, scrape them. So you're going to click on this one and then you're going to select every day at 5:00 p.m. I think this is 5:00 p.m. Yeah.

[24:43]And then the automation is going to run every single day at 5:00 p.m. It's going to look for any of the records where the processed status is not yes, that means it's no.

[24:55]And it is going to process them. So just make sure that um all the records that you want to process are as no, right? That's it.

[25:34]And then the next day, when you look at your Google sheet, you will have all of the scraped data right there. Can you even believe it? It's all magic.

[25:42]So, this is it for the video. I will make sure to share the resources in the description down below. If you have faced any issues, then let me know in the comments below.

[25:52]Apart from that, I also forgot to tell you guys that apart from running my own AI agency, I'm also an AI automation consultant.

[26:00]So, if you're looking for consultancy or coaching regarding how you can automate your manual workflows, or let's say you're stuck somewhere or you want to improve something, then feel free to book a session with me. The link would be in the description below.

[26:15]And since I already have an agency, if you want to outsource projects, um let's say you would want to get something built out um for your business or for your agency, then again, I have the link to my website there.

[26:27]You can book in a call with me. We could discuss the project and if it looks like a fit, we could take it on and um automate your workflows.

[26:34]Also, if you would like to connect with me personally, then I have the link to my LinkedIn in the description. You can send me a connection request and we can connect on LinkedIn. I always have my eyes on the comment section, so if you've faced any issues, please do let me know.

[26:48]And yeah, don't forget to use um the link in the description below to create your make.com account. So that's it for the video.

[26:56]I hope you got value and this was valuable to you in some way or the other. I will meet you in the next video. Till then, bye.

Need another transcript?

Paste any YouTube URL to get a clean transcript in seconds.

Get a Transcript