How to Scrape ANY YouTube Video Transcript With Just AI! (FREE)

Owain Lewis

20m 47s | 3,516 words | ~18 min read

This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.


[0:00]Getting YouTube transcripts can be surprisingly difficult, and many people have to rely on third-party tools and services like Apify to scrape them. Thankfully, there's a new way that not many people know about that I'm going to show you today. Google Gemini is a powerful multimodal model from Google that can process YouTube content without needing third-party APIs. I'm going to walk you through, step by step, how to process YouTube videos within your n8n workflows using only AI, so let's get started. The first thing I want to show you is how this automation works and how you can use it for your own content generation. What we have here is a simple workflow. Based on some configuration, including a YouTube channel and my API key, I'm going to go and get a list of videos from YouTube, and for each of those videos I'm going to process the video and get a transcript using Google Gemini. Then I'm going to save that into an Airtable record. Let's take a look at this in action. Okay, so I already have an Airtable database set up. It has a few columns: video ID, channel name, title, and transcript. What we're doing now is kicking off a workflow that's going to populate all of these fields for us based on the channel ID that we gave the automation. Okay, so this workflow executed successfully. Using my own YouTube channel, it's filled in the channel name and pulled out the two latest videos, so we have the title and then we have the transcript here, which is the entire YouTube video transcript. We can customize this in any way we want. We can also generate summaries or content ideas from this transcript, but this is just for demonstration. You can also go ahead and use this within Gemini directly, so if you haven't checked out Gemini, I definitely recommend it.
You can go to Google AI Studio, which is aistudio.google.com, and you'll see here that you're able to select a bunch of different models. Most of these models don't support YouTube transcripts, but if you use the latest Google Gemini 2.5 Pro preview model, you'll see that there's an option to pass in a YouTube video. So what we can do is go ahead and get any video. I'm going to pull this one and copy the link address, and what I'm going to say is, "Please transcribe this video." All right, so as you can see, we're actually getting back the transcript of the video directly from Gemini. We're not using any third-party tools; we're just asking Google Gemini, which obviously has access to this data, for the transcript. It can take a while to process, but what you'll get back here is the full transcript of the video. All right, so let's take a quick look at our game plan for today. We're going to build a YouTube channel scraper, and the requirement we have is to not use any third-party tools. So the game plan: the first thing we're going to need is the Gemini API key, and then we're going to need to make a custom request to Gemini in order to use this feature. It's not available in the standard n8n LLM node, because we're going to use a different feature of the API. There are a few things you need to know about using these APIs, and the first one is that they have token limits. There are two models that we can use to get YouTube video data: the first one is Gemini 2.5 Pro and the other one is Gemini 2.0 Flash. All right, so let's take a quick look at the API docs. You can see here, these are the latest Gemini models. We have 2.0 Flash, which is a faster model, and we have 2.5 Pro. If you scroll down, you should be able to see that these models have token limits. Google Gemini 2.5 Pro has an output token limit of 65,000, which means we're able to output 65,000 tokens.
And if you scroll down to Gemini 2.0 Flash, you can see here that the token limit is much lower. So this is just one of the caveats you need to bear in mind if you're using this method. If you want the entire transcript and you have a very long video, this probably isn't going to work. If you have a shorter video, or if you're generating a summary, this will work fine. So you want to be aware of these token limits, because they're going to limit how much data you can return from the API call. Okay, so the next thing I want to quickly show you is the actual API call that we're making. We're making an API call to the Google Gemini API directly, using our API key, and we're passing in the model. What's interesting about this method is the way we're invoking the API. We're passing in file data, we're setting the MIME type to be video/mp4, and we're passing in this file URI, which is the YouTube video. This is the magic that enables you to use this feature, and as I said, it's not currently exposed via the n8n node, so you're going to need to do a custom HTTP call to get this working. Okay, and if you're interested, because this is a multimodal model, the Google Gemini API has a Files API. This is what we're using here, so you can use the Files API to upload and interact with media files. We're not uploading the YouTube file directly, we're actually just linking to it, but you can also upload files to process them using this API.
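The token-limit caveat above can be turned into a rough pre-check before requesting a full transcript. The sketch below is illustrative only: the 4/3 tokens-per-word ratio is a common rule of thumb rather than a figure from the video or from Google's documentation, and the function names are hypothetical. The 65,000 figure in the usage note is the Gemini 2.5 Pro output limit quoted in the video.

```python
# Assumption: roughly 4 tokens for every 3 English words. This is a rule of
# thumb, not a number taken from the video or from the Gemini documentation.
TOKENS_PER_3_WORDS = 4


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate the output tokens needed for a transcript."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (word_count * TOKENS_PER_3_WORDS + 2) // 3


def fits_output_limit(word_count: int, output_token_limit: int) -> bool:
    """Return True if the estimated transcript size is within the limit."""
    return estimate_tokens(word_count) <= output_token_limit


if __name__ == "__main__":
    # 3,516 words is the length of this transcript; 65,000 is the output
    # limit quoted for Gemini 2.5 Pro in the video.
    print(estimate_tokens(3516), fits_output_limit(3516, 65_000))
```

Under this estimate, a transcript of about 3,500 words needs somewhere around 4,700 output tokens, so only much longer videos come near a 65,000-token limit. A summary prompt sidesteps the problem because the output stays short regardless of video length.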

[4:57]You can see here, it gives you some idea about using the file data. So this gives you a reference for the API call that I was using. You're making a call to the Google API as application/json. We're passing in the contents: this is the prompt, "describe this audio clip", and then the file data, which is the magic, the important bit. We're setting the MIME type to be an MP4, and the file URI is the YouTube video URI. So this is the really important bit, this is where the contents come from. All right, so let's take a quick look at what we're going to do right now. Firstly, we're going to build an n8n automation that creates a form, we're going to capture a YouTube video URL, and we're going to use Gemini to process the video and generate a transcript. Then we're going to store that transcript in Airtable. So let's go ahead and build that. All right, so the first thing we're going to need is an Airtable database. I'm going to create this one from scratch. We're going to create a new Airtable database and call it YouTube automation. We're going to delete these columns and start from scratch. I have here a basic empty database. I'm going to change this table name to be YouTube, and I'm going to create columns. The first column is just going to be the YouTube URL, and the second one is going to be the transcript. So we're going to have a long text field, and this is where we're going to store the transcript. All right, so this should be enough to start for now. What we're going to do next is go ahead and create a new workflow.

[6:47]The first step we're going to have is a form trigger. So we're going to create an n8n form, and what we're going to do is call this form YouTube transcript. We're going to set the field name to YouTube URL. Let's go ahead and find a video. So this is my previous YouTube video. Let's go ahead and paste that in.

[7:14]All right, perfect. As you can see here, we've now triggered this workflow and we're able to see that YouTube URL. Now that we have our form trigger, the next thing we're going to do is add our configuration. So we're going to create a Set node. We're going to do Edit Fields (Set), and this is where we're going to add our configuration. Let's rename this to configuration, and then we're going to add in all of the configuration we need. The first field is going to be the Gemini API key, and this is going to be your API key. The next field we're going to need is the Gemini model. There are a couple of different options you have for the model: you can use either Gemini 2.5 Pro or Gemini 2.0 Flash. I'm just going to use Gemini 2.0 Flash for this example.

[8:03]All right, and now I can test the step. Obviously you're going to need to put in your own API key.

[8:44]And we're making a POST request to this URL. We're going to make this dynamic because we're going to change the model, so this is where you can configure your model. We're making a POST request to the Google API v1beta models endpoint, and then we're going to change this part.

[9:18]This allows us to change the model easily via configuration. And we also need this Gemini API key as well, and the value here.
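A minimal sketch of building that dynamic URL outside n8n follows. The base URL and the `generateContent` path are my assumption about the public Gemini REST API (the video only shows a "v1beta models" URL), passing the key as a `key` query parameter is one of the ways that API is commonly called, and `build_generate_url` and the model ID string are hypothetical names used for illustration.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the public Gemini REST API; confirm against the docs.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_url(model: str, api_key: str) -> str:
    """Build the request URL from the two configuration values."""
    # The model name is a path segment, the API key a query parameter.
    query = urlencode({"key": api_key})
    return f"{GEMINI_BASE_URL}/{quote(model, safe='')}:generateContent?{query}"


if __name__ == "__main__":
    # "gemini-2.0-flash" is an assumed model ID; "YOUR_API_KEY" is a placeholder.
    print(build_generate_url("gemini-2.0-flash", "YOUR_API_KEY"))
```

Keeping the model and key as parameters mirrors the configuration node in the workflow: switching to a different model is a one-value change.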

[9:41]And then the final thing we need to do is just change the body of the request. I'm just going to quickly show you what this looks like. For the request that we make to the Gemini API to work, we need to pass in something called file data. So the parts, which are documented in the standard API documentation, contain the prompt we're sending: "Transcribe the following YouTube video. Do not include timestamps." Then we're passing in file data, which is the MIME type, video/mp4, and we're also passing in a file URI. This is basically the video URL that you want to pass in. Here we're constructing this dynamically using the video ID, but you can just paste in the full URL. So let's copy this, and this is what we're going to use as the JSON body.
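The JSON body described above can be sketched as a small builder. The field names (`contents`, `parts`, `text`, `file_data`, `mime_type`, `file_uri`) follow the structure described in the video and are my best reading of it; `build_transcript_body` is a hypothetical helper name, and the video URL in the usage example is a placeholder.

```python
import json

DEFAULT_PROMPT = "Transcribe the following YouTube video. Do not include timestamps."


def build_transcript_body(video_url: str, prompt: str = DEFAULT_PROMPT) -> dict:
    """Build the request body: one text part plus one file_data part."""
    return {
        "contents": [
            {
                "parts": [
                    # The instruction for the model.
                    {"text": prompt},
                    # The YouTube link is passed as a file URI, not uploaded.
                    {"file_data": {"mime_type": "video/mp4", "file_uri": video_url}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    body = build_transcript_body("https://www.youtube.com/watch?v=VIDEO_ID")
    print(json.dumps(body, indent=2))
```

Changing only the `prompt` argument is enough to request a summary or content ideas instead of a transcript.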

[10:38]All right, so let's go back to our workflow. We're going to paste this in here, and because we're passing in the full YouTube URL from our form rather than a video ID, we need to change the file URI. So I'm going to quickly edit the file URI here. Let's expand this out. What we're going to do is make this an expression and pass in the YouTube URL. So we now have our API key, we're making a custom API request, and we're passing in our prompt: "Transcribe the following YouTube video. Do not include timestamps." We're passing in the file data, this is the magic, where we're setting the MIME type to be video/mp4, and we're passing in the file URI as the YouTube video URL to watch. Let's test this out. So we're going to start listening. I'm going to get the YouTube URL here, click copy link address, go back to the workflow, and click on test workflow. I'm going to paste in the video URL, and then this sends a request off to Google Gemini. All right, looks like that worked correctly. If we expand this out and take a look at what came back, we should be able to see the transcript. As you can see here, this is the text transcript that was generated, and it looks like it generated all of it. Let's go down to the bottom: "Thanks, thanks for watching." So we actually generated the entire transcript using Gemini 2.0 Flash. Obviously, if you have a longer video and a longer transcript, you're going to use a different model, Gemini 2.5 Pro. All right, so that's the basic automation. The next thing we need to do is put this back into Airtable. So let's go ahead and update Airtable. We're going to say create a record. You'll obviously need to configure your access token for Airtable.
We're going to say create. We're going to choose our base, which is going to be YouTube automation, and then the table, which is going to be YouTube, and we're going to paste in the YouTube URL and the transcript. The URL came from the form, so this is the URL, and the transcript came back from Gemini, which is here. Let's test that step. What we should have now is this data in Airtable. As you can see here, we now have the YouTube URL and we have the transcript. Obviously you can expand this out. All right, so as you can see, we're now able to use Google Gemini to get full YouTube transcripts using their API. Other things we could do here would be to generate summaries as well, so let's take this automation one step further. What we're going to do here is have a manual trigger, and we're going to change this Airtable database slightly by adding some additional fields. So let's go ahead and change this. We're going to add a manual trigger, and we're going to have our configuration up here. The configuration is now also going to include a channel ID, so let's add a channel ID.
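For the first version of the workflow, pulling the transcript out of the Gemini response and shaping it into the two-column record could look like the sketch below. The `candidates[0].content.parts[].text` path is my assumption about the response layout of the Gemini REST API, the column names mirror the ones created in the video, and both function names are hypothetical.

```python
def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a Gemini response.

    Assumes the response has the shape
    {"candidates": [{"content": {"parts": [{"text": "..."}]}}]}.
    """
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def to_record_fields(video_url: str, response: dict) -> dict:
    """Map the form input and model output onto the two table columns."""
    return {"YouTube URL": video_url, "Transcript": extract_text(response)}


if __name__ == "__main__":
    # A hand-written stand-in for a real API response.
    fake_response = {
        "candidates": [
            {"content": {"parts": [{"text": "Hello. "}, {"text": "Thanks for watching."}]}}
        ]
    }
    print(to_record_fields("https://www.youtube.com/watch?v=VIDEO_ID", fake_response))
```

A long transcript may come back split across several parts, which is why the sketch joins them rather than reading only the first one.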

[14:23]All right, so let's get a channel ID. The way you get a channel ID is to click on any YouTube channel, click on this more icon, click on share channel, and then click copy channel ID. Once we have a channel ID, we can paste it in here, and now I can test this out. So this is our configuration. We're going to create a YouTube node, and what we're going to do is get many videos. So we're going to click on the YouTube video action, get many videos, and you'll obviously need to authenticate with YouTube. I'm going to skip over that for now. We're going to have the get many operation, and we're going to set a limit of, let's say, two videos. Actually, let's make it three. We're going to add a filter, say channel ID, and then pass in the channel ID. This is basically saying to go to YouTube and, for this particular channel, get the three latest videos. In order to make sure they're the latest, we need to set the order, and we're going to order by date. Then if we just test this out, what we should see is the latest videos from my own channel. Okay, so now that we have a list of YouTube videos, we can pipe this into the HTTP request.
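The same "three latest videos for a channel" query can be expressed directly against the YouTube Data API, sketched below. This is not what the n8n YouTube node is confirmed to do internally; the use of the `search` endpoint with an API key (rather than the OAuth login shown in the video) is my assumption, and `build_latest_videos_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

YOUTUBE_SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_latest_videos_url(channel_id: str, api_key: str, limit: int = 3) -> str:
    """Build a search URL for the newest videos on one channel."""
    params = {
        "part": "snippet",        # include title and channel title
        "channelId": channel_id,  # the filter set in the workflow
        "order": "date",          # newest first
        "type": "video",          # skip playlists and channels
        "maxResults": limit,      # the limit set in the workflow
        "key": api_key,
    }
    return f"{YOUTUBE_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Both arguments are placeholders.
    print(build_latest_videos_url("YOUR_CHANNEL_ID", "YOUR_API_KEY"))
```

Without `order` set to `date`, results come back in the API's default ordering, which is why the workflow sets it explicitly.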

[15:51]So for each video, we're going to make a request to get a transcript. Let's change this slightly now. Sorry, let's quickly update this because we've moved things around. So this is going to be the configuration, item.json, Gemini API key. All right, so we're going to quickly change the prompt here, and obviously we need to change the file URI now because it's slightly different. We have the video ID now coming back from YouTube, so what we need to do is construct the file URI dynamically. The way we're going to do that is to copy the YouTube URL, copy link address, go back into our workflow, and just change the ID at the end. Essentially, when you look at any YouTube video, you have youtube.com/watch and then ?v=, followed by the video ID, and that's where we want to paste in the video ID. So we're going to change this to be the video ID, and we're going to copy this across from here. We just put the video ID in here, and as you can see, this expands out to be the correct URI. And for the prompt, let's just do: "Summarize the following YouTube video. Do not include any timestamps." Okay, that'll do. Let's keep it simple, so just summarize the video, essentially. Okay, so this is all working.
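Constructing the watch URL from a video ID, and going the other way, can be sketched in a few lines. The helper names are hypothetical; the URL pattern is the `watch?v=` form described above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def watch_url(video_id: str) -> str:
    """Build the watch URL that is passed to Gemini as the file URI."""
    return "https://www.youtube.com/watch?" + urlencode({"v": video_id})


def video_id_from_url(url: str) -> Optional[str]:
    """Return the value of the v query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


if __name__ == "__main__":
    url = watch_url("VIDEO_ID")  # placeholder ID
    print(url, video_id_from_url(url))
```

The reverse helper only handles the `watch?v=` form; shortened or embedded link formats would need their own parsing.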

[17:36]So now we should have the ability, for any channel, to get a list of its latest YouTube videos. We're then going to go and get a summary of each video, and then we're going to store the data back in Airtable. Before we do this, let's just run this once. As you can see, three items came back here. So we're going to process the three latest videos from my channel and generate a summary for each of those videos. What we probably want to do is go ahead and update the Airtable database while we're waiting for this. So let's change the structure here slightly. Let's edit this field to be the YouTube ID, so the video ID. It's going to be single line text. Let's just delete this record. So we're going to set this to be the video ID. We also want the channel, so let's do insert left, single line text, and call this channel. This is going to be the name of our channel. And we want the transcript, so let's call this transcript or summary.

[18:49]And that should be it for now.

[18:57]All right, so this has completed successfully. We now have the videos, and you can see here that this actually processed a little bit quicker. So this has done a summary of each of the videos, and now what we want to do is push this back into Airtable. We're going to reconnect this node, double click here, and get rid of all of these things. Let's refresh the columns because we just changed them. Let's change the operation to be create or update. When we run this automation, if we already have the data in Airtable, we don't want to create it again, so what we're going to do is create or update. We're going to match on the video ID, and the video ID is going to come from YouTube. So this is the video ID. The channel is going to be the video's channel name, which you can find down here: channel title. And the transcript or summary is going to be what we got back here, which is the summary of the YouTube video. All right, perfect. So let's just quickly test this step. If we go back into Airtable, you can see it has now populated the video ID, the channel name, and the transcript or summary. So this video is about using AI agents in Slack, and it's done a pretty good job at summarizing. Obviously, this is just a demo to show you how you can use Google Gemini with n8n to summarize YouTube content. You can generate transcripts, you can generate summaries, you can also generate content ideas. Basically, you can do whatever you like with the content of that YouTube video. I hope you enjoyed the video. If you did, remember to like and subscribe. Feel free to ask me any questions you have in the comments below, and I'll see you in the next video. Thanks for watching.
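The create-or-update step in the final workflow, matching on the video ID, behaves like the in-memory sketch below. It stands in for the Airtable node rather than calling Airtable; the column names follow the ones set up in the video, and `upsert_by_video_id` is a hypothetical helper.

```python
def upsert_by_video_id(records: list, new_record: dict) -> str:
    """Update the record with the same Video ID, or append a new one."""
    for existing in records:
        if existing["Video ID"] == new_record["Video ID"]:
            # Same video seen before: refresh its fields in place.
            existing.update(new_record)
            return "updated"
    # First time this video is seen: add a copy of the record.
    records.append(dict(new_record))
    return "created"


if __name__ == "__main__":
    table = []
    row = {"Video ID": "VIDEO_ID", "Channel": "My Channel",
           "Transcript or summary": "First summary"}
    print(upsert_by_video_id(table, row))
    row["Transcript or summary"] = "Revised summary"
    print(upsert_by_video_id(table, row), len(table))
```

Matching on the video ID is what makes the workflow safe to rerun: a second run over the same channel refreshes existing rows instead of duplicating them.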
