
Lecture 1: Building LLMs from scratch: Series introduction

Vizuara


[0:05] Hello everyone. Welcome to the introductory lecture in this series, titled "Build a Large Language Model from Scratch". My name is Dr. Raj Dandekar. I graduated from IIT Madras with a B.Tech in mechanical engineering in 2017. After that, I did my PhD at the Massachusetts Institute of Technology (MIT), graduating with a PhD in machine learning in 2022. Since then, I've come back to India, and we are on a mission to make AI accessible to everyone. At Vizuara, our YouTube channel, we have made several playlists on machine learning and deep learning, and the philosophy we follow is to teach everything from the basics, assuming nothing: the nuts and bolts of every single concept. The real reason behind this series is that, as we all know, large language models and generative AI are transforming everything around us. Startups are forming in this space, companies are switching to large language models for various tasks, and jobs in this field are rising fast. But many people learning about this field make one common mistake: they jump straight to the applications. They run some Google Colab notebook, follow some YouTube video, and directly run the code. Very few people really understand how large language models actually work; very few know the nuts and bolts that make them so powerful. Wouldn't it be amazing if you could build a large language model completely from scratch? Wouldn't that make you feel confident about the subject? Very few people have this knowledge right now, and I'm making this YouTube playlist as a comprehensive resource showing you everything about building an LLM from scratch.
The way I will make the videos in this playlist is to teach you everything from the basics, as a beginner, without assuming anything. By the end of this playlist, you will have built an LLM from scratch, all by yourself. You will see that after this point, everything that comes later, all the application parts, will start seeming extremely easy to you. That's the whole philosophy behind this lecture series. It takes a huge amount of effort on our part to make it, because, as I'll show you shortly, for every lecture we prepare detailed lecture notes. I'll share those notes with you, and all the videos in this series will be available completely for free. Okay? So now I'm going to tell you a bit about my own story of learning LLMs, then about the material already available on the internet for learning LLMs, why it is so inadequate, and what we are trying to do differently with this playlist. So let's get into the video. Let me take you back in time to show you what language models looked like in the 1960s, where the field of natural language processing stood about 50 to 60 years ago. This is one of the first chatbots humans developed. It's called ELIZA, and it was supposed to be a therapist. Let's see how it works. First, the interface asks me to choose my language, so I'm choosing English. Then ELIZA says: "How do you do? Please tell me your problem." So I type: "I am trying to learn about large language models, but I am finding it difficult. Could you provide some resources for me to start with?"

[4:43] Then ELIZA asks: "Is it because you are trying to learn about large language models that you came to me?" I say yes. ELIZA says: "You seem to be quite positive." I say yes again, and ELIZA says: "You are sure." You see, this conversation is going nowhere. That was the state of conversational AI 50 to 60 years ago. Not very good, right? Fast forward to ChatGPT, and ask the same thing I asked ELIZA: "I am trying to learn about large language models. Tell me some resources."
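Under the hood, ELIZA had no understanding at all: it matched keywords against a script of patterns and echoed the user's words back inside canned templates, which is exactly why the conversation goes in circles. A minimal sketch of that idea (the rules below are my own illustrative examples, not Weizenbaum's original DOCTOR script):

```python
import re

# Illustrative ELIZA-style rules: each rule pairs a regex with a response
# template that reuses the captured text. First matching rule wins.
RULES = [
    (re.compile(r"i am trying to (.+)", re.I),
     "Is it because you are trying to {0} that you came to me?"),
    (re.compile(r"\byes\b", re.I),
     "You seem to be quite positive."),
    (re.compile(r"i am (.+)", re.I),
     "How long have you been {0}?"),
]
DEFAULT = "Please go on."  # fallback when nothing matches

def eliza_reply(utterance: str) -> str:
    """Return the first matching canned response; no understanding involved."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(*m.groups())
    return DEFAULT

print(eliza_reply("I am trying to learn about large language models"))
# → Is it because you are trying to learn about large language models that you came to me?
```

A handful of such rules reproduces the exchange from the demo above, which makes it clear why the bot can only deflect rather than actually answer a question.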

[5:33] And then look at the response from ChatGPT. It's extremely useful and to the point: it gives me books, online courses, research papers. This is exactly what I need. This simple illustration shows that we are lucky to live in an age where research on natural language processing has reached a stage where LLMs such as GPT are powerful and sophisticated. If you are not familiar with ChatGPT, that's fine; we are going to build our own GPT in this playlist, so you'll learn about it along the way. I showed you this demonstration so you can appreciate the times we are living in. LLMs have become really powerful, and that's the first motivation to learn about them. Several more developments are happening as I make this video. Meta (Facebook) released Llama 3.1, one of their most capable large language models to date. It is an open-source model, which means the model's architecture and weights are released to the public; anyone can inspect them. The models released by OpenAI are usually closed source: they don't release the weights or the architecture, and not much is known about the model itself. Here is a graph showing closed-source versus open-source models. In 2022, when the field of large language models was booming, most models were closed source. When GPT-4 was released in 2023, it blew the world away; everyone was amazed by its capabilities, but it was still closed source. Now in 2024, can you see the gap between open-source and closed-source models slowly decreasing? With the release of Llama 3.1, an open model performs at the same level as the closed-source GPT-4.
This is to say that all the information you need is available right now in open-source models; you just have to be willing to learn. If you are not sure what open-source and closed-source models are, or what "Llama 3.1 405B" means, I'll explain all of these things as we proceed with the lectures. Now, text is one thing, but generative AI as a field is capable of much more. There is a lot of confusion about what generative AI is and what LLMs really are: generative AI is the broader field, spanning language, video, audio, 3D models, everything, and large language models are its language part. Have a look at some of these videos. Don't they look incredibly realistic? You'll be surprised to know that all of them were made by artificial intelligence; they were not shot on camera. This is the power of generative AI today. Finally, through our work with schools, we have developed our own AI application, Vizuara's application built on LLMs. It has a large number of functionalities. For example, you can click on the MCQ generator, type in a topic, say "Gravity", and click Generate. Within a matter of seconds, the large language model powering the application generates a set of multiple-choice questions right there. We provide this application to teachers. All of this was impossible to even imagine just four to five years ago, but now large language models and generative AI are changing every single thing around us. In fact, what is probably most relevant to you watching this video is the global generative AI job market. Just look at this growth, it's incredible: the job market is projected to grow about five to six times in the next five years.
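An app like the MCQ generator described above typically just wraps the user's topic in an instruction prompt and sends it to an LLM. A hypothetical sketch of that wrapping step (the function name and prompt wording are my own illustration, not Vizuara's actual implementation):

```python
def build_mcq_prompt(topic: str, num_questions: int = 5) -> str:
    """Assemble an instruction prompt asking an LLM for MCQs on a topic."""
    return (
        f"Write {num_questions} multiple-choice questions about {topic}. "
        "For each question, give four options labeled A-D "
        "and mark the correct answer."
    )

# The prompt would then be sent to any chat-style LLM endpoint; the exact
# client call depends on the provider, roughly (pseudocode):
#   response = client.chat.completions.create(
#       model="...",
#       messages=[{"role": "user", "content": build_mcq_prompt("Gravity")}])

print(build_mcq_prompt("Gravity"))
```

The point is that the app layer is thin; the hard part, and the subject of this course, is the model behind the endpoint.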
Generative AI and large language models are an extremely useful skill, and the need for this skill is only going to increase. So this brings me back to the question: if someone wants to learn about large language models, how do they go about it? Let's say you go to Google and search for courses on building LLMs. A number of courses show up. If you look at many of them, you'll see, for example, "Build LLM Apps". It's about app development; it does not teach you how to build a large language model from scratch. Here is another course, "Master Large Language Model Concepts". If you look at its description, it doesn't teach you how to build an LLM from scratch at all, none of the nuts and bolts; it's a pretty quick course. This is also not what I'm looking for. What I'm looking for is one course that teaches the foundations in depth and in detail. I want to know the foundations, because I want to build an LLM from scratch so that my skill set improves and I feel confident in generative AI and LLM job interviews. I don't want a quick crash course; I want something in depth. Then I go to YouTube and search "LLM from scratch". Here is the first playlist that shows up. It has 18 chapters, but each chapter is only 10 to 15 minutes, and again I'm a bit demotivated, because this is not what I'm looking for either. I'm looking for a deep, technical course that teaches how to build an LLM right from the very basics. There is Andrej Karpathy's video on building GPT, but if you look at it, it's actually quite complex. It's not an easy course at all.
He starts right in the middle of a concept, it's not meant for beginners, and it's just 90 minutes. Again, this is not what I'm looking for. Here is another, "Create a Large Language Model from Scratch"; you can see the red progress bar here. I tried this course and finished three hours of it, but it's not explained well, not explained from scratch. I want to make a course that people understand right from the very basics, and none of the courses on YouTube or Google satisfy that need. Then, very luckily, I came across this book by an author called Sebastian Raschka. I think it's one of the best books on large language models. I purchased it, and it is going to serve as the reference material for our course. It's a 408-page book, and I'm going to convert every single aspect of it into a set of videos, probably 35, 40, or maybe 50 videos in this playlist, similar to my playlists for machine learning and deep learning, and whatever is given here, right?

[13:24] I'll convert it into video format. For example, here are some of the notes I've already started to make; we'll start covering these in the next lecture. What I've done is build my understanding from the book, and to transfer that understanding to all of you, I've started writing out every single thing in detail on a whiteboard, trying to make it as interesting and as fundamental as possible. So the next lecture will be "Intro to LLMs". I've finished the notes for that, and for the lecture after it as well, "Stages of Building an LLM". I'm in the process of making all these notes, and I'm making the videos simultaneously so that I stay motivated and on track. Eventually this will grow into a massive set of lecture notes, maybe 200 to 300 pages, with 50 to 60 videos based on it. You'll learn everything right from the very basics; nothing will be assumed, everything will be spelled out. This set of videos will be perfect for serious people who want to transition to large language models and understand LLMs the right way.

[15:18] Okay, so that is the main idea for the course. Another thing I want to say is that you might have heard of tools like LangChain. It's a framework that helps you build LLM apps, but again, it's not that useful if you don't understand the LLM itself. Many students start deploying apps directly, as I mentioned before, but I personally don't think that's the right way to learn about LLMs. You need to know how to build an entire large language model from scratch, and we'll teach you that in this playlist.
