
Lecture 2: Large Language Models (LLM) Basics

Vizuara

35m 40s · 5,430 words · ~28 min read
Auto-Generated

[0:05] Hello everyone. Welcome to the second lecture in the Building Large Language Models from Scratch series. The previous lecture was an introduction to the entire series: I described what we plan to cover in this playlist and mentioned our end goal of learning the nuts and bolts of building a large language model. In this series, we are not just going to look at applications; we are going to understand how things really work at a very fundamental level. So let's get started. My name is Dr. Raj Dandekar, and I graduated with a PhD from the Massachusetts Institute of Technology in 2022. For the last two years, I've been in India working on our vision to make AI accessible to all. This playlist is a part of that vision.

Today we are going to cover six major aspects. First, what exactly is a large language model? Second, what does "large" mean in the term "large language model"? Third, what is the difference between modern large language models and earlier NLP models? Fourth, what is the secret sauce behind LLMs, what really makes them so good? Fifth, the differences between terminologies you might have heard: LLM, generative AI, deep learning, machine learning, artificial intelligence. It's getting confusing, right? The number of terminologies just keeps increasing day by day; how do we know which one means what? And in the last section, I'll discuss the applications of LLMs and all the cool stuff we can build with them.

With this, we'll get started with section one of today's lecture: what is a large language model in the first place? Unless you have been living under a rock, you would definitely have heard of LLMs, OpenAI, generative AI. So you might have heard the term large language model without knowing exactly what it is; you might have even used ChatGPT without knowing what an LLM is. At its simplest, an LLM is just a neural network designed to understand, generate and respond to human-like text. There are two key parts here: it is a neural network, and it is designed to understand, generate and respond to human-like text. Let me explain the two parts separately, starting with "neural network". I'm assuming people coming to this series know a bit about neural networks; if you don't, it's totally fine, I'll explain all those basics as well. A neural network essentially looks like this: input data feeds into a layer of neurons, a bunch of these layers are stacked together, and we have an output layer. It's called a neural network because the neurons represent, in a very symbolic sense, the circuitry in our brain; that's where the name "deep neural network" comes from. Deep neural networks have been shown to have a huge number of applications: image detection, text generation, self-driving cars, or, say, detecting whether a patient has a brain tumor. So LLMs are essentially just neural networks designed for very generic text-related applications: understanding, generating and responding to human-like text.
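To make the "neural network" part concrete, here is a minimal sketch in PyTorch of the kind of stacked-layer network described above. The layer sizes are arbitrary placeholders chosen for illustration, not the dimensions of any real model.

```python
import torch
import torch.nn as nn

# A minimal feed-forward network: an input layer, two hidden layers
# of neurons stacked together, and an output layer. All sizes here
# are made-up placeholders, purely for illustration.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),   # first hidden layer -> second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # second hidden layer -> output layer
)

x = torch.randn(1, 784)   # one random fake input example
print(model(x).shape)     # torch.Size([1, 10])
```

An LLM is the same basic idea at a vastly larger scale, with a more specialized architecture, the transformer, which we will meet shortly.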
So what does it mean to understand, generate and respond to human-like text? The simplest way to demonstrate that is to take you to ChatGPT. I am sure all of you watching this lecture have used ChatGPT. Here you can see on the dropdown that I am using GPT-4; you might be on GPT-4o or GPT-4o mini. You can ask ChatGPT basically any question. Here there are some suggested prompts, so I'll click on "plan a relaxing day". You see: "Could you help me plan a relaxing day that focuses on activities for rejuvenation? To start, can you ask me what my favorite forms of relaxation are?" And ChatGPT replies: "Absolutely! To help craft the perfect day of relaxation and rejuvenation, could you share your favorite forms of relaxation? For instance, do you prefer spending time outdoors, engaging in activities like reading or yoga, enjoying spa treatments, or perhaps trying creative pursuits like painting or cooking?" So I'll write here "reading a book" and see how ChatGPT responds. ChatGPT says: "Great! Reading can be a central part of a rejuvenating day. Here's a plan focused around your enjoyment." It then suggests morning meditation for 30 minutes, a healthy breakfast for 45 minutes, reading time for two hours, a nature walk for one hour. You can see it's almost like I'm talking with a human. That's the second part of the LLM definition: large language models understand, generate and respond to human-like text, and they also respond like humans. So ChatGPT, the demonstration I just showed you, is an LLM. But what many people don't think about is that at the core, LLMs are just neural networks designed to do these tasks. If anyone asks you what an LLM is, tell them it is a deep neural network, trained on a massive amount of data, that can understand, generate and respond to human-like text, and in many cases respond like a human. This is getting better even as I record this lecture; LLMs are sounding more and more like humans, and that has led to huge problems. Maybe we'll create a separate lecture series on that. So that's the first section of today: what is a large language model?

Great. The second question you might have is: why is it called a large language model? Why not just a language model? After all, it's just a model dealing with language, right? Why do we specifically add the word "large"? The reason is that until LLMs came into the picture, model sizes were not very big, and by model size I mean the number of parameters in the model. If you look at the number of parameters LLMs deal with, you'll be shocked: LLMs typically have billions of parameters, and there are now even LLMs with trillions of parameters. We can't even picture numbers that big. So let me show you a visual. This is a table showing the number of parameters in GPT models. As you know, we are at GPT-4 right now. For GPT-3, the earlier generation: GPT-3 Small had 125 million parameters, GPT-3 Medium had 350 million, GPT-3 Large had 760 million, GPT-3 13B had 13 billion, and GPT-3 175B, which is what we usually call GPT-3, had 175 billion parameters. The suffix is usually the number of parameters in the model, so 175B means 175 billion parameters.
And GPT-4 has even more parameters. That's why they are called large language models. Here is another graph comparing GPT-1, GPT-2 and GPT-3. As you go from GPT-1 to 2 to 3, the LLMs become better and better, but their number of parameters also increases tremendously. From GPT-1 to GPT-2, the parameters grew from roughly 100 million to 1.5 billion, about a factor of 10. From GPT-2 to GPT-3, there is a factor of about 100: GPT-2 had only 1.5 billion parameters, but GPT-3 had 175 billion. You'll see some other terminology there too: GPT-1 had 12 decoder blocks, GPT-2 had 48, and GPT-3 had 96. If you don't know what that means, it's totally fine; we are going to cover it in this series. Take a look at the token size as well: GPT-1 has a token size of 512, GPT-2 has 1024, GPT-3 has 2048. Again, it's fine if you don't know what it means; you can see that everything is generally increasing.

Now, here is a graph showing model size in artificial intelligence models over the years, plotted from 1950 to 2022 and published in Nature, so it is a reliable graph. On the Y-axis is the number of parameters, on a log scale. From 1950 to 1960, models barely had 1,000 to 10,000 parameters. From 1980 to 2000, models started having around 100,000 parameters. But it's only around 2020 that we reached 100 million, 1 billion, and now almost 1 trillion parameters. The orange symbols, which mostly represent models for language tasks, have the largest parameter counts; I'm zooming in here to show you that. I hope this has convinced you why these models are called large language models: we are living in an age where parameter counts are unprecedented. I'm sure that within a year or so of this video's release, models will have reached a parameter count of 1 trillion. So that's the first part of the terminology, "large".

Then why are they called language models? That's pretty clear if you remember the example I showed you earlier. These models deal only with language; they do not deal with other modalities like images or video. So they perform a wide range of natural language processing tasks: question answering, translation, sentiment analysis and much more!
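As a small aside on the "large" part: if you have a model loaded in PyTorch, counting its parameters takes one line. Here is a minimal sketch using the Hugging Face transformers library and the smallest public GPT-2 checkpoint; the library and checkpoint are my choice for illustration, not something from the lecture.

```python
from transformers import GPT2LMHeadModel

# Load the smallest public GPT-2 checkpoint and count its parameters.
model = GPT2LMHeadModel.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 124 million for GPT-2 small
```

Scale that number up by roughly three orders of magnitude and you get GPT-3's 175 billion parameters.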

[11:38] So this is the meaning behind the terminology "large" and "language model". That covers the second aspect of today's lecture. And now we come to the third aspect: why have LLMs become so popular? Didn't we have the field of natural language processing before? That's true: natural language processing as a field existed long before LLMs.

[12:06] NLP models existed long before LLMs. But if you look at the models which came before LLMs, they were designed for very specific tasks. For example, one particular NLP model might be designed for language translation; another might be for, say, sentiment analysis. LLMs, on the other hand, can do a wide range of NLP tasks. It turns out that if you train a GPT for text completion, that same architecture works well for language translation too; it's pretty generic that way. That's one difference between LLMs and earlier NLP models: earlier NLP models were designed for specific tasks, while modern LLMs can do a wide range of NLP tasks. The second difference is based on an application: earlier NLP models could not write an email from custom instructions. It was very difficult for them. For modern LLMs such as ChatGPT, this task is almost trivial. If I go to ChatGPT right now and say "Draft an email for me to send to a friend of mine to book movie tickets", and I zoom in here, ChatGPT not only drafts the email, it even inserts emojis. It's a trivial task for ChatGPT. But you might be surprised to know that this trivial task was very hard for earlier NLP models. That's the big difference between LLMs and earlier NLP models: LLMs are broad, they can do a wide range of generic tasks, and their applications are endless, much more so than the earlier models. That's why we are even having this playlist in the first place. So that's point number three.

Point number four is what you must all be wondering: LLMs are so good, they can do these amazing tasks, they can almost behave like humans. But what makes LLMs so good? What's the secret sauce? Usually people say there is no secret sauce, that things just gradually improve over time. But in this case there is a specific secret sauce, and that is the transformer architecture. You see I've added a secret-sauce logo here; for LLMs, the secret sauce is the transformer architecture. Don't worry if you don't know what a transformer is. For all you know, you might be thinking of the Transformers movies, in which cars get converted into mechanical robots; that might be your first thought when you hear "transformer". People who know the field know exactly what I'm talking about, but if that movie is the only transformer you know, this playlist is perfect for you, because you will understand what a transformer really means. What it actually means is summarized by this one schematic over here: this is what the transformer architecture looks like. You might be confused by all of the terminology. What is input embedding? What is multi-head attention? What is feed-forward? What does "add & norm" mean? What is output embedding? There are so many things which look confusing, and that's fine; we are going to learn about this secret sauce in a lot of detail in this playlist. There is one paper, introduced in 2017, which really changed the game.
That paper is what I'm showing you on the screen right now. It's called "Attention Is All You Need", published by eight researchers, most of them at Google. It introduced the transformer architecture; the schematic I showed you on the whiteboard is the schematic from the paper. Can you guess how many citations this paper has today? It has more than 100,000 citations in a matter of just five years. People say that research is boring and has no external reward; in this case it does. If you are one of the authors on this paper, you have 100,000 citations in five years and you completely revolutionized the field of artificial intelligence. It's a 15-page paper, but if you try to read it, it's very dense. Really understanding this paper takes a lot of time and effort. There are some YouTube videos explaining it, but I don't think any of them do justice to explaining everything from scratch. Every single page of this paper could fill three to four videos. If you think about it, these 15 pages contain a huge amount of information, and we'll be devoting lectures to its different sections: what is positional encoding, what is scaled dot-product attention, what is the attention formula, what are keys, queries and values, what is multi-head attention. We'll figure all of that out. So don't worry at this stage if you don't know what a transformer means; even if you only have that movie image of a transformer in mind, you've come to the perfect place. As I said, do not worry: we will learn all about this secret sauce, the transformer, in the subsequent lectures.
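As a tiny preview of what we will build later in this playlist, here is a minimal sketch of the central formula from the paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The tensor shapes are toy values chosen for illustration; we will derive and explain every piece of this in later lectures.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k**0.5  # how much each query matches each key
    weights = F.softmax(scores, dim=-1)          # attention weights; each row sums to 1
    return weights @ V                           # weighted sum of the value vectors

# Toy example: a sequence of 4 tokens, each an 8-dimensional vector.
Q = K = V = torch.randn(4, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # torch.Size([4, 8])
```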
That covers the fourth point of today's lecture. And now we come to the fifth point, which is all about terminology. There are so many words being thrown around these days, and it just keeps getting more confusing. People talk about LLMs, about generative AI, about deep learning, about machine learning, and artificial intelligence is one more term. What are the similarities and differences between all of these terms? This particular figure sums it all up perfectly. Start from the outermost sphere, the broadest domain: artificial intelligence, the biggest umbrella. Under this big umbrella comes a smaller umbrella, machine learning. Under machine learning comes a yet smaller umbrella called deep learning, and within that, the smallest umbrella: large language models. And that is what this playlist, the one you are watching right now, is about.

So what are the differences between these? The broadest umbrella is artificial intelligence, and it contains all the others. Any machine that behaves even remotely like a human, that shows some sort of intelligence, comes under the bucket of AI. That's the biggest bucket. You might be thinking: what's the difference between AI and ML? Look at this example, the Lufthansa flight chat assistant. It says: "Hi, I'm Elisa. I'm your Lufthansa chat assistant. Here is a selection of topics I can help you with." You can choose one of the topics. You say "hi, Elisa", Elisa says hello, and then you can click on any of these options. You see, it's a rule-based chatbot: you click some options and Elisa is already programmed with the answers. Say my flight is canceled and I want to check alternatives and rebook; if I click on this, Elisa gives its pre-programmed answer. This is an example of AI, because the Lufthansa chat assistant can be thought of as intelligent. But it is rule-based intelligence. It's not learning from your responses: whether you respond or your friend responds, Elisa will behave the same way. It doesn't adapt to what you give it or to your specific nature as a user. That's why this is an example of AI but not an example of ML, and why AI is the broadest umbrella with ML inside it. ML is basically machines that learn: they adapt based on how the user interacts with them.

You might be thinking, okay, then what's the difference between ML and DL? Why is DL a subset of ML? The difference is that deep learning involves only neural networks, whereas machine learning involves neural networks plus other approaches, one of which is decision trees. Say you want to predict heart disease: you have data from 303 patients, you collect things like age, gender, chest pain, cholesterol level, ECG, and so on, and you want to predict whether a person has heart disease or not, so you build a decision tree like this. A decision tree like this has no neural networks at all; it is completely detached from neural networks. So it is not an example of deep learning, but it is an example of machine learning. A deep learning example is something involving a neural network. For example, here is image detection with a convolutional neural network: we have a coffee cup, and the filtering layers of the convolutional network detect what the input is, whether it's a lifeboat, whether it's a pizza, and with maximum probability it says it's an espresso. A neural network is being used for the task, so that's an example of deep learning. Another example is handwritten digit classification. Here a neural network has been trained on a bunch of digits, and its task is: whenever I write a new digit, it should identify which digit it is. This is a neural network: you can see the architecture, with an input layer, two hidden layers, and an output layer. Now if I click on test, select a digit, and click predict, the network correctly predicts it; I click another one and it correctly predicts seven. This is an example of deep learning, and that's why deep learning is a subset of machine learning.
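To make the ML-but-not-DL case concrete, here is a minimal sketch of a decision tree in scikit-learn, in the spirit of the heart-disease example above. The features and labels below are invented toy values, not the real 303-patient dataset.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy, made-up patient records: [age, cholesterol, max_heart_rate]
X = [[63, 233, 150], [37, 250, 187], [56, 294, 153], [57, 131, 115]]
y = [1, 0, 1, 0]  # 1 = heart disease, 0 = healthy (invented labels)

# A decision tree learns from data, so this is machine learning,
# but there is no neuron anywhere in it, so it is not deep learning.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[60, 240, 140]]))
```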
Machine learning involves these kinds of neural network architectures, plus things which are not neural networks, like decision trees. Now, within deep learning come LLMs. Why? Because deep learning, as we saw, also involves images.

[23:42] Both the examples we just saw involved images. But large language models do not involve images; they deal only with text, and that's why they are a further subset of deep learning. This is the difference between AI, ML, DL, and LLMs. So then you might be thinking: what is generative AI? You can think of generative AI as a mixture of large language models plus deep learning, because generative AI also deals with other modalities like images, sound and video. So if someone asks you what generative AI is: it is using deep neural networks to create new content such as text, images and various forms of media, whereas an LLM deals only with text. I know these terms are confusing, but I hope these examples help you understand the similarities and differences between them. As a summary, you can think of AI as the broadest umbrella; within that comes ML, within that DL, and within that LLMs. AI is artificial intelligence, ML is machine learning, DL is deep learning, and LLM is large language model. And if you mix LLMs plus deep learning, you have generative AI, because in generative AI we don't deal just with text but with other modalities of media: images, audio, video, and so on. LLMs thus represent a specific application of deep learning techniques, leveraging the ability to process and generate human-like text, which we have already seen. I hope you have understood this part and you like these visuals. If you have any doubts up to this point, please write them in the YouTube comments and we will answer as quickly as possible.

And now we come to the last part of this lecture: applications of LLMs. Even as I speak, the applications keep increasing, but overall they can be divided into five main categories. Number one, LLMs can be used to create new content. If I write here "write a poem about the solar system in the format of a detective story", maybe this content does not exist anywhere right now, but you can create it with an LLM. Here you can see a poem about the solar system in a detective-story format: "In the quiet sprawl of the Milky Way, a detective roamed the stars by night and day. His name was Orion, a sleuth well versed in cosmic affairs where mysteries burst. The case on his desk? A dance quite bizarre." I won't read the full poem, but you can see we have created new content. You can even write books with LLMs and generate media articles, and people are already doing these things. Number two, you can use LLMs as chatbots and interact with them as virtual assistants. The example I showed you at the start is like chatting with the LLM: it asks what my favorite form of relaxation is, I say reading a book, and I can continue the conversation. And it's not just for individuals: big companies, airlines, hotel reservation desks all need chatbots. Say you call a customer care representative; in five years, it's highly likely that all of those representatives will be AI chatbots. And how are they built?
They are built through the knowledge of LLMs. In fact, through this playlist we will be building our own LLM, so you will be fully equipped to develop your own chatbot in any field you like. Chatbots are one of the biggest applications of large language models right now. Banks, movie theaters, restaurants, all of them need chatbots. When you are booking shows, or dealing with a bank and want to know how to open an account, all of these interactions are going to be automated, and are already being automated, through LLMs. That's one of the biggest applications, and through this playlist you will learn the skills to develop your own chatbot. The third application is machine translation, which means we can translate text into any language. So I say "translate this poem to French", and you can see ChatGPT works on the translation and immediately produces it in French. We don't even need Google Translate now; I'm sure the translation is accurate, because it's amazing at these translation tasks. So you can very quickly translate the LLM's output into any language, and it also has some support for regional languages. Not very broad support, but it does support a few regional languages, and in fact a lot of research is being done to improve LLM outputs for regional languages. That's point number three. Point number four, which we saw already, is new text generation: writing poems, books, media articles, news articles. It can create new text which did not exist before. And finally, LLMs can also be used for sentiment analysis: you can give the LLM a big paragraph and ask it to identify the sentiment. This can be useful for hate speech detection on social media like Twitter or Instagram. These five applications are the pillars of what LLMs can do. There are several more applications which I'm probably missing right now, but these five are the major ones.

To illustrate these applications, let me take you through a portal which we recently created for school teachers, by learning about LLMs ourselves. Here you can see a YouTube generator, an MCQ generator, a text summarizer, a text rewriter, a worksheet generator; we can do so many awesome things. Let me go to the lesson plan generator right now and say that I want to create a lesson plan on the topic of gravity, aligned with the CBSE curriculum of India, and let me click on generate.

[30:50] You'll see the LLM work on it for a bit, it does take some time, but now the lesson plan for gravity is created. We have the objective, the assessment, key points, the opening, the introduction to new material, guided practice, independent practice, and so on. This would save so much time for a teacher, and it's amazing; this was not possible five years ago. But now with LLMs we can build all of these applications. We can do so many other cool things, such as the MCQ generator. Let's say you are a teacher and you want to generate questions on, say, World War II: one hard, one medium and one easy question. You click on generate, wait for the LLM's response, and within five seconds we have questions and answers on World War II: three questions, each with the correct answer and an explanation. We built this LLM application from scratch, and towards the end of this playlist we'll have several lectures showing how you can build these kinds of applications too.
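To give a feel for how such an application is put together, here is a minimal sketch of an MCQ generator: at its core it is just a prompt template wrapped around an LLM call. This is not the actual code behind our portal; the openai client, the placeholder model name, and the prompt wording are all illustrative assumptions, and you would need your own API key.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def generate_mcqs(topic: str) -> str:
    # The whole "application" is essentially a carefully worded prompt.
    prompt = (
        f"Write three multiple-choice questions on {topic}: "
        "one easy, one medium, one hard. For each question, give four "
        "options, mark the correct answer, and briefly explain it."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_mcqs("World War II"))
```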

[32:32] But here I want to illustrate the point that if you have knowledge of LLMs, the sky is the limit: you can build wonderful applications like this. What many students do wrong is that they get so fascinated by these applications that they just download the code, run it, make some changes, and write on their resume that they know about LLMs. That's not the right way. The right way is to understand the foundations, make your basics clear, and understand the nuts and bolts of LLMs, which is the purpose of this playlist. So that was section six, the applications of LLMs, and with it we have completed the six sections planned for today's lecture.

[33:38] At the end, I want to leave you with just one sentence: the sky is the limit when it comes to LLM applications. Please remember that the applications I just showed you are only the tip of the iceberg; you can do so many things right now if you learn about LLMs. That's what's so exciting about the time we live in. The sky is the limit for research applications and for industrial applications. But we think the students who really contribute and make an impact are the ones who know the details: how to write the transformer code, how key-query-value attention works, what exactly positional encoding is. Until you understand these concepts, your knowledge will be superficial, and that's the whole purpose of this playlist. Please comment in the YouTube comment section if you like this style of content and this whiteboard approach; I'm basically trying to write everything out here and give you a visual flavor and visual intuition for how things work. In subsequent lectures, we'll definitely dive deeper into coding. The next lecture, as I've mentioned here, is on the stages of building LLMs. We'll have a couple more lectures on these basics and then we'll dive into coding. So make sure you understand everything, make notes, and ask questions in the comments.
