[0:12]Data in the database is stored in terms of enormous quantity. Retrieving certain data will be a tedious task if the data is not organized correctly. With the help of normalization, we can organize this data and also reduce the redundant data. Hey guys, this is Pratik from Edureka and I welcome you all to this interesting session on normalization in SQL. In this session, I'll explain everything that is related to normalization with simple examples that are easy to remember. Firstly, let's look at the agenda for today's session. So we're going to start off with understanding normalization, and moving further, we shall look at various types of normalization. And those are first normal form, second normal form, third normal form, and Boyce-Codd normal form. I hope you guys are clear with the agenda. But before moving further, if you haven't subscribed to our channel then do subscribe to never miss out on an update. With that being said, let's get started. The first topic in today's session is, what is normalization? Database normalization is a technique of organizing the data in the database. It is a systematic approach of decomposing tables to eliminate data redundancy. It is a multi-step process that puts data into tabular form, removing the duplicated data from its relational tables. On the screen, we just saw that the table is getting decomposed into two smaller table. Is it really necessary to normalize the table that is present on the database? Well, every table in the database has to be in the normal form. So normalization is used mainly for two purpose. So the first one is, it is used to eliminate repeated data. Having repeated data in the system, not only makes the process slow, but will cause trouble during the later part of transactions. And the second one is, to ensure the data dependencies make some logical sense. Yes, usually the data is stored in database with certain logic. Huge data sets without any purpose are completely waste. It's like having an abundant resource without any application. The data that we have should make some logical sense. Normalization came into existence because of the problems that occurred on data. Now, let's look at those problems. And these are known as data anomalies. If a table is not properly normalized and has data redundancy, then it will not only eat up the extra memory space but will also make it difficult to handle and update the database. Let's look at the first anomaly that is insertion anomaly. Suppose for a new position in a company, Mr. Rakshit is selected, but the department has not been allotted for him. In that case, if we want to update his information to the database, we need to set the department information as null. Similarly, if we have to insert data of 1000 employees who are in similar situation, then the department information will be repeated for all those 1000 employees. This scenario is a classical example of insertion anomalies. The next one is update anomaly. What if Mr. Rakshit leaves the company or is no longer the head of the marketing department? In that case, all the employee records will have to be updated. And if by mistake we miss any record, it will lead to data inconsistency. This is nothing but updation anomaly. And the final one is deletion anomaly. In our employee table, two different pieces of information are kept together, that is, employee information and department information. Hence, at the end of financial year, if employee records are deleted, we will also lose the department information. This is nothing but deletion anomaly. So these were some of the problems that occurred while managing the data. To eliminate all these anomalies, normalization came into existence. There are many normal forms which are still under development, but let's focus on the very basic and the essential ones only. So, we will be talking about first normal form, second normal form, third normal form, and finally end the session with Boyce-Codd normal form. So without wasting further time, let's proceed to first normal form. In first normal form, we tackle the problem of atomicity. Here, atomicity means values in the table should not be further divided. In simple terms, a single cell cannot hold multiple values. If a table contains a composite or multi-valued attributes, it violates the first normal form. So the following functions will be performed in first normal form. The first one is, it removes repeating groups from the table. And next, it creates a separate table for each set of related data. And finally, it identifies each set of related data with a primary key. To understand this in a better way, let's look at the given table. In the employee table, we have employee ID, employee name, phone number, and salary as columns. We can clearly see that the phone number column has two values. Thus, it violates the first normal form. Now, if we apply the first normal form to the above table, we get the following result. In this table, each and every row is distinct, that is, no cell has multiple values. The table has achieved atomicity. First normal form is simple and can be easily identified. In the table, we can clearly see there is no multiple values in each and every column. Thus, the first normal form is achieved. Now, let's move to second normal form. Second normal form was originally defined by E. F. Codd in 1971. A table is said to be in second normal form only when it fulfills the following condition. The first condition is, it has to be in first normal form. And the second one is, the table also should not contain partial dependency. Here, partial dependency means the proper subset of a candidate key determines a non-prime attribute. So what is a non-prime attribute? Let's understand this in a simple way. Attributes that form a candidate key in a table are called prime attributes. And the rest of the attributes of the relation are non-prime. For a table, prime attributes can be like employee ID and department ID. And the non-prime attributes can be like office location. To understand second normal form, let's consider this table. This table has a composite primary key, that is, employee ID and department ID makes the primary key. The non-key attribute is office location. In this case, office location only depends on department ID, which is only the part of primary key. Therefore, this table does not satisfy the second normal form. So what to do in such scenario? The answer is simple, split the table accordingly. To bring this table to second normal form, we need to break the table into two parts, which will give the following tables. The first table has employee ID and department ID as columns. The second one has department ID and office location as columns. As you can see, we have removed the partial functional dependency that we initially had. Now, in the table, the column office location is fully dependent on the primary key of that table, which is nothing but department ID. I hope you have understood second normal form. Now that we have learned first normal form and second normal form, let's head to the next part of this normalization. Next topic is third normal form. Third normal form is a normal form that is used in normalizing the table to reduce the duplication of data and ensure referential integrity. The following condition has to be met by the table to be in third normal form. And the first condition is, the table has to be in second normal form. And the second condition is, no non-prime attribute is transitively dependent on any non-prime attribute which depends on other non-prime attributes. I know it's a bit confusing, so let me make it simple for you. It's like if C is dependent on B and in turn B is dependent on A, then transitively C is dependent on A. This should not happen in third normal form. All the non-prime attributes must depend only on the prime attributes. So these are the two necessary condition that needs to be attained. So why was third normal form designed? Firstly, to eliminate undesirable data anomalies. Next one is to reduce the need for restructuring over time. And finally, to make the data model more informative. Since we have understood the third normal form, let's look at the example table. In the above table, student ID determines subject ID and subject ID determines subject. Therefore, student ID determines subject via subject ID. This implies that we have transitive functional dependency. And this table does not satisfy the third normal form. Now, in order to achieve third normal form, we need to divide the table as shown below. Firstly, let's divide the table and store student ID, student name, subject ID, and address in it. All the columns are referring to the primary key, which is student ID. Let the second table have subject ID and subject column. So, subject is dependent only on subject ID and not on student ID. As you can see from the above table, all the non-key attributes are now fully functionally dependent only on the primary key. In the first table, columns such as student name, subject ID, and address are only dependent on student ID. In the second table, subject is only dependent on subject ID. With this being understood, now we can proceed further to next normal form, that is Boyce-Codd normal form. This is also known as 3.5 normal form. It is the higher version of third normal form and was developed by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies which were not dealt with third normal form. Before proceeding to Boyce-Codd normal form, the table has to satisfy third normal form. In Boyce-Codd normal form, if every functional dependency, that is A implies B, then A has to be the super key of that particular table. So what is a super key? A super key is a group of single or multiple keys which identifies rows in a table. Let's look at the table to clearly understand Boyce-Codd normal form. In the given table, one student can enroll for multiple subjects. There can be multiple professor teaching one subject and for each subject a professor is assigned to the student. These are the necessary condition of this table. In this table, all the normal forms are satisfied except Boyce-Codd normal form. Why? As you can see that, student ID and subject form the primary key, which means that the subject column is prime attribute. But there is one more dependency, that is professor is depending on subject. And while subject is the prime attribute, professor is a non-prime attribute which is not allowed by Boyce-Codd normal form. Now, in order to satisfy the Boyce-Codd normal form, we will be dividing the table into two parts. The table at the top will hold student ID which already exist and we will create a new column that is professor ID. And the second table which is below will have the columns professor ID, professor and subject columns. Why do we need to have a new column, that is Professor ID? By doing this, we are removing the non-prime attributes functional dependency. In the second table, professor ID will be the super key of that table and remaining column will be functionally dependent on it. By doing this, we are satisfying Boyce-Codd normal form. So this brings us to the end of this session. I hope you have clearly understood the normalization and its different types. If you have any queries or doubts regarding this session, please let me know in the comment section and I'll get back to you with an answer. Thank you for watching this video and have a great day.

What is Normalization in SQL? | Database Normalization Forms - 1NF, 2NF, 3NF, BCNF | Edureka
edureka!
12m 16s1,998 words~10 min read
YouTube auto captions
Transcript source
YouTube auto captions
This transcript was extracted from YouTube's auto-generated caption track. The transcript below is server-rendered so it can be read, searched, cited, and shared without opening the original YouTube player.
Pull quotes
[0:12]Retrieving certain data will be a tedious task if the data is not organized correctly.
[0:12]With the help of normalization, we can organize this data and also reduce the redundant data.
[0:12]Hey guys, this is Pratik from Edureka and I welcome you all to this interesting session on normalization in SQL.
[0:12]In this session, I'll explain everything that is related to normalization with simple examples that are easy to remember.
Use this transcript
Related transcript hubs
Watch on YouTube
Share
MORE TRANSCRIPTS


