[0:00]Welcome back. In this video, I will discuss how to use the DBSCAN clustering algorithm to form clusters on a given data set.
[0:09]This is solved example number two. The link to the other example is given in the description below.
[0:15]In this case, we have been given a data set with five points, and a similarity matrix is also given to us.
[0:21]Given this particular data set, we need to apply the DBSCAN algorithm with a similarity threshold of 0.8 and min points greater than or equal to two, which means the minimum number of points required in each cluster is two here.
[0:37]That is, each cluster should contain two or more points in this case.
[0:41]Given this particular data set, we need to find the core, border, and noise (outlier) points in the set of points given in the table.
[0:49]Finally, we need to create the clusters over here. So this is the data set given to us. Minimum points is equal to two, which means the minimum number of points in each cluster should be two here, and the similarity threshold is 0.8.
[1:00]That means the minimum similarity index should be 0.8.
[1:04]If it is 0.8 or more, that means the two points are near to each other.
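The threshold rule just described can be sketched in a few lines of Python. The function name `is_similar` and the sample value 0.5 are assumptions made for illustration only; the 0.8 threshold comes from the problem statement.

```python
SIM_THRESHOLD = 0.8  # similarity threshold from the problem statement


def is_similar(similarity: float, threshold: float = SIM_THRESHOLD) -> bool:
    """Two points count as near each other when similarity is at or above the threshold."""
    return similarity >= threshold


print(is_similar(1.0))  # a point compared with itself
print(is_similar(0.8))  # exactly on the threshold still counts
print(is_similar(0.5))  # hypothetical value below the threshold
```

Note that the comparison is "greater than or equal to", so a similarity of exactly 0.8 is treated as similar.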
[1:11]Now, we will try to identify the nearest point or the similar point for P1 here.
[1:17]For P1, if you look at these similarity index values, P1 itself is the nearest one; of course that will be the case.
[1:26]But if you look at the remaining four points, all of them have a similarity value less than 0.8 here.
[1:34]The meaning of this one is P2, P3, P4 and P5 are not similar to P1 in this case.
[1:40]So that's the reason I have written a dash here. Coming back to the next point, that is P2. If you look at P2's row, again, P2 has the value one here, meaning P2 is similar to P2, that is for sure.
[1:54]But if you look at the other values, this is the only value where the similarity index is more than 0.8 here.
[2:00]So that's the reason you can say that P2 is similar to P5 in this case.
[2:04]Now coming back to the next one, that is P3. If you look at P3's row, apart from P3 itself, we have one value, 0.85 for P5, which is more than the 0.8 threshold.
[2:17]The meaning of this one is for P3, P5 is the similar point in this case.
[2:22]So that is what I have written here. Coming back to the next one, that is P4.
[2:27]For P4 itself we have 1.00, which is greater than 0.8, that's for sure.
[2:32]Apart from that, we don't have any other similarity index which is greater than 0.8 here.
[2:37]So that's the reason you can say that for P4, we don't have any similar points here, apart from P4 in this case.
[2:42]Coming back to the last point, that is P5. For P5, P2's similarity index is 0.98, which is greater than 0.8.
[2:52]P3's similarity index is 0.85, which again is greater than 0.8, and for P5 itself it is of course 1.00, which is greater than or equal to 0.8 here.
[3:02]So that's the reason you can say that for P5, P5 is similar, that is for sure. Along with that, P2 and P3 are also similar points in this case.
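The row-by-row search for similar points can be sketched as follows. Only the two off-diagonal similarity values quoted in the walkthrough (0.98 and 0.85) are recorded; the remaining pairs are described only as "less than 0.8", so this sketch leaves them out and treats them as below the threshold. The names `KNOWN_SIMILARITY` and `similar_points` are assumptions for illustration.

```python
from itertools import combinations

POINTS = ["P1", "P2", "P3", "P4", "P5"]
SIM_THRESHOLD = 0.8

# Only the pairs whose values are stated in the walkthrough are listed.
# All other pairs are said to be below 0.8 (exact values not given),
# so they are omitted and treated as "not similar" here.
KNOWN_SIMILARITY = {
    frozenset(("P2", "P5")): 0.98,
    frozenset(("P3", "P5")): 0.85,
}


def similar_points(points, known, threshold):
    """Return, for each point, the set of points at or above the threshold."""
    # Every point is similar to itself (1.00 on the diagonal).
    groups = {p: {p} for p in points}
    for a, b in combinations(points, 2):
        sim = known.get(frozenset((a, b)))
        if sim is not None and sim >= threshold:
            groups[a].add(b)
            groups[b].add(a)
    return groups


groups = similar_points(POINTS, KNOWN_SIMILARITY, SIM_THRESHOLD)
for p in POINTS:
    print(p, sorted(groups[p]))
```

Because the similarity matrix is symmetric, each qualifying pair is added to both points' groups, which is why P5 ends up listing both P2 and P3.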
[3:11]So that is what I have written here. Once you have written down these similar points, we need to identify which of these points is a core point, a border point, or a noise point (outlier).
[3:25]So for that reason, I have written this particular table. Now I will consider this particular P1.
[3:30]Now if you look at P1, in this particular group we have only one point, that is P1. But how many points are we expecting in each cluster? A minimum of two here.
[3:40]So that's the reason P1 is considered as noise in this case, because it has fewer than two points.
[3:46]Coming back to the second one, that is P2, P5, we have two points here. And two is equal to the minimum number of points, so that's the reason you can say that P2 is a core point here.
[3:56]Coming back to the third one, that is P3, P5. It contains two points again, and because two points are there, we will consider P3 as a core point here.
[4:06]For P4, we have only one point in this case, which is less than two, so that's the reason it is considered as noise.
[4:12]The last point, that is P5, contains three points over here including P5 itself. Because three is greater than two, it will be considered as a core point here.
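The counting rule used above (a group with at least the minimum number of points makes its point a core point, otherwise it is provisionally noise) can be sketched like this. The dictionary `SIMILAR` restates the groups from the walkthrough; the function name `first_pass_labels` is an assumption for illustration.

```python
MIN_POINTS = 2  # minimum points from the problem statement

# Similar-point groups as worked out in the walkthrough (each includes the point itself).
SIMILAR = {
    "P1": {"P1"},
    "P2": {"P2", "P5"},
    "P3": {"P3", "P5"},
    "P4": {"P4"},
    "P5": {"P2", "P3", "P5"},
}


def first_pass_labels(similar, min_points):
    """Label a point 'core' if its group size reaches min_points, else 'noise' for now."""
    return {
        point: ("core" if len(group) >= min_points else "noise")
        for point, group in similar.items()
    }


labels = first_pass_labels(SIMILAR, MIN_POINTS)
for point, label in labels.items():
    print(point, len(SIMILAR[point]), label)
```

The "noise" label at this stage is provisional, since a later check may still reclassify such a point as a border point.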
[4:22]Now, once you find the core and the noise points, there is a possibility that one of these two noise points may be a border point also.
[4:31]That is, a particular noise point may be on the border of, what you can say, one of the clusters.
[4:38]So that can be identified something like this. I will consider again the first noise point P1 here.
[4:43]We will consider P1 as a border point if P1 is a part of the group of any of the core points here.
[4:49]So how many core points are there? P2, P3, P5 are there. So if P1 is a part of the group of any of these core points, then it will be considered as a border point here.
[4:59]But P1 is not a part of core point P2's group. It is not a part of core point P3's group. It is not a part of core point P5's group here.
[5:07]So that's the reason it will be considered as noise only. If it were a part of any of those core points' groups, we would have considered it as a border point here.
[5:16]Coming back to the second noise point, that is P4. Again, P4 is not a part of the groups of P2, P3, and P5, that is, the core points.
[5:24]That's the reason this P4 again remains as a noise here, it will not become a border point here.
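The border check described above, where a provisional noise point is promoted to a border point if it appears in the group of any core point, can be sketched as below. The function name `promote_border_points` is an assumption; the groups and first-pass labels are the ones from the walkthrough.

```python
SIMILAR = {
    "P1": {"P1"},
    "P2": {"P2", "P5"},
    "P3": {"P3", "P5"},
    "P4": {"P4"},
    "P5": {"P2", "P3", "P5"},
}

# First-pass labels from the walkthrough.
LABELS = {"P1": "noise", "P2": "core", "P3": "core", "P4": "noise", "P5": "core"}


def promote_border_points(similar, labels):
    """Relabel a noise point as 'border' if it lies in any core point's group."""
    core_points = [p for p, label in labels.items() if label == "core"]
    updated = dict(labels)
    for point, label in labels.items():
        if label == "noise" and any(point in similar[c] for c in core_points):
            updated[point] = "border"
    return updated


final_labels = promote_border_points(SIMILAR, LABELS)
print(final_labels)
```

With a minimum of two points and a symmetric similarity matrix, any point that appears in a core point's group also has that core point in its own group, so its group already has two members and it is itself a core point. Border points can therefore only show up when the minimum number of points is larger than two.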
[5:30]Now, if you look at these particular things, P1 and P4 are noise. P2, P3 and P5 are the core points here.
[5:37]Also, we don't have any border points in this particular given data set.
[5:43]With respect to each of these particular core points, we will get one cluster. So the first cluster is P2 and P5. Second cluster is P3, P5, and the third cluster is P2, P3 and P5 in this particular case.
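The listing above gives one group per core point. In the standard formulation of DBSCAN, core points that lie in each other's neighbourhoods are density-connected and are placed in the same cluster, so P2, P3, and P5 (linked through P5) would be merged into a single cluster. The sketch below shows both views side by side; the function name `merge_connected` is an assumption for illustration.

```python
SIMILAR = {
    "P1": {"P1"},
    "P2": {"P2", "P5"},
    "P3": {"P3", "P5"},
    "P4": {"P4"},
    "P5": {"P2", "P3", "P5"},
}
CORE = ["P2", "P3", "P5"]

# View 1: one group per core point, as listed in the walkthrough.
per_core_groups = [sorted(SIMILAR[c]) for c in CORE]


def merge_connected(similar, core):
    """View 2: grow each cluster from a core point through neighbouring core points."""
    core_set = set(core)
    seen = set()
    clusters = []
    for seed in core:
        if seed in seen:
            continue  # already absorbed into an earlier cluster
        cluster = set()
        stack = [seed]
        while stack:
            point = stack.pop()
            if point in cluster:
                continue
            cluster.add(point)
            if point in core_set:
                # Only core points expand the cluster further.
                seen.add(point)
                stack.extend(similar[point] - cluster)
        clusters.append(sorted(cluster))
    return clusters


merged = merge_connected(SIMILAR, CORE)
print("per core point:", per_core_groups)
print("merged:", merged)
```

In the merged view the noise points P1 and P4 stay outside every cluster, exactly as in the per-core-point listing.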
[5:57]So in this video, I have discussed how we can use the DBSCAN clustering algorithm on a given data set to find the core, noise, and border points, as well as how we can form the clusters.
[6:10]I hope the concept of the DBSCAN clustering algorithm is clear. If you like the video, do like and share with your friends.
[6:16]Thank you for watching.