[0:06]Welcome to the machine learning tutorial. In this video, we will understand the K-means algorithm for clustering. The K-means algorithm is an unsupervised learning algorithm. Given a data set of items with certain features and values for these features, the algorithm will categorize the items into K groups, or clusters, of similarity. So, when we have been given a set of data points, they are categorized into different clusters based on the similarity between those data points. To calculate the similarity, we can use the Euclidean distance, Manhattan distance, Hamming distance, or cosine distance as a measurement. Using these distance measurements, we can calculate the similarity between the data points and categorize them into different clusters.

Here is the pseudocode for implementing the K-means algorithm. The input to the K-means algorithm is the number of clusters, K, and the data points, that is, D. The K-means algorithm has two steps. In the first step, we choose K random data points as the initial centroids, or cluster centers. The second step is repeated until the cluster centers stabilize. The second step has two sub-steps. The first sub-step is to allocate each data point to the nearest of the K centroids: we calculate the distance from each centroid to all the data points, find the nearest centroid for every data point, and assign the data point to that centroid. Once we do that, we calculate the new centroid of each cluster using all the points in that cluster. Once we have the new centroids, we go back and again calculate the distance between the data points and the new centroids. Depending on that distance, we allocate the points to the nearest centroid, and again we calculate the centroids of the new clusters. The same step is repeated again and again until the cluster centers stabilize.
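The two steps described above can be sketched in Python. This is only a minimal illustration, assuming NumPy and Euclidean distance (one of the measures named earlier). The function name `k_means`, the `max_iter` safety cap, the `seed` parameter, and the handling of empty clusters are assumptions made for this example, not something taken from the video.

```python
import numpy as np


def k_means(D, K, max_iter=100, seed=0):
    """Cluster the rows of D into K groups (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)

    # Step 1: choose K random data points as the initial centroids.
    centroids = D[rng.choice(len(D), size=K, replace=False)]
    labels = None

    # Step 2: repeat until stable. max_iter is an assumed safety cap.
    for _ in range(max_iter):
        # Sub-step 1: Euclidean distance from every point to every centroid,
        # then assign each point to its nearest centroid.
        distances = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Stop when no point moved to a different cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Sub-step 2: recompute each centroid as the mean of its points.
        for k in range(K):
            members = D[labels == k]
            if len(members) > 0:  # assumption: keep the old centroid if empty
                centroids[k] = members.mean(axis=0)

    return centroids, labels


# Usage: two visibly separate groups of 2-D points.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = k_means(points, K=2)
print(labels)     # cluster index of each point
print(centroids)  # one row per cluster center
```

Which group receives index 0 and which receives index 1 depends on the random initial centroids, so only the grouping itself is meaningful, not the label numbers.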
This is the flow diagram of the K-means algorithm for clustering. We start the algorithm, and then we need to choose the number of clusters, that is, K. We select random centroids initially. We calculate the distance between the data points and the centroids. Depending on this distance, we assign each data point to one of the centroids; that is, the grouping is based on the minimum distance. Then we need to check one condition here: whether any data points are moving from one cluster to another cluster. If they are moving, it means the clusters have not yet stabilized. We go back and find the new centroids. Once we find the new centroids, we calculate the distance between the data points and the new centroids. Again, we group the data points based on the minimum distance, and we check whether any data points are moving from one cluster to another cluster. If they are moving, the same steps have to be repeated. If they are not moving, it means the clusters have stabilized, and we stop the algorithm.

Now we will try to understand the advantages and disadvantages of the K-means algorithm. The advantages are: the K-means algorithm is simple, easy to understand, and easy to implement. It is also efficient, in that the time taken by K-means rises linearly with the number of data points. For these reasons, K-means is one of the most preferred and widely used clustering algorithms in unsupervised learning.

The disadvantages of the K-means algorithm are: the user needs to specify an initial value of K, that is, how many clusters are required. That is a very difficult question to answer, because it is very difficult to know at the initial stage how many clusters we need to form.
For that reason, we can plot the data points, visualize what they look like, and then specify the value of K, but even then choosing K remains something of an art. The second disadvantage is that the process of finding the clusters may not converge. This may happen when there is no similarity between the data points. In such a case, it is not possible for the data points to converge into different clusters. Finally, it is not suitable for discovering clusters that are not hyper-ellipsoids or hyper-spheres.

So, in this video, I have introduced what the K-means algorithm is, what the steps in the K-means algorithm are, what the flow diagram looks like, and what the advantages and disadvantages of the K-means algorithm are. I hope you understood the concept. If you like the video, do like and share it with your friends. Press the subscribe button for more videos. Press the bell icon for regular updates. Thank you for watching.

KMeans Clustering Algorithm | Steps in KMeans Algorithm | Advantages Disadvantages by Mahesh Huddar
5m 54s · 838 words · ~5 min read
YouTube auto captions
This transcript was extracted from YouTube's auto-generated caption track.


