

K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning and data science. In this topic, we will learn what the K-means clustering algorithm is and how the algorithm works, along with the Python implementation of k-means clustering.

K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.

It allows us to cluster the data into different groups, and it is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.

The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the cluster assignments stop improving. The value of k must be chosen in advance.

The k-means clustering algorithm mainly performs two tasks:

It determines the best positions for the K center points (centroids) through an iterative process.
It assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form a cluster.

Hence each cluster contains data points with some commonalities, and it is separated from the other clusters.

The diagram below explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Compute a new centroid for each cluster, placed at the mean of the points currently assigned to it.
Step-5: Repeat step 3, which means reassigning each data point to the new closest centroid.
Step-6: If any reassignment occurs, go to step 4; otherwise, finish.

Let's understand the above steps by considering the visual plots:

The x-y axis scatter plot of these two variables is given below:

Let's take the number of clusters k to be 2 (K=2). This means we will try to group the data points into two different clusters.

We need to choose k random points, or centroids, to form the clusters. These points can be either points from the dataset or any other points. Here we select two points as the k points that are not part of our dataset.

Now we will assign each data point of the scatter plot to its closest k point, or centroid. We determine which centroid is closest using the familiar formula for the distance between two points. To visualize the result, we draw a median line between the two centroids, that is, the line on which every point is equidistant from both. Consider the image below:

From the above image, it is clear that the points to the left of the line are nearer to K1, the blue centroid, and the points to the right of the line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization.
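
The six steps of the algorithm described in this topic can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available and Euclidean distance is used; the function name `k_means`, its `max_iter` and `seed` parameters, and the six sample points are made up for this example and are not part of the tutorial's own implementation. It also draws the starting centroids from the dataset, which is just one of the options mentioned in step 2.

```python
import numpy as np


def k_means(points, k, max_iter=100, seed=0):
    """Toy k-means that follows the six steps described in the text."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # Step 2: pick K random starting centroids. Assumption: they are sampled
    # from the data here, although any points could be used.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None

    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid
        # (Euclidean distance from each point to each centroid).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)

        # Step 6: finish once no point changes its cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)

    return labels, centroids


# Hypothetical sample data: two well-separated groups of three points each.
data = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                 [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

With K=2, the first three points should end up in one cluster and the last three in the other. Which cluster receives the label 0 depends on the random starting centroids, so only the grouping, not the label numbers, should be relied on.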
