The global Minmax k-means algorithm

The global k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k-means to minimize the sum of the intra-cluster variances. However, the global k-means algorithm sometimes produces singleton clusters, and its initial positions are sometimes poor; after a bad initialization, k-means can easily converge to a poor local optimum. In this paper, we first modify the global k-means algorithm to eliminate singleton clusters, and then apply the MinMax k-means clustering error method to the global k-means algorithm to overcome the effect of bad initialization, which yields the proposed global Minmax k-means algorithm. The proposed clustering method is tested on several popular data sets and compared with the k-means algorithm, the global k-means algorithm and the MinMax k-means algorithm. The experimental results show that the proposed algorithm outperforms the other algorithms considered in the paper.

is to use the multi-restart k-means algorithm (Murty et al. 1999; Arthur and Vassilvitskii 2007; Banerjee and Ghosh 2004). A new version of this method is the MinMax k-means clustering algorithm (Tzortzis and Likas 2014), which starts from a randomly picked set of cluster centers and tries to minimize the maximum intra-cluster error. Its application (Eslamnezhad and Varjani 2014) shows that the algorithm is efficient in intrusion detection.
In this paper, a modified version of the global k-means algorithm is proposed in order to avoid singleton clusters. In addition, the initial positions chosen by the global k-means algorithm are sometimes poor, and after a bad initialization k-means can easily converge to a poor local optimum. To tackle this problem, we employ the MinMax k-means clustering error method instead of the k-means clustering error in the global k-means algorithm, and obtain a deterministic algorithm called the global Minmax k-means algorithm. We carry out extensive experiments on different data sets; the results show that the proposed algorithm performs better than the other algorithms considered in the paper.
The rest of the paper is organized as follows. We briefly describe the k-means, the global k-means and the MinMax k-means algorithms in the "Preliminaries" section. In "The proposed algorithm" section we present our algorithms. The experimental evaluation is presented in the "Experiment evaluation" section. Finally, the "Conclusions" section concludes our work.

k-Means algorithm
Given a data set $X = \{x_1, x_2, \ldots, x_N\}$, $x_n \in \mathbb{R}^d$ ($n = 1, 2, \ldots, N$), we aim to partition it into $M$ disjoint clusters $C_1, C_2, \ldots, C_M$ such that a clustering criterion is optimized. Usually, the clustering criterion is the sum of the squared Euclidean distances between each data point $x_n$ and the center $m_k$ of the cluster that $x_n$ belongs to. This criterion is called the clustering error and depends on the cluster centers $m_1, m_2, \ldots, m_M$:

$$E(m_1, m_2, \ldots, m_M) = \sum_{i=1}^{N} \sum_{k=1}^{M} I(x_i \in C_k)\, \|x_i - m_k\|^2, \tag{1}$$

where $I(X) = 1$ if $X$ is true and $0$ otherwise. We call $\sum_{i=1}^{N} I(x_i \in C_k)\, \|x_i - m_k\|^2$ the intra-cluster error (variance) of cluster $C_k$. The clustering error is therefore the sum of the intra-cluster errors. For brevity, we write $E_{\mathrm{sum}}$ instead of $E(m_1, m_2, \ldots, m_M)$, i.e. $E_{\mathrm{sum}} = E(m_1, m_2, \ldots, m_M)$.
The k-means algorithm finds locally optimal solutions with respect to the clustering error. The main disadvantage of the method is its sensitivity to the initial positions of the cluster centers.
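As a small illustration of the clustering error (1), the following Python sketch computes the intra-cluster errors and $E_{\mathrm{sum}}$ for a fixed set of centers. It assumes that each point belongs to its nearest center, which is the assignment rule used by k-means; the function names and the toy data are illustrative only and do not come from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def intra_cluster_errors(points, centers):
    """Return one intra-cluster error per center.

    Assumption: each point is assigned to its nearest center
    (ties go to the center with the lowest index).
    """
    errors = [0.0] * len(centers)
    for x in points:
        dists = [sq_dist(x, m) for m in centers]
        k = dists.index(min(dists))
        errors[k] += dists[k]
    return errors


def clustering_error(points, centers):
    """E_sum: the sum of the intra-cluster errors, as in Eq. (1)."""
    return sum(intra_cluster_errors(points, centers))


# Toy data: two clusters on the plane, with centers at the cluster means.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]
print(intra_cluster_errors(points, centers))  # [2.0, 8.0]
print(clustering_error(points, centers))      # 10.0
```

In this example the larger of the two intra-cluster errors is 8.0, while $E_{\mathrm{sum}}$ is 10.0.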

The global k-means algorithm
To deal with the initialization problem, the global k-means has been proposed, which is an incremental deterministic algorithm that employs k-means as a local search procedure. This algorithm obtains optimal or near-optimal solutions in terms of clustering error.
To solve a clustering problem with $M$ clusters, Likas et al. (2003) proceed as follows. The algorithm starts with one cluster ($k = 1$) and finds its optimal position, which corresponds to the centroid of the data set. To solve the problem with two clusters ($k = 2$), the k-means algorithm is run $N$ times ($N$ is the size of the data set), each time starting from the following initial positions of the cluster centers: the first cluster center is always placed at the optimal position for the problem with $k = 1$, and the second, at execution $n$, is placed at the position of the data point $x_n$ ($n = 1, 2, \ldots, N$). The solution with the lowest clustering error is kept as the solution of the 2-clustering problem. In general, let $(m_1^*, m_2^*, \ldots, m_k^*)$ denote the final solution of the $k$-clustering problem. Once the solution of the $(k-1)$-clustering problem has been found, the solution of the $k$-clustering problem is sought as follows: the k-means algorithm is executed $N$ times, with $(m_1^*, m_2^*, \ldots, m_{k-1}^*, x_n)$ as the initial cluster centers for the $n$th run, and the solution with the lowest clustering error is kept. Proceeding in this fashion, the algorithm finally obtains a solution with $M$ clusters and, along the way, solutions for all $k$-clustering problems with $k < M$.
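The incremental procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation of Likas et al. (2003): it uses plain Lloyd iterations for k-means, leaves an empty cluster's center unchanged (an assumption of this sketch), and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from given initial centers; returns (centers, labels, E_sum)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                      for x in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean(members)
    error = sum(sq_dist(x, centers[lab]) for x, lab in zip(points, labels))
    return centers, labels, error


def global_kmeans(points, M):
    """Return {k: (centers, labels, E_sum)} for k = 1, ..., M."""
    solutions = {1: kmeans(points, [mean(points)])}
    for k in range(2, M + 1):
        prev_centers = solutions[k - 1][0]
        best = None
        for x in points:  # N runs: the kth center starts at each data point in turn
            result = kmeans(points, prev_centers + [tuple(x)])
            if best is None or result[2] < best[2]:
                best = result
        solutions[k] = best
    return solutions


# Three well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
solutions = global_kmeans(points, 3)
for k in (1, 2, 3):
    print(k, round(solutions[k][2], 3))
```

Each value of $k$ requires $N$ full k-means runs, which is what makes this version expensive on larger data sets.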
This version of the algorithm is not practical for clustering medium-sized and large data sets. Two modifications were proposed to reduce the complexity (Likas et al. 2003), and we are interested in the first one. Let $d_{k-1}^{j}$ be the squared distance between $x_j$ and the closest center among the $k - 1$ cluster centers obtained so far. In order to find the starting point for the $k$th cluster center, for each $x_n \in \mathbb{R}^d$, $n = 1, 2, \ldots, N$, we compute $b_n$ as follows:

$$b_n = \sum_{j=1}^{N} \max\left(d_{k-1}^{j} - \|x_n - x_j\|^2,\ 0\right). \tag{2}$$
The quantity $b_n$ measures the reduction in the error measure obtained by inserting a new cluster center at point $x_n$. The data point with the largest value of $b_n$ is therefore the best candidate starting point for the $k$th cluster center. We compute $i = \arg\max_n b_n$ and select the data point $x_i$ as the starting point for the $k$th cluster center.
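The selection rule can be illustrated with a short Python sketch. It assumes the standard form of the bound from Likas et al. (2003), $b_n = \sum_j \max(d_{k-1}^{j} - \|x_n - x_j\|^2, 0)$; the function name `best_candidate` and the one-dimensional toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_candidate(points, centers):
    """Return (i, b): the index maximizing b_n and the list of all b_n values.

    d[j] is the squared distance from x_j to its closest existing center;
    b[n] is the guaranteed reduction in clustering error if a new center
    is placed at x_n.
    """
    d = [min(sq_dist(x, m) for m in centers) for x in points]
    b = [sum(max(d[j] - sq_dist(xn, xj), 0.0) for j, xj in enumerate(points))
         for xn in points]
    i = max(range(len(points)), key=lambda n: b[n])
    return i, b


# One existing center near the left group; the right group is uncovered.
points = [(0.0,), (1.0,), (9.0,), (10.0,), (11.0,)]
centers = [(0.5,)]
i, b = best_candidate(points, centers)
print(i, b[i])  # 3 270.75
```

The middle point of the uncovered group wins because placing a center there reduces the distances of all three right-hand points at once.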

The MinMax k-means algorithm
As we know, the k-means algorithm minimizes the clustering error. The MinMax k-means algorithm instead minimizes the maximum intra-cluster error

$$E_{\max} = \max_{1 \le k \le M} \sum_{i=1}^{N} I(x_i \in C_k)\, \|x_i - m_k\|^2, \tag{3}$$

where $m_k$ and $I(\cdot)$ are defined as in (1).
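Since directly minimizing the maximum intra-cluster variance $E_{\max}$ is difficult, a relaxed maximum variance objective was proposed (Tzortzis and Likas 2014). They constructed a weighted formulation $E_w$ of the sum of the intra-cluster variances:

$$E_w = \sum_{k=1}^{M} w_k^{p} \sum_{i=1}^{N} I(x_i \in C_k)\, \|x_i - m_k\|^2, \qquad w_k \ge 0, \quad \sum_{k=1}^{M} w_k = 1, \quad 0 \le p < 1, \tag{4}$$

where the exponent $p$ is a constant. The greater (smaller) the value of $p$, the less (more) similar the weight values become, as relative differences of the variances among the clusters are enhanced (suppressed). All clusters now contribute to the objective, to different degrees regulated by the values $w_k$. The more a cluster contributes (higher weight), the more intensely its variance is minimized. Accordingly, the weights $w_k$ are calculated by

$$w_k = \frac{V_k^{1/(1-p)}}{\sum_{k'=1}^{M} V_{k'}^{1/(1-p)}}, \qquad V_k = \sum_{i=1}^{N} I(x_i \in C_k)\, \|x_i - m_k\|^2. \tag{5}$$

To enhance the stability of the MinMax k-means algorithm, a memory effect can be added to the weights:

$$w_k^{(t)} = \beta\, w_k^{(t-1)} + (1 - \beta)\, \frac{V_k^{1/(1-p)}}{\sum_{k'=1}^{M} V_{k'}^{1/(1-p)}}, \qquad 0 \le \beta \le 1. \tag{6}$$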
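To make the role of the weights concrete, the following Python sketch evaluates the weight update and the weighted objective for given intra-cluster variances. It assumes the update has the form given in Tzortzis and Likas (2014), with weights proportional to $V_k^{1/(1-p)}$ and blended with the previous weights through the memory parameter $\beta$; the function names and the numbers are illustrative only.

```python
def update_weights(variances, prev_weights, p, beta):
    """One weight update with memory.

    Assumed form: w_k = beta * w_k_prev + (1 - beta) * V_k^(1/(1-p)) / sum_k' V_k'^(1/(1-p)),
    with 0 <= p < 1 and 0 <= beta <= 1.
    """
    powered = [v ** (1.0 / (1.0 - p)) for v in variances]
    total = sum(powered)
    return [beta * w + (1.0 - beta) * pv / total
            for w, pv in zip(prev_weights, powered)]


def weighted_error(variances, weights, p):
    """Relaxed objective E_w: sum over clusters of w_k^p * V_k."""
    return sum(w ** p * v for w, v in zip(weights, variances))


variances = [1.0, 3.0]          # intra-cluster variances V_1, V_2
uniform = [0.5, 0.5]            # initial weights

no_memory = update_weights(variances, uniform, p=0.5, beta=0.0)
with_memory = update_weights(variances, uniform, p=0.5, beta=0.5)
print(no_memory, with_memory)

# With p = 0 every factor w_k^p equals 1, so E_w reduces to E_sum.
print(weighted_error(variances, no_memory, p=0.0))  # 4.0
```

The cluster with the larger variance receives the larger weight, and a positive $\beta$ pulls the new weights back toward the previous ones.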

The proposed algorithm
The modified global k-means algorithm
As we know, the global k-means algorithm may produce singleton clusters if the initial centers are outliers. To avoid this, we propose the modified global k-means algorithm.
Algorithm 1: The modified global k-means algorithm 1.
Step 1 (Initialization) Compute the centroid $m_1$ of the data set $X$:

$$m_1 = \frac{1}{N} \sum_{i=1}^{N} x_i, \tag{7}$$

and set $k = 1$;
Step 2 (Stopping criterion) Set $k = k + 1$. If $k > M$, then stop;
Step 3 Take the centers $m_1, m_2, \ldots, m_{k-1}$ from the previous iteration and consider each point $x_i$ of $X$ as a starting point for the $k$th cluster center, thus obtaining $N$ initial solutions with $k$ points $(m_1, m_2, \ldots, m_{k-1}, x_i)$;
Step 4 Apply the k-means algorithm to each of them; keep the best $k$-partition obtained and its centers $y_1, y_2, \ldots, y_k$;
Step 5 (Detect the singleton clusters) If the obtained partition contains a singleton cluster, remove the point $y_k$ from the candidate initial centers in $X$ and go to Step 3; otherwise go to Step 6;
Step 6 Set $m_i = y_i$, $i = 1, 2, \ldots, k$, and go to Step 2.
Because of the high computational cost of the global k-means algorithm, we also propose a fast variant. It is based on the same idea as the fast global k-means variant proposed in Peña et al. (1999).
Algorithm 2: The modified global k-means algorithm 2.
Steps 1, 2 and 6 are the same as in Algorithm 1. Steps 3, 4 and 5 are modified as follows:
Step 3′ Take the centers $m_1, m_2, \ldots, m_{k-1}$ from the previous iteration and consider each point $x_i$ of $X$ as a starting point for the $k$th cluster center; calculate $b_i$ using Eq. (2) and choose the starting point with the maximum $b_i$ as the best initial solution;
Step 4′ Apply the k-means algorithm to this initial solution; keep the $k$-partition obtained and its centers $y_1, y_2, \ldots, y_k$;
Step 5′ (Detect the singleton clusters) If the obtained partition contains a singleton cluster, set $b_i = 0$ for the chosen starting point and go to Step 3′; otherwise go to Step 6.
In our numerical experiments we use Algorithm 2. The proposed modification is motivated by a real data set, which contains the scores of 41 students, each with grades in 11 subjects. When we use the global k-means algorithm to cluster the students according to their subject scores, the output is poor. The comparison between the global k-means algorithm and the modified global k-means algorithm is given in Table 1. Table 1 shows that when the data are partitioned into four clusters, the global k-means algorithm produces two clusters that contain only one element each, i.e. two singleton clusters. We also find that the $E_{\mathrm{sum}}$ of the modified global k-means algorithm is lower than that of the global k-means algorithm.
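A minimal Python sketch of Algorithm 2 is given below. It is an illustration under stated assumptions, not the authors' code: k-means is implemented with plain Lloyd iterations, $b_i$ is computed with the bound of Likas et al. (2003), and the loop gives up and accepts the current result once every candidate has been zeroed out (a safeguard added for this sketch). All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from the given initial centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                      for x in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean(members)
    return centers, labels


def modified_fast_global_kmeans(points, M):
    """Sketch of Algorithm 2: one k-means run per k, rejecting singleton results."""
    centers = [mean(points)]                       # Step 1
    labels = [0] * len(points)
    for k in range(2, M + 1):                      # Step 2
        d = [min(sq_dist(x, m) for m in centers) for x in points]
        b = [sum(max(d[j] - sq_dist(xn, xj), 0.0) for j, xj in enumerate(points))
             for xn in points]
        while True:
            i = max(range(len(points)), key=lambda n: b[n])                  # Step 3'
            new_centers, new_labels = kmeans(points, centers + [points[i]])  # Step 4'
            sizes = [new_labels.count(c) for c in range(k)]
            if 1 not in sizes or b[i] == 0.0:      # Step 5' (stop if no candidates remain)
                break
            b[i] = 0.0                             # discard this starting point
        centers, labels = new_centers, new_labels  # Step 6
    return centers, labels


# Two small groups plus a moderately distant point at 25.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (25.0,)]
centers, labels = modified_fast_global_kmeans(points, 2)
print(labels)  # [1, 1, 0, 0, 0]
```

On this toy data the point at 25 has the largest $b_i$, but starting from it yields the singleton cluster {25}; after its $b_i$ is set to zero, the next candidate produces a partition without singletons.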

The global Minmax k-means algorithm
The global k-means algorithm is a deterministic global search procedure from suitable initial positions, but the initial positions are sometimes poor. An example is illustrated in Fig. 1. The MinMax k-means algorithm has been shown to be effective and robust to bad initializations (Murty et al. 1999), but it is not deterministic: it needs multiple restarts. We therefore combine the global k-means algorithm and the MinMax k-means algorithm, i.e. we apply the MinMax k-means clustering error method to the global k-means algorithm, and obtain a deterministic algorithm called the global Minmax k-means algorithm.
Like the global k-means algorithm, the global Minmax k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable positions; this procedure was introduced in the "Preliminaries" section. After choosing the initial centers, we employ the MinMax k-means method, also described in the "Preliminaries" section, to minimize the maximum intra-cluster variance. The complete method is given as Algorithm 3.
Algorithm 3: The global Minmax k-means algorithm.
Step 1 (Initialization) Compute the centroid $m_1$ of the set $X$ using (7), and set $k = 1$;
Step 2 (Stopping criterion) Set $k = k + 1$. If $k > M$, then stop;
Step 3 Take the centers $m_1, m_2, \ldots, m_{k-1}$ from the previous iteration and consider each point $x_i$ of $X$ as a starting point for the $k$th cluster center, thus obtaining $N$ initial solutions with $k$ points $(m_1, m_2, \ldots, m_{k-1}, x_i)$;
Step 4 Apply the MinMax k-means algorithm to each of them; keep the best $k$-partition obtained and its centers $y_1, y_2, \ldots, y_k$;
Step 5 (Detect the singleton clusters) If the obtained partition contains a singleton cluster, remove the point $y_k$ from the candidate initial centers and go to Step 3; otherwise go to Step 6;
Step 6 Set $m_i = y_i$, $i = 1, 2, \ldots, k$, and go to Step 2.
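The steps of Algorithm 3 can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not statements from the paper: the exponent $p$ is kept fixed instead of being adapted, the weight update follows the form in Tzortzis and Likas (2014), the "best" partition is taken to be the one with the lowest $E_{\max}$, and empty clusters are rejected together with singleton clusters. Because the runs are deterministic, sorting them by $E_{\max}$ and taking the first one without a singleton is equivalent to repeatedly deleting the offending starting point and returning to Step 3. All names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def minmax_kmeans(points, centers, p, beta, max_iter=200, tol=1e-9):
    """Simplified MinMax k-means with fixed p; returns (centers, labels, variances)."""
    M = len(centers)
    centers = [tuple(c) for c in centers]
    weights = [1.0 / M] * M
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the smallest weighted distance.
        new_labels = [min(range(M), key=lambda k: weights[k] ** p * sq_dist(x, centers[k]))
                      for x in points]
        for k in range(M):
            members = [x for x, lab in zip(points, new_labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = mean(members)
        variances = [0.0] * M
        for x, lab in zip(points, new_labels):
            variances[lab] += sq_dist(x, centers[lab])
        # Weight update with memory (assumed form, see the lead-in).
        powered = [v ** (1.0 / (1.0 - p)) for v in variances]
        total = sum(powered)
        new_weights = weights if total == 0.0 else [
            beta * w + (1.0 - beta) * pv / total for w, pv in zip(weights, powered)]
        converged = (new_labels == labels and
                     max(abs(a - c) for a, c in zip(weights, new_weights)) < tol)
        labels, weights = new_labels, new_weights
        if converged:
            break
    return centers, labels, variances


def global_minmax_kmeans(points, M, p=0.3, beta=0.1):
    """Sketch of Algorithm 3; returns (centers, labels)."""
    centers = [mean(points)]                        # Step 1
    labels = [0] * len(points)
    for k in range(2, M + 1):                       # Step 2
        # Steps 3-4: one MinMax k-means run per candidate position of the kth center.
        runs = [minmax_kmeans(points, centers + [x], p, beta) for x in points]
        runs.sort(key=lambda r: max(r[2]))          # lowest E_max first
        # Step 5: skip runs that contain a singleton (or empty) cluster.
        valid = [r for r in runs if min(r[1].count(c) for c in range(k)) > 1]
        centers, labels, _ = valid[0] if valid else runs[0]   # Step 6
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = global_minmax_kmeans(points, 2)
print(labels)
```

No random restarts are involved, so repeated calls on the same data return the same partition.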

Experiment evaluation
In the following subsections we provide extensive experimental results comparing the global Minmax k-means algorithm with the k-means algorithm, the global k-means algorithm and the MinMax k-means algorithm. In the experiments, the results reported for the k-means and MinMax k-means algorithms are the averages of $E_{\max}$ and $E_{\mathrm{sum}}$, defined by (3) and (1), over 100 restarts. For the MinMax k-means algorithm and the global Minmax k-means algorithm, some additional parameters ($\beta$, $p$) must be fixed prior to execution. Tzortzis and Likas (2014) give a practical framework that extends MinMax k-means to automatically adapt the exponent $p$ to the data set. It begins with a small value of $p$ ($p_{\mathrm{init}}$), which is increased by $p_{\mathrm{step}}$ after each iteration until a maximum value ($p_{\max}$) is attained. With this method, the parameters $p_{\mathrm{init}}$, $p_{\max}$ and $p_{\mathrm{step}}$ must be decided first. We set $p_{\mathrm{init}} = 0$ and $p_{\mathrm{step}} = 0.01$, and write $p$ in place of $p_{\max}$ in all MinMax k-means and global Minmax k-means experiments. In Tables 2, 3 and 8 we do not report the value of $p$, since different values of $p$ give the same result.
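The exponent schedule described above can be written as a small Python generator. This sketch covers only what is stated here, namely that $p$ starts at $p_{\mathrm{init}}$ and grows by $p_{\mathrm{step}}$ per iteration until it reaches $p_{\max}$; the function name and the value $p_{\max} = 0.03$ in the example are illustrative choices, not settings from the experiments.

```python
from itertools import islice


def p_schedule(p_init, p_max, p_step):
    """Yield the exponent for iterations 0, 1, 2, ...: grow by p_step, then stay at p_max."""
    t = 0
    while True:
        # Computing p from t avoids accumulating floating-point error.
        yield min(p_init + t * p_step, p_max)
        t += 1


first = [round(p, 2) for p in islice(p_schedule(0.0, 0.03, 0.01), 6)]
print(first)  # [0.0, 0.01, 0.02, 0.03, 0.03, 0.03]
```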

Synthetic data sets
Four typical synthetic data sets $S_1$, $S_2$, $S_3$, $S_4$ are tested in this section, as in Fang et al. (2013). They are generated from mixtures of four or three bivariate Gaussian distributions in the plane. Thus a cluster takes the form of a Gaussian  −1), respectively, and their standard deviations $\sigma$ are the same within a data set but vary across the data sets: $\sigma$ takes the values 0.2, 0.3 and 0.4 for $S_1$, $S_2$ and $S_3$, respectively. In this way, the degree of overlap among the clusters increases considerably from $S_1$ to $S_3$, and the corresponding classification problem becomes more difficult. As for $S_4$, we use three Gaussian distributions located at $(1, 0)$, $(0, 1)$ and $(0, -1)$, with 400, 300 and 200 sample points, respectively. $S_4$ therefore represents the asymmetric situation in which the clusters differ in shape and in the number of sample points. The data sets are shown in Fig. 2.
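For readers who want to experiment with a data set of the same layout as $S_4$, the following Python sketch draws 400, 300 and 200 points around $(1, 0)$, $(0, 1)$ and $(0, -1)$. The covariance structure of $S_4$ is not given in this section, so the sketch assumes isotropic Gaussians with a common spread `sigma`; that parameter, its default value, the seed and the function name are all assumptions made for illustration.

```python
import random


def make_s4_like(sigma=0.3, seed=0):
    """Generate an S4-like data set: three Gaussian clusters of unequal size.

    Assumption: each cluster is isotropic with standard deviation `sigma`;
    the actual shapes used in the paper may differ.
    """
    rng = random.Random(seed)
    spec = [((1.0, 0.0), 400), ((0.0, 1.0), 300), ((0.0, -1.0), 200)]
    points, labels = [], []
    for label, ((cx, cy), n) in enumerate(spec):
        for _ in range(n):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(label)
    return points, labels


points, labels = make_s4_like()
print(len(points), [labels.count(c) for c in range(3)])  # 900 [400, 300, 200]
```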

Real-world data sets
Coil-20 is a data set (Nene et al. 1996) that contains 72 images, taken from different angles, of each of the 20 included objects. We used three subsets, Coil15, Coil8 and Coil19, with images from 15, 18 and 19 objects, respectively, as in Tzortzis and Likas (2014). The data set includes 216 instances, each with 1000 features.
Iris (UCI) (Frank and Asuncion 2010) is a well-known data set created by R.A. Fisher. It has 150 instances, 50 in each of three classes. Each instance has four predictive attributes.
Seeds (UCI) (Frank and Asuncion 2010) is composed of 210 records extracted from three different varieties of wheat. Each variety has the same number of grains, and each grain is described by seven features.
Yeast (UCI) (Frank and Asuncion 2010) includes 1484 instances about the cellular localization sites of proteins, with eight attributes. The proteins belong to ten categories. Five of the classes are extremely under-represented and are not considered in our evaluation. The data set is unbalanced.
User Knowledge Modeling (UCI) (Frank and Asuncion 2010) describes students' knowledge status on the subject of Electrical DC Machines. It includes 403 instances in a 6-dimensional space. The data set is unbalanced. The students are assessed at four levels.
In the experiments, the sample data of the Iris, Seeds and Pendigits data sets are first normalized using the z-score method, and the algorithms are run on the normalized data.
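The z-score normalization mentioned above rescales every feature to zero mean and unit standard deviation. A minimal Python sketch is shown below; it assumes the population standard deviation is used and maps constant features to zero, neither of which is specified in the text, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def zscore(rows):
    """Normalize each column of `rows` to zero mean and unit standard deviation.

    Assumptions: population standard deviation; a constant column becomes all zeros.
    """
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [tuple((v - mu) / sd if sd > 0 else 0.0
                  for v, mu, sd in zip(row, mus, sds))
            for row in rows]


rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
normalized = zscore(rows)
print(normalized[1])  # (0.0, 0.0)
```

After normalization both columns of the example are identical, which shows that features measured on different scales contribute equally to the Euclidean distance.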
A summary of the data sets is provided in Table 4.

Performance analysis
The comparison of the algorithms across the various data sets is shown in Tables 2 to 12, except Table 6. First, the global Minmax k-means algorithm attains a better $E_{\max}$ than the k-means algorithm and the global k-means algorithm; in most cases it is also better than the MinMax k-means algorithm, and otherwise equal to it. Second, the proposed method outperforms the k-means algorithm on all the metrics reported in these tables, except in Table 3, where all algorithms give the same result. Third, the global Minmax k-means algorithm reaches the lowest $E_{\mathrm{sum}}$, except in Tables 7 and 10. Because our method employs both the global k-means and the MinMax k-means algorithms, it performs better than either of them, or at least attains the same result. In Tables 4, 5, 11 and 12, the proposed method attains both the lowest $E_{\max}$ and the lowest $E_{\mathrm{sum}}$. In Table 11, although the global k-means algorithm reaches the lowest $E_{\mathrm{sum}}$ too, when it attains that point its $E_{\mathrm{sum}}$ is larger than ours. In Tables 4 and 5, the MinMax k-means algorithm also reaches the lowest $E_{\max}$, but it cannot attain the lowest $E_{\mathrm{sum}}$. In Tables 7 and 10, the proposed method does not yield the lowest $E_{\mathrm{sum}}$, but it is the only method that attains the lowest $E_{\max}$. In Tables 2 and 9, all algorithms except k-means perform equally. In Table 8, the MinMax k-means and global Minmax k-means algorithms give the same result, and both are better than k-means and global k-means.
In the experiments, we find that the memory parameter $\beta$ and the exponent parameter $p$ affect the results of the MinMax k-means and the global Minmax k-means algorithms, and that the effect follows no clear pattern. A practical framework that extends MinMax k-means to automatically adapt the exponent to the data set was proposed in Tzortzis and Likas (2014). They suggested that, once $p_{\max}$ has been set, the program can reach the lowest $E_{\max}$ for some $p \in [p_{\mathrm{init}}, p_{\max}]$. However, our experiments show that this is not always the case. In Tables 10 and 11, setting $p_{\max} = 0.3$ gives better results than $p_{\max} = 0.5$. The experiments also show that $E_{\max}$ and $E_{\mathrm{sum}}$ need not attain their lowest values at the same time.

global k-means and the MinMax k-means algorithm, i.e. we obtain a deterministic clustering method that needs no restarts, and our proposed algorithm always performs well. As for future work, we plan to study adaptive methods for determining the exponent parameter $p$ and the memory parameter $\beta$ such that $E_{\max}$ or $E_{\mathrm{sum}}$ attains its lowest value. It would be better still to tune the two parameters at the same time.