A Partial Priority Clustering Algorithm For Large Datasets

Posted on: 2012-10-05   Degree: Master   Type: Thesis
Country: China   Candidate: D S Hou   Full Text: PDF
GTID: 2178330332494600   Subject: Applied Mathematics
Abstract/Summary:
With the development of information technology, the volume of data keeps growing. Data mining is receiving more and more attention as a way to extract relationships and implicit information from large amounts of data. Cluster analysis is an important field of study in data mining, and a number of classic clustering algorithms have been proposed in recent years. This paper focuses on a processing method for large-scale job shop scheduling problems and proposes a partial priority clustering algorithm. The algorithm is then applied to large data sets. The paper is divided into two parts.

In the first part, we detail the steps of the algorithm. For large-scale scheduling problems, batch processing is a more reasonable and feasible strategy. The partial priority clustering algorithm considers the whole data set and defines a priority cluster according to "emergency priority" and other scheduling rules. The algorithm proceeds as follows. First, a priority typical point is determined from the scheduling rules and the data. Second, a priority class is constructed using this typical point as the seed. Then the data items contained in this priority class are deleted immediately to simplify the data set. This process is repeated until the remaining data set is small enough to meet the stopping condition. Finally, each of the remaining data items is assigned to its nearest cluster. The algorithm has been successfully applied to large-scale job shop scheduling problems. To further verify its scalability, we applied the algorithm to several different data sets. Experiments show that the method can be applied to data sets of different sizes and that its accuracy can reach 90%.

In the second part, cluster ensemble is applied to the partial priority clustering algorithm to further improve accuracy. Experiments show that cluster ensemble is an effective way to overcome the deficiencies of a single clustering algorithm, although its running time is longer than that of the single clustering algorithm. Results on data sets of several different sizes show that cluster ensemble improves both the accuracy and the stability of the algorithm.
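
The steps of the partial priority clustering algorithm described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not say how the typical point is chosen, how a priority class is grown around it, or what the stopping condition is. The sketch assumes a numeric priority score per item (highest score becomes the seed), a fixed Euclidean radius around the seed, and a stopping threshold min_remaining. The names partial_priority_clustering, priorities, radius and min_remaining are all hypothetical.

```python
import math


def partial_priority_clustering(points, priorities, radius, min_remaining):
    """Illustrative sketch only; seed rule, radius and stop rule are assumptions."""
    remaining = set(range(len(points)))
    seeds = []   # index of the typical point (seed) of each priority class
    labels = {}  # point index -> cluster id

    while len(remaining) > min_remaining:
        # Step 1: pick the typical point. Assumed rule: highest priority score,
        # ties broken by the lowest index.
        seed = max(remaining, key=lambda i: (priorities[i], -i))
        # Step 2: build the priority class around the seed. Assumed rule:
        # every remaining item within `radius` of the seed joins the class.
        members = [i for i in remaining
                   if math.dist(points[i], points[seed]) <= radius]
        cluster_id = len(seeds)
        seeds.append(seed)
        for i in members:
            labels[i] = cluster_id
        # Step 3: delete the clustered items to shrink the data set.
        remaining.difference_update(members)

    if not seeds:
        # Data set was already small enough: treat it as a single cluster.
        return [0] * len(points), []

    # Final step: assign each leftover item to the cluster with the nearest seed.
    for i in remaining:
        labels[i] = min(range(len(seeds)),
                        key=lambda c: math.dist(points[i], points[seeds[c]]))
    return [labels[i] for i in range(len(points))], seeds


# Toy data: two tight groups plus one leftover item at (6, 6).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (6, 6)]
priorities = [5, 1, 1, 4, 1, 0]
labels, seeds = partial_priority_clustering(points, priorities, 2.0, 1)
print(labels, seeds)  # [0, 0, 0, 1, 1, 1] [0, 3]
```

In the toy run, the two highest-priority items seed the two classes, and the leftover item at (6, 6) is attached to the class whose seed is closer. Because each pass removes a whole class from the working set, later passes scan fewer items, which is the simplification the abstract attributes to deleting clustered items.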
Keywords/Search Tags: production scheduling, large dataset, partial priority clustering, typical sample, cluster analysis, cluster ensemble