Research Of Data Competition Algorithm Based On Aggregation Field Model

Posted on: 2014-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhang
GTID: 1268330425967049
Subject: Signal and Information Processing
Abstract/Summary:
Cluster analysis, which stems from taxonomy, is a statistical method for exploring the internal structure of unknown data. Many researchers have consistently confirmed its importance and its overlap with other research directions. Clustering organizes a dataset into meaningful or useful groups (clusters) and is an important research topic in areas such as data mining and pattern recognition. It has recently been applied successfully to image segmentation, text clustering, computer vision, speech recognition, character recognition, data compression, and information retrieval. Clustering can also be applied to multi-relational data mining, spatio-temporal database applications, sequence and heterogeneous data analysis, bioinformatics, and marketing. The stability of existing partitional clustering algorithms is seriously limited by their sensitivity to noise and outliers. Moreover, the increasingly complicated internal structure of datasets demands higher clustering quality from clustering algorithms. This thesis studies key open problems in partitional clustering. The main work is as follows:

(1) Cluster analysis is a process of exploring the internal structure of a dataset within a reasonable model framework. However, existing models do not describe partitional clustering problems well. First, a novel Aggregation Field Model is proposed and the features of different data objects are defined. Based on these, several strategies for denoising and handling outliers are designed. Next, an improved K-Means algorithm based on aggregation energy, AEKMA, is designed; it provides better initial centers for the K-Means algorithm. The experimental results show that AEKMA initializes the K-Means algorithm well and performs better than the K-Means algorithm.

(2) Building on further study of the principle of the aggregation field model, a novel partitional clustering algorithm based on data competition, DCA, is designed. DCA regards all data objects as potential representative points and finds suitable representative points, which it then uses to complete the clustering. The experimental results show that DCA performs well, limits the interference caused by outliers, is stable, clearly outperforms some other partitional clustering algorithms, and is an effective way to solve clustering problems.

(3) Further study of the features of DCA shows that it cannot obtain ideal results when applied directly to document clustering. The reason lies in the complicated, high-dimensional, sparse structure of text datasets and in the curse of dimensionality. Optimizing and improving the internal structure of the document dataset is therefore a novel approach to text clustering. A spectral clustering ensemble algorithm can provide simple input for DCA, because in essence it maps high-dimensional data to a low-dimensional space, yielding a simple low-dimensional embedding of the original data. Accordingly, a data competition based text clustering ensemble spectral algorithm, DCCESA, is designed. The experimental results show that DCCESA obtains better clustering results than commonly used clustering ensemble algorithms and that, with its high clustering quality and efficiency, it is an effective method for document clustering ensembles.

(4) This thesis further studies the possibility of applying DCA to image segmentation. Because the time complexity of DCA is O(n^2), it is unsuitable for processing large images. The experiments verify that DCA achieves better segmentation results on small images, although it cannot partition large images under given hardware and software conditions. To apply DCA to large images, an image segmentation algorithm that combines the Mean Shift algorithm with DCCESA, MS-DCCESA, is designed. MS-DCCESA pre-segments the large image with the Mean Shift algorithm and then applies the spectral clustering ensemble idea to the pre-segmented regions, producing good input for DCA. The experimental results show that MS-DCCESA is effective and obtains better segmentation quality than some other commonly used algorithms.
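
To make the quadratic scaling in item (4) concrete, the following minimal Python sketch counts the pairwise comparisons, and the memory a dense pairwise matrix would need, if every pixel of an image were treated as one data object. The abstract does not say how DCA stores pairwise information, so the one-object-per-pixel reading, the dense 8-byte-per-entry matrix, the function names, and the example image sizes are all assumptions made for illustration; this is not the thesis's implementation.

```python
def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n data objects: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Bytes for a full n x n matrix (assumption: 8-byte float entries)."""
    return n * n * bytes_per_value


def describe(width: int, height: int) -> str:
    """Summarize the quadratic cost for an image with one object per pixel."""
    n = width * height  # assumption: each pixel is one data object
    gib = dense_matrix_bytes(n) / 2**30
    return (f"{width}x{height}: n={n}, pairs={pairwise_count(n)}, "
            f"dense matrix ~{gib:.1f} GiB")


if __name__ == "__main__":
    # Hypothetical image sizes chosen only to show the growth rate.
    for w, h in [(64, 64), (256, 256), (1024, 1024)]:
        print(describe(w, h))
```

Under these assumptions a 256x256 image already requires 2^35 bytes (32 GiB) for a dense matrix, which is consistent with the motivation for first reducing a large image to a much smaller number of pre-segmented regions.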
Keywords/Search Tags: aggregation field model, K-Means algorithm, data competition, document clustering, image segmentation