Research And Application On Technologies Of Text Clustering Oriented To Enterprise Competitive Intelligence

Posted on: 2013-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Tang
Full Text: PDF
GTID: 2248330395455663
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of data on the network is expanding drastically. This massive body of data holds great value, and most of it is in text form. To analyze large-scale text and extract useful information from it, text clustering, an important method of text mining, has been studied in depth and has developed rapidly. K-means, a classic clustering algorithm, has linear time complexity and is easy to implement, so it is widely applied in large-scale text processing. However, the clustering result of K-means is easily influenced by its initial centroids; a poor choice can cause the algorithm to fall into a local optimum and reduce the accuracy of the clustering results.

In this paper, we mainly study the selection of initial cluster centroids to address this drawback of the K-means algorithm. A "neighbor" concept is proposed, and Web documents are taken as the clustering objects. We then describe the whole process of text clustering in detail, including text pre-processing, clustering analysis, and quality evaluation. Based on the idea of "neighbors", we design a method to improve the selection of initial centroids. The main idea of the improved algorithm is that the documents selected as initial cluster centers should have low similarity to one another, and each should have enough neighbors. This helps avoid falling into a local optimum and improves the stability and accuracy of the clustering results.

We conduct experiments on a number of document sets, and the results show the effectiveness of the improved K-means algorithm. In addition, building on the theoretical research, we apply the improved K-means algorithm to a document clustering system, which is a core module of a competitive intelligence system, and the document clustering system achieves good performance in competitive intelligence analysis tasks.
Keywords/Search Tags:Competitive Intelligence, Text Clustering, K-means algorithm, Data Mining
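
The seeding idea in the abstract (initial centers that are mutually dissimilar yet each have enough neighbors) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual definitions, so everything below is an assumption made for illustration: a "neighbor" is taken to be any other document whose cosine similarity is at least `neighbor_sim`, candidates are ranked by neighbor count, and the function name and the thresholds `neighbor_sim`, `min_neighbors`, and `max_seed_sim` are hypothetical. Whitespace tokenization stands in for real text pre-processing.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (stand-in for real pre-processing)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_initial_centroids(vectors, k, neighbor_sim=0.5,
                             min_neighbors=1, max_seed_sim=0.3):
    """Pick up to k seed documents (assumed reading of the 'neighbor' idea).

    A document qualifies as a seed if it has at least `min_neighbors`
    neighbors and its similarity to every seed already chosen is at most
    `max_seed_sim`. May return fewer than k indices if no more candidates
    qualify; all thresholds are hypothetical.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Neighbor count: other documents at least `neighbor_sim` similar.
    neighbor_count = [
        sum(1 for j in range(n) if j != i and sim[i][j] >= neighbor_sim)
        for i in range(n)
    ]
    # Consider well-connected documents first; ties broken by index.
    candidates = sorted(
        (i for i in range(n) if neighbor_count[i] >= min_neighbors),
        key=lambda i: (-neighbor_count[i], i),
    )
    seeds = []
    for i in candidates:
        if all(sim[i][s] <= max_seed_sim for s in seeds):
            seeds.append(i)
            if len(seeds) == k:
                break
    return seeds


docs = [
    "stock market price shares",
    "stock market shares trading",
    "market price stock trading",
    "football match goal team",
    "football team goal league",
    "team match football league",
]
seeds = select_initial_centroids([tf_vector(d) for d in docs], k=2)
print(seeds)
```

On this toy corpus the two seeds come from different topics (one finance document and one football document), so a subsequent K-means run would start from centers that are far apart but each sit in a dense region. The returned indices would then be used as the initial centroids of a standard K-means loop.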