Research On An Improvement Method Of SOM Clustering Algorithm And Its Application In Web Text Mining

Posted on: 2012-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: L H Cai
Full Text: PDF
GTID: 2178330338996444
Subject: Management Science and Engineering
Abstract/Summary:
Intelligence is the soul of national defense and is closely tied to a country's security and progress. The rapid development of the Internet provides the most timely and most important source for collecting national defense intelligence. However, intelligence information on the Internet exists mostly as semi-structured or unstructured free text, in volumes too large to survey manually, so people easily become lost in an "ocean of data" and a "fog of information". It is therefore necessary to build a text mining system that classifies and clusters information automatically and then extracts the valid information quickly. Text clustering is the most basic and most important function of text mining, so the key questions in realizing such a system are how to cluster text and how to improve clustering efficiency.

This thesis first analyzes the research background and current state of text clustering, and then introduces the basic theory related to it, including the theory of text mining and the key technologies of text clustering. It highlights the working principle and basic process of the SOM (self-organizing map) neural network and analyzes the advantages and disadvantages of SOM.

To address the disadvantages of the SOM clustering algorithm, this thesis proposes improvements in two respects. On the one hand, the traditional input vector based on the vector space model is sparse and high-dimensional and lacks semantic support; this thesis proposes representing text as theme concept vectors based on a domain ontology. On the other hand, traditional SOM clustering uses the full distortion method to search for the nearest node as the winning neuron, and the large number of multiplications this requires makes clustering too slow. This thesis therefore proposes a partial distortion method that rejects impossible candidate winning neurons early, which avoids unnecessary calculation, reduces the number of multiplications, and increases clustering speed. Finally, the improved SOM algorithm is applied to text mining in national defense to verify its validity and its superiority over the original SOM algorithm.
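
The early-rejection idea behind the partial distortion method can be illustrated with a minimal sketch. This is not the thesis's implementation; it assumes squared Euclidean distance as the distortion measure, and the function names (`full_search`, `partial_distortion_search`) and the multiplication counter are hypothetical, introduced only for illustration. The squared distance to each candidate neuron is accumulated one dimension at a time, and the candidate is dropped as soon as the running sum reaches the best distance found so far.

```python
from typing import List, Sequence, Tuple


def full_search(x: Sequence[float],
                weights: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Full distortion search: compute every squared distance completely."""
    best_idx, best_dist = -1, float("inf")
    for idx, w in enumerate(weights):
        dist = sum((a - b) ** 2 for a, b in zip(x, w))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


def partial_distortion_search(
        x: Sequence[float],
        weights: Sequence[Sequence[float]]) -> Tuple[int, float, int]:
    """Partial distortion search (illustrative sketch).

    Returns the winning neuron index, its squared distance to x, and the
    number of multiplications performed, so the saving can be inspected.
    """
    best_idx, best_dist = -1, float("inf")
    multiplications = 0
    for idx, w in enumerate(weights):
        partial = 0.0
        rejected = False
        for a, b in zip(x, w):
            diff = a - b
            partial += diff * diff
            multiplications += 1
            # The partial sum can only grow, so once it reaches the best
            # distance so far this neuron cannot become the winner.
            if partial >= best_dist:
                rejected = True
                break
        if not rejected:
            best_idx, best_dist = idx, partial
    return best_idx, best_dist, multiplications


# Hypothetical toy data: four neurons with four-dimensional weight vectors.
x: List[float] = [1.0, 0.0, 0.0, 2.0]
weights: List[List[float]] = [
    [1.0, 0.0, 0.0, 1.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 2.0],
    [1.0, 0.0, 0.0, 2.0],
]
winner, distance, mults = partial_distortion_search(x, weights)
```

Both functions select the same winning neuron; the partial version simply stops evaluating a candidate once it can no longer win. How much is saved depends on the data and on the order in which neurons and dimensions are visited, so this sketch makes no claim about the speed-up reported in the thesis.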
Keywords/Search Tags: Web text mining, text clustering, SOM clustering algorithm, ontology