Study On Two-stage Chinese Text Clustering Based On Self-organizing Of Map

Posted on: 2006-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: H C Zhu
GTID: 2178360155975233
Subject: Computer application technology
Abstract/Summary:
In recent years, tremendous volumes of text documents have become available on the Internet, in digital libraries, from news sources, and on company-wide intranets. Fast, high-quality document clustering algorithms play an important role in organizing large amounts of information into a small number of meaningful clusters, and in improving retrieval performance through either cluster-driven dimensionality reduction or term weighting. Text clustering is now one of the most important topics in data mining. The self-organizing map (SOM) is especially suitable for clustering high-dimensional documents. Text data is used as the input to the SOM, and training maps it onto an ordered two-dimensional grid that shows the relations between the texts. A wide range of SOM-based methods has been applied to English text clustering, but relatively little work has been done for Chinese text. Moreover, these methods yield only visualization information; obtaining actual clustering results still requires manual processing. This thesis studies a two-stage, SOM-based clustering method for Chinese documents. First, we present the background and current development of text clustering and discuss the content and goals of our research. Second, we design a Chinese text clustering model, CTCM, and investigate its main aspects, including feature representation, feature extraction, and the clustering algorithm. Third, we focus on the text clustering algorithm itself. Based on the structure and training of the SOM, we give four SOM-based text clustering algorithms: TCBSA (Text Clustering Based on SOM and Agglomeration), TCBSD (Text Clustering Based on SOM and Density), TCBDSA (Text Clustering Based on Dynamic SOM and Agglomeration), and ITCBDSA (Incremental Text Clustering Based on Dynamic SOM and Agglomeration), and we discuss the results of the clustering experiments.
The main results of this thesis are as follows: a Chinese text clustering model; two two-stage clustering algorithms for Chinese text based on a static SOM; and two two-stage clustering algorithms for Chinese text based on a dynamic SOM.
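
As a rough illustration of the two-stage idea described above (train a SOM, then group its map units automatically instead of by hand), the following is a minimal sketch in plain Python. It is not the thesis's TCBSA algorithm, whose details are not given in this abstract. It assumes documents are already represented as fixed-length term-weight vectors, uses a small static rectangular SOM, and merges occupied map units by closest centroid until k clusters remain. All function names and parameter values (train_som, two_stage_cluster, the learning-rate and radius schedules) are hypothetical choices made for illustration.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(grid, v):
    """Grid coordinate of the unit whose weight vector is closest to v."""
    return min(grid, key=lambda u: sqdist(grid[u], v))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Stage 1: fit a small static SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)  # assumed linear learning-rate decay
            radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - frac)  # shrinking neighborhood
            br, bc = best_unit(grid, v)
            for (r, c), w in grid.items():
                # Gaussian neighborhood around the best-matching unit
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
            step += 1
    return grid


def two_stage_cluster(vectors, k, rows=2, cols=2, seed=0):
    """Stage 2: merge occupied SOM units until k clusters remain; label each document."""
    grid = train_som(vectors, rows, cols, seed=seed)
    bmus = [best_unit(grid, v) for v in vectors]
    clusters = [[u] for u in sorted(set(bmus))]  # one cluster per occupied unit
    dim = len(vectors[0])

    def centroid(units):
        return [sum(grid[u][i] for u in units) / len(units) for i in range(dim)]

    while len(clusters) > k:
        # find the pair of clusters with the closest centroids and merge them
        _, i, j = min((sqdist(centroid(a), centroid(b)), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    label_of = {u: n for n, units in enumerate(clusters) for u in units}
    return [label_of[b] for b in bmus]


# Toy term-weight vectors standing in for two topics (illustrative data only).
toy_docs = [[1.0, 0.9, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0], [1.0, 1.0, 0.0, 0.1],
            [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0], [0.0, 0.0, 1.0, 1.0]]
print(two_stage_cluster(toy_docs, k=2))
```

Clustering the map units rather than the raw documents means the second stage works on at most rows × cols prototypes, which is the usual motivation for two-stage schemes of this kind. A density-based or dynamic-SOM variant would replace the merging step or the fixed grid, respectively.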
Keywords/Search Tags: Chinese text, Text clustering, Self-organizing map (SOM), Vector Space Model (VSM)