Research On Chinese Text Clustering Of Neural Network Of Support Vector Machine

Posted on: 2010-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: J L Ma
Full Text: PDF
GTID: 2178360278469583
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, electronic texts have become an increasingly common source of information, and people urgently need tools for finding resources and knowledge. Text processing has considerable potential in fields such as information retrieval, and in text clustering in particular. Research on Chinese text clustering, however, is still at an early stage, and many problems remain unresolved.

This thesis first introduces the research background and current state of text clustering, and then reviews the concepts of data mining, the main clustering algorithms, and the theory of support vector machines.

Second, it analyzes several key topics and techniques in Chinese text clustering that arise from the characteristics of the Chinese language, including word segmentation, feature representation of Chinese text (the Vector Space Model, VSM), and dimensionality reduction. It also proposes a generalized concept of dimensionality reduction.

Third, it proposes SVM-SOM, an algorithm for Chinese text clustering that combines the self-organizing feature map (SOM) with the support vector machine (SVM). The principle, convergence, and steps of SVM-SOM are described in detail.

Finally, the SOM and SVM-SOM algorithms are implemented and tested on real-world data sets. The two algorithms are compared using precision, recall, and F-value, measures widely used in information retrieval, and their robustness is tested by adding noisy data. The results show that the proposed method improves clustering quality and is more robust.
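The abstract does not say how precision, recall, and F-value are computed for a clustering. The sketch below assumes one common convention from the text clustering literature, which may differ from the one used in the thesis: each reference class is matched to the cluster that gives it the highest F-value, and the per-class scores are averaged with weights proportional to class size. The function name `clustering_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted F-measure of a clustering against reference classes.

    Assumed convention (not taken from the thesis): for a class i and a
    cluster j, precision = n_ij / |cluster j| and recall = n_ij / |class i|,
    where n_ij is the number of documents of class i placed in cluster j.
    Each class is scored by its best-matching cluster.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one document is required")

    n_docs = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij counts

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best_f = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue  # no shared documents, so F is zero for this pair
            precision = n_both / n_clu
            recall = n_both / n_cls
            f_value = 2 * precision * recall / (precision + recall)
            best_f = max(best_f, f_value)
        total += (n_cls / n_docs) * best_f
    return total


if __name__ == "__main__":
    # Toy data: four documents, two reference classes, two clusters.
    reference = ["sports", "sports", "finance", "finance"]
    clusters = [0, 0, 0, 1]
    print(clustering_f_measure(reference, clusters))
```

Under this convention a clustering that reproduces the reference classes exactly scores 1.0, and the score drops as clusters mix documents from different classes, which is the kind of degradation a robustness test with added noise would look for.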
Keywords/Search Tags: clustering, Chinese text clustering, self-organizing feature map, support vector machine