Research On Chinese Text Clustering Algorithm

Posted on: 2017-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: D D Wang
GTID: 2308330488963031
Subject: Applied statistics
Abstract/Summary:
Clustering is an unsupervised learning technique: it needs no labeled training data, which makes it flexible and well suited to automated processing, and it is therefore widely applied. Combined with text mining, it can be used to cluster documents, for example to help search-engine users find the information they want quickly and conveniently. It can also be used for spam filtering, document classification, and similar tasks. This thesis focuses on clustering algorithms for Chinese text. It first describes the background and significance of the research. It then introduces the concept of text mining and examines the related techniques: Chinese word segmentation, performed with the R packages Rwordseg and jiebaR, and feature extraction and dimension reduction, such as TF-IDF. The third chapter summarizes the text clustering process and the common clustering algorithms, concentrating on the vector space model (VSM) for text representation and on several clustering algorithms. Finally, the K-means and hclust (hierarchical clustering) algorithms are applied to user comments from the tourism industry, and their clustering performance is analyzed. Review data from the e-commerce industry is clustered in the same way so that the results can be compared with those for the tourism comments.
Keywords/Search Tags: text mining, text clustering, clustering algorithm
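
The pipeline the abstract describes (segmented text, TF-IDF weighting in a vector space model, then K-means) can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis worked in R with Rwordseg, jiebaR and hclust, whereas this sketch uses plain Python. The function names `tfidf_vectors` and `kmeans`, the choice of the first k documents as initial centroids, and the toy English tokens (stand-ins for already segmented Chinese words) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into L2-normalized TF-IDF vectors (VSM representation).

    docs: list of non-empty token lists, assumed to be already segmented.
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] / len(d) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iters=20):
    """Plain K-means with squared Euclidean distance.

    Assumption: the first k vectors serve as initial centroids, which keeps
    the sketch deterministic; real use would normally pick random starts.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _step in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "reviews": two about travel, two about e-commerce (hypothetical data).
docs = [
    ["hotel", "room", "clean", "hotel"],
    ["phone", "battery", "screen"],
    ["hotel", "room", "service"],
    ["phone", "screen", "battery", "price"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

On this toy input the two travel reviews and the two e-commerce reviews fall into separate clusters. Normalizing each vector to unit length makes squared Euclidean distance behave like cosine dissimilarity, a common choice for sparse text vectors.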