
Research On Text Clustering Of Micro-blog Public Opinion: Word Sense Cluster And Collocation-Based Method

Posted on: 2016-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: H J Wang
GTID: 2308330479498344
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Micro-blogging, a new Internet information exchange platform that has emerged in recent years, has distinctive characteristics such as dispersed topics, short and concise posts, and free style. Because micro-blogs can have a huge impact on society, information supervision departments and commercial enterprises urgently need public opinion analysis based on micro-blog information. Given these characteristics, common text-processing techniques cannot meet the needs of micro-blog analysis and must be improved and adapted. With the rapid development of micro-blogs, public opinion analysis based on micro-blog information has become one of the hot research directions. This paper presents a novel text clustering method based on word sense clustering and collocation, designed around the characteristics of micro-blogs. The main contents are as follows:
(1) The paper first summarizes techniques for public opinion analysis and presents several data collection methods, then describes classic word segmentation systems and the related segmentation algorithms, and finally compares and analyzes the advantages and drawbacks of these algorithms. The paper adopts ICTCLAS as its word segmentation tool and introduces the advantages of this tool.
(2) Text clustering is the key technique of this paper. The vector space model (VSM) cannot express sense relationships between words, such as synonymy and polysemy. This paper therefore uses word sense clusters to represent text, and the word sense clustering model adopts the LDA topic model, which handles the conversion from high-dimensional vectors to low-dimensional vectors. Using word collocations in texts not only makes the research topic explicit but also makes the similarity between texts more convenient to calculate, so this paper presents an automatic collocation extraction method that combines the log-likelihood ratio with entropy to recognize word collocations. Experiments show the validity of the method.
(3) On the basis of common K-Means text clustering and the characteristics of micro-blogs, the paper presents a word sense clustering and collocation-based method for text clustering. It is then compared and analyzed against text clustering based on word sense clustering and the traditional K-Means algorithm, both presented by other researchers. The results show the validity of the method.
Experiments show that the efficiency of the text clustering method using word sense clusters is 6.3% higher than that of the traditional text clustering method, and that the method of this paper is 16.8% higher than the text clustering method using word sense clusters.
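The abstract names the log-likelihood ratio and entropy as the two signals used for collocation extraction but does not say how they are computed or combined. As a rough illustration only, the sketch below scores adjacent word pairs with Dunning's log-likelihood ratio over a 2x2 contingency table and provides a plain Shannon entropy helper. The function names (`llr`, `score_bigrams`, `entropy`), the restriction to adjacent bigrams, and the English sample tokens are all assumptions made for illustration, not the thesis's actual procedure.

```python
import math
from collections import Counter


def _xlogx(x):
    """Return x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: pairs (w1, w2); k12: pairs (w1, not w2);
    k21: pairs (not w1, w2); k22: pairs (not w1, not w2).
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(n) - rows - cols)


def score_bigrams(tokens):
    """Score every adjacent word pair in a token list by its LLR."""
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    first = Counter(a for a, _ in pairs)   # how often a word starts a pair
    second = Counter(b for _, b in pairs)  # how often a word ends a pair
    n = len(pairs)
    scores = {}
    for (w1, w2), k11 in pair_counts.items():
        k12 = first[w1] - k11
        k21 = second[w2] - k11
        k22 = n - k11 - k12 - k21
        scores[(w1, w2)] = llr(k11, k12, k21, k22)
    return scores


def entropy(counts):
    """Shannon entropy (in nats) of a Counter of neighbor words.

    Assumption: entropy is applied to the distribution of words adjacent
    to a candidate; the abstract does not specify this detail.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


# Hypothetical, already-segmented input (a real pipeline would segment first).
sample = "public opinion analysis of public opinion on micro blog public opinion".split()
ranked = sorted(score_bigrams(sample).items(), key=lambda kv: kv[1], reverse=True)
```

Pairs with a high score co-occur more often than independence would predict, so the top of `ranked` holds the collocation candidates. An entropy threshold on the neighbors of each candidate could then filter them, though that combination rule is a guess.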
Keywords/Search Tags: public opinion analysis, text clustering, word sense cluster, word collocation, K-Means