Opinion Clustering Based on the LDA Model

Posted on: 2013-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: M X Zhang
GTID: 2248330374956704
Subject: Computational Mathematics
Abstract/Summary:
In information retrieval, text clustering groups texts that have similar semantics. Accurate clustering results let users quickly grasp the content of the texts and make sound judgments. Text clustering plays an indispensable role in marketing, urban planning, and earthquake research. With the growing popularity of the Internet and online shopping, more and more people meet their personal needs and express their views online, so opinion clustering is becoming increasingly important.

This paper discusses topic clustering and opinion clustering in turn. A feature selection algorithm is proposed for topic clustering. For opinion clustering, the algorithm discovers the implicit relationship between texts and implicit classes and uses the resulting relationship matrix to represent the texts. The field dependence of the method is also tested. The main content is as follows:

(1) Topic clustering based on LDA model feature selection. The algorithm selects features for topic clustering based on the implicit relationship between features and topics obtained from LDA modeling. The K-means algorithm is then used for clustering. The experiments indicate that when 2% of all features are selected, purity and F-measure increase by 15% and 16% compared with the TC feature selection algorithm, and by 14% and 13% compared with the clustering results of LDA, respectively.

(2) Opinion clustering based on LDA model text representation. This paper models the texts with LDA and obtains the probability distribution of texts over classes. The probability distribution matrix is then used as a vector space to represent the texts. The K-means algorithm is again used for clustering, and the method is compared with the Boolean model and the TF-IDF method. The experiments on the COAE2008 corpus show that, for the best result, purity and F-measure increase by 6% and 7% compared with the Boolean representation, and by 6% and 9% compared with the TF-IDF representation.

(3) Field dependency examination of the opinion clustering method. Opinion clustering is a task of opinion mining. Because opinion mining usually depends strongly on field knowledge, this paper tests the field dependency of the proposed methods. The experiments show that results on data that mix many fields are better than results on data from a single field.
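
The representation described in item (2) can be illustrated with a minimal sketch. It assumes scikit-learn as the toolkit, a tiny invented English corpus with hypothetical gold labels in place of COAE2008, and two topics and two clusters chosen only for illustration; none of these choices come from the thesis. Each text is replaced by its row of the LDA document-topic matrix, K-means clusters those rows, and purity is computed against the gold labels.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy opinion texts with hypothetical gold labels (illustration only,
# not the COAE2008 data).
docs = [
    "the screen is bright and the battery lasts long",
    "battery life is great and the screen is sharp",
    "the battery drains fast and the screen is dim",
    "room was clean and the staff were friendly",
    "friendly staff and a clean quiet room",
    "the room was dirty and the staff were rude",
]
gold = [0, 0, 0, 1, 1, 1]


def lda_representation(texts, n_topics, seed=0):
    """Return the document-topic matrix: one row per text, one column per topic."""
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)


def purity(pred, truth):
    """Fraction of texts that belong to the majority gold class of their cluster."""
    clusters = {}
    for p, t in zip(pred, truth):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(truth)


# Represent each text by its topic distribution, then cluster those vectors.
theta = lda_representation(docs, n_topics=2)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print("purity:", purity(labels, gold))
```

Swapping `lda_representation` for a Boolean or TF-IDF vectorizer while keeping the K-means and purity steps fixed would give the kind of comparison the abstract reports, although the numbers on a toy corpus carry no meaning.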
Keywords/Search Tags: LDA model, Feature selection, Text representation, Opinion clustering
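
The feature selection step in item (1) is described only at a high level, so the following is a hedged sketch rather than the thesis's algorithm. It assumes scikit-learn, an invented toy corpus, and a hypothetical scoring rule: each word is scored by its highest probability under any LDA topic, and only the top-scoring fraction of the vocabulary is kept before K-means is run.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy corpus (illustration only).
docs = [
    "the match ended with a late goal from the striker",
    "the striker scored a goal in the final match",
    "the central bank raised the interest rate again",
    "interest rate changes worry the bank and investors",
]


def select_features_by_lda(texts, n_topics, keep_fraction, seed=0):
    """Keep the words with the highest score max_k p(word | topic k).

    The scoring rule is an assumption made for this sketch; the abstract
    does not state the criterion the thesis uses.
    """
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    # Normalise each topic's row of components_ into p(word | topic).
    topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    scores = topic_word.max(axis=0)
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    # Map column indices back to words via the fitted vocabulary.
    index_to_word = {i: w for w, i in vectorizer.vocabulary_.items()}
    return counts[:, keep], [index_to_word[i] for i in keep]


# The abstract reports results for 2% of the features; a toy vocabulary
# needs a much larger fraction to keep anything useful.
reduced, kept_words = select_features_by_lda(docs, n_topics=2, keep_fraction=0.25)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(kept_words)
```

The function returns the reduced count matrix together with the retained words, so the selected vocabulary can be inspected before clustering.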