Opinion Clustering Based On The LDA Model

Posted on:2013-12-03

Degree:Master

Type:Thesis

Country:China

Candidate:M X Zhang

Full Text:PDF

GTID:2248330374956704

Subject:Computational Mathematics

Abstract/Summary:

In information retrieval, text clustering groups texts that have similar semantics. Accurate clustering results let users grasp the content of a text collection quickly and make well-founded judgments. Text clustering plays an important role in marketing, urban planning, and earthquake research. With the growing popularity of the Internet and online shopping, more and more people meet their personal needs and express their views online. Opinion clustering is therefore becoming increasingly important.

This thesis discusses topic clustering and opinion clustering separately. For topic clustering, a feature selection algorithm is proposed. For opinion clustering, the algorithm discovers the implicit relationship between texts and latent classes and uses the resulting relationship matrix to represent the texts. The field dependency of the method is also tested. The main content is as follows:

(1) Topic clustering based on LDA model feature selection. The algorithm selects features for topic clustering according to the implicit relationship between features and topics obtained from LDA modeling. The K-means algorithm is then used for clustering. The experiments indicate that when 2% of all features are selected, purity and F-measure increase by 15% and 16% compared with the TC feature selection algorithm, and by 14% and 13% compared with the clustering results of LDA, respectively.

(2) Opinion clustering based on LDA model text representation. The texts are modeled with LDA to obtain the probability distribution of each text over the latent classes, and the probability distribution matrix is used as a vector space to represent the texts. The K-means algorithm is again used for clustering, and the method is compared with the Boolean model and the TF-IDF method. The experiments on the COAE2008 corpus show that, for the best result, purity and F-measure increase by 6% and 7% compared with the Boolean representation, and by 6% and 9% compared with the TF-IDF representation.

(3) Field dependency test of the opinion clustering method. Opinion clustering is a task within opinion mining. Because opinion mining usually depends strongly on field knowledge, the field dependency of the proposed methods is tested. The experiments show that the results on data that mix several fields are better than the results on data that contain only one field.
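
The abstract reports its gains in purity and F-measure without stating the formulas. The following is a minimal sketch of how these two scores are commonly computed for a clustering result. It assumes the standard definitions (purity as the share of documents in their cluster's majority class, and the class-size-weighted best F1 per class); the thesis may use a variant. The function names and the toy labels are illustrative and do not come from the thesis.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Share of documents that belong to the majority class of their cluster."""
    per_cluster = {}
    for cls, clu in zip(labels_true, labels_pred):
        per_cluster.setdefault(clu, Counter())[cls] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(labels_true)


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted average of the best F1 each class reaches in any cluster.

    Assumed definition; other clustering F-measures exist.
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Toy example (hypothetical labels): six opinion texts, two clusters.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
pred = [0, 0, 1, 1, 1, 1]
p = purity(gold, pred)
f = clustering_f_measure(gold, pred)
print(f"purity={p:.4f} f_measure={f:.4f}")
```

In this toy case, cluster 0 holds two "pos" texts and cluster 1 holds one "pos" and three "neg" texts, so five of the six texts sit in their cluster's majority class. The same functions can score any representation compared in the abstract (Boolean, TF-IDF, or LDA topic distributions), since they only need the gold labels and the cluster assignments.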

Keywords/Search Tags:

LDA model, Feature selection, Text representation, Opinion clustering

Related items

1	Application Of CTM Model Optimization Feature Selection In Text Categorization
2	Research On Improvement Of Chi-square Feature Selection And Word Vector Text Representation For News Classification
3	Research And Application Of Feature Selection And Text Representation In Text Clustering
4	Text Representation Model And Feature Selection Algorithm
5	The Research Of Text Representation And Feature Selection In Text Categorization
6	Text Representation And Algorithms For Chinese Text Classification
7	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
8	Feature Selection And Feature Representation Text Classification Based On Convolutional Neural Networks
9	Research On Text Representation Model Based On LDA And Latent Feature Vector
10	A Research On Feature Extraction Applied For Text Classification