Research On Text Sentiment Clustering Method Based On Dimension Identification

Posted on: 2016-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2308330482950605
Subject: Computer application technology
Abstract/Summary:
With the widespread use of microblogs, Twitter, and e-commerce platforms, the Internet has brought convenience and progress to people's lives. People like to record their moods, emotions, comments, and opinions online, and these records carry the users' sentiment orientation. Mining and analyzing this information can assist the state in monitoring public opinion, guide enterprises in making decisions, and influence personal consumption behavior. In text sentiment analysis, the shortcoming of supervised machine learning methods is their heavy demand for labeled text data, whereas unsupervised text clustering avoids this problem. Focusing on text sentiment clustering, we carry out research from two aspects: sentiment dimension identification and text semantic subspace. The main research contents and conclusions are as follows:

(1) Analysis of the corpus
In order to explore the potential factors that influence the performance of text orientation clustering, this thesis selects English product reviews and Chinese microblog posts as the experimental corpora. After analyzing the corpora and summarizing their linguistic characteristics, we conclude that the high dimensionality and severe sparsity of the text representation, together with the implicit expression of sentiment, are important factors affecting text sentiment clustering.

(2) A dimension identification method for text sentiment clustering
To address the problem of sentiment clustering, we propose a sentiment dimension identification method that allows texts to be clustered along the sentiment dimension. The method consists of two stages: compression of the feature space and construction of the sentiment dimension (DIMSC). We formalize and characterize the dimensions, then extract opinion words from the corpus using opinion word recognition technology. According to the number of opinion words in each dimension, the sentiment dimension is identified automatically, which ultimately enables sentiment-based text clustering. Experiments on Chinese and English reviews from different domains show that our method identifies the sentiment dimension automatically and effectively, and that it outperforms the other clustering algorithms in both purity and F-measure.

(3) A text similarity calculation method based on semantic subspace
To address the high dimensionality and severe sparsity of the text representation and the implicit expression of sentiment, we propose a text similarity computation method based on semantic subspace (RESS), and we examine the results of a sentiment clustering method based on the fusion of SSE and DIMSC. The experimental results show that RESS effectively reduces the number of features in the data set and obtains better results. SSE+DIMSC can resolve the uncertainty problem of sentiment clustering: its purity and F-measure are significantly higher than those of SSE or DIMSC alone. This method can also be applied to unbalanced data sets.
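The abstract reports results in terms of purity and F-measure but does not give formulas. As a rough illustration only, the sketch below computes purity and one common class-weighted clustering F-measure in Python. The function names `purity` and `clustering_f_measure`, the toy labels, and the choice of F-measure variant are all assumptions made for this example; the thesis may use a different definition.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority class of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


def clustering_f_measure(labels_true, labels_pred):
    """Class-weighted F-measure (assumed variant): for each gold class take
    the best F1 over all clusters, then average weighted by class size."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue
            precision = shared / clu_size
            recall = shared / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


# Hypothetical toy data: gold sentiment labels and cluster assignments.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
pred = [0, 0, 1, 1, 1, 1]
print(round(purity(gold, pred), 4))                # 0.8333
print(round(clustering_f_measure(gold, pred), 4))  # 0.8286
```

In the toy data one positive review falls into the mostly negative cluster, so both scores drop below 1. Both functions expect non-empty label lists of equal length.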
Keywords/Search Tags: Text dimension identification, Text semantic subspace, Text similarity calculation, Opinion word identification, Sentiment-based text clustering