Exploiting Multiple Semantic Features For Comment Text Clustering

Posted on: 2014-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
GTID: 2268330401462536
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of network technology, we have entered an era in which everyone takes part in creating information on the Internet, and comment text has become a main carrier of that information. People want to extract useful information from massive volumes of comment text quickly and efficiently. Text clustering requires no prior knowledge and has many mature algorithms, so it is often the first choice.

Features are the key to all opinion mining and sentiment analysis tasks. From the perspective of feature composition, this thesis uses the k-means method to study topic clustering and tendency clustering of text.

(1) Topic clustering

Features play an important role in topic clustering. Starting from the semantic granularity of features, this thesis examines the effect of three kinds of semantic features (noun features, noun phrase features, and semantic role features), together with redundancy processing and weight adjustment of the features. Experiments show that the redundancy-processing strategy improves topic clustering purity by 0.01-0.25, that the weight adjustment method improves it by 0.011-0.015, and that using both together improves it by 0.015-0.015.

To further explore the semantic relations between noun features and semantic role features, this thesis proposes a feature selection method that decomposes semantic role features to locate effective word features directly. On complex data sets, clustering with this method reaches a purity of 0.8099. The method is effective and easy to understand, and it offers a new way of thinking about feature selection for topic clustering.

(2) Tendency clustering

For tendency analysis tasks, identifying the tendency of features is crucial. Therefore, to use as many tendency words as possible as clustering features, this thesis proposes a method that automatically identifies and marks tendency features. The method uses a tendency word table and synonyms to mark the tendency of as many features as possible, and applies different weight adjustment strategies to features with and without tendency, which improves the features' ability to characterize tendency. Experimental results show that, for tendency clustering, the method improves clustering purity by 0.0887 over the traditional clustering method. However, compared with the two traditional methods of automatic tendency analysis, its accuracy still falls far short of what is required. Using clustering for text tendency analysis therefore calls for new breakthroughs in either the features or the clustering method.
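
The results in this abstract are all reported as clustering purity, but the abstract does not state the formula. The following is a minimal sketch that assumes the standard definition: each cluster is credited with its most frequent gold label, and purity is the share of all documents covered that way. The function name `purity` and the toy comments are illustrative assumptions, not taken from the thesis.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Share of documents that carry the majority gold label of their cluster.

    Assumes the standard purity definition; the thesis may define it differently.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Group the gold labels by the cluster each document was assigned to.
    labels_by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        labels_by_cluster.setdefault(cluster, Counter())[label] += 1

    # For each cluster, count the documents with its most common gold label.
    majority_total = sum(max(counts.values()) for counts in labels_by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy data invented for illustration: 8 comments, 2 clusters, 2 topics.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
topics = ["price", "price", "price", "service",
          "service", "service", "service", "price"]
score = purity(clusters, topics)
print(score)
```

In the toy data, each cluster holds three comments of its majority topic out of four, so the purity is 6/8 = 0.75. Under this definition, a purity gain such as 0.0887 corresponds to about nine more documents per hundred falling into the majority class of their cluster.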
Keywords/Search Tags: Features, Text clustering, Topic clustering, Tendency clustering