
Attribute Words Clustering and Expansion in Product Evaluation

Posted on: 2016-03-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yang
GTID: 2298330467992106 | Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of e-commerce, a large volume of product reviews of all kinds has appeared on the Internet, and product review analysis technology has emerged to enable automated, intelligent analysis of these massive review collections. Product attributes are closely related to review text classification and, through their collocation with sentiment words, they also affect the sentiment orientation of the whole sentence; the analysis of product attributes is therefore a crucial problem. Product attributes consist of explicit attributes and implicit attributes. Explicit attributes usually appear in sentences in the form of attribute words, and they are the object of study in this thesis. Taking into account the synonymy relations among attribute words, this thesis conducts an in-depth study of the clustering and expansion of product attribute words using unsupervised and semi-supervised machine learning methods. The main results and contributions are as follows:

First, an Affinity Propagation Clustering algorithm based on Word Representation (APCWR) is proposed. The key problems in word clustering are how to compute the distance or similarity between words and which clustering algorithm to choose. APCWR represents words as word vectors and trains these representations with the word2vec tool. The affinity propagation algorithm, previously used for text clustering, is applied to word clustering: preset parameters are designed to adjust the number of clusters, and a damping coefficient is designed to smooth the updates. Experimental results show that, compared with dictionary-based clustering and K-means clustering, APCWR achieves better clustering quality and algorithm performance.

Second, an Attribute Words Expansion algorithm based on Bootstrapping (AWE-Bootstrapping) is proposed.
The thesis first analyzes commonly used synonym expansion techniques and their applications in semantic dictionaries, information retrieval, and information extraction. Then, based on the idea of semi-supervised learning, an improved Bootstrapping algorithm is designed. Using only a small number of seed words, AWE-Bootstrapping achieves better expansion results than rule-based extraction methods.

Third, the database of Sentiment Summarization on the Product Aspects (SSPA) is designed and implemented, and the ideas of data processing and query expansion are applied to real projects. To be usable for attribute word clustering and expansion, product reviews collected from the Internet go through data acquisition, data preprocessing, and attribute word extraction and tagging; the tags are used to evaluate the clustering results. The expanded attributes are then classified and filtered to generate an attribute thesaurus. The author applied the data processing approach to the task and data design of the Sentiment Sentence Analysis Evaluation in COAE2014, and applied similarity calculation and query expansion to the CCR task of the KBA evaluation; both achieved good results.

Fourth, applications of synonymous attribute word clustering and expansion are analyzed. Text classification and the sentiment word-attribute word matching score are discussed as applications in product review analysis. Then, by introducing the structure of common semantic dictionaries, the thesis emphasizes the important role that synonymy relations play in them, illustrating the significance of synonymous attribute word clustering and expansion for building a "statistics dictionary".
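The abstract does not spell out APCWR in detail, so the following is only a minimal sketch of the general technique it builds on: standard affinity propagation (Frey and Dueck, 2007) run over cosine similarities between word vectors. Several assumptions apply. The 2-D vectors are made up and stand in for word2vec embeddings. The `preference` argument is assumed to play the role of the preset parameter that controls the number of clusters, and `damping` the role of the damping coefficient. The function names are hypothetical.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity of two dense, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def affinity_propagation(vectors, preference=None, damping=0.5, max_iter=200):
    """Return, for each vector, the index of its exemplar (hypothetical helper).

    preference: self-similarity s(k, k); raising it tends to give more clusters,
                lowering it fewer. Defaults to the median pairwise similarity.
    damping:    share of the previous message kept at each update (0.5 <= damping < 1);
                it smooths the updates and helps avoid oscillation.
    """
    n = len(vectors)
    s = [[cosine(vectors[i], vectors[k]) for k in range(n)] for i in range(n)]
    if preference is None:
        preference = statistics.median(
            s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference

    r = [[0.0] * n for _ in range(n)]  # responsibility r(i, k): sent from point i to candidate k
    a = [[0.0] * n for _ in range(n)]  # availability a(i, k): sent from candidate k to point i
    for _ in range(max_iter):
        # Responsibility: how much better k suits i than i's best alternative.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: accumulated positive evidence that k should be an exemplar.
        for k in range(n):
            support = sum(max(0.0, r[j][k]) for j in range(n) if j != k)
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # no exemplar emerged: fall back to singleton clusters
        return list(range(n))
    # Each exemplar labels itself; every other point joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda e: s[i][e])
            for i in range(n)]


words = ["price", "cost", "expense", "battery", "charge", "endurance"]
vectors = [[1.0, 0.0], [0.99, 0.09], [0.99, -0.09],    # made-up "price" group
           [0.0, 1.0], [0.09, 0.99], [-0.09, 0.99]]    # made-up "battery" group
labels = affinity_propagation(vectors)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(words[label], []).append(word)
print(clusters)
```

Unlike K-means, affinity propagation does not take the number of clusters as input; it emerges from the preference values, which is why a preset parameter of this kind can be used to steer the cluster count.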
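The abstract does not describe what makes AWE-Bootstrapping "improved", so this is only a minimal sketch of plain pattern-based bootstrapping for attribute word expansion. The following design choices are assumptions. Patterns are (left word, right word) contexts. A pattern is ranked by how many known attribute words it covers. A new word is accepted only if it appears in at least `min_support` of the selected patterns. The review sentences, function names, and thresholds are all hypothetical.

```python
def context(tokens, i):
    """(left neighbour, right neighbour) of position i, padded at sentence edges."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)


def bootstrap_expand(sentences, seeds, max_iter=5, top_patterns=2, top_words=2, min_support=2):
    """Grow a set of synonymous attribute words from a few seeds (hypothetical helper).

    Each round (1) learns context patterns around known words and keeps those
    shared by the most known words, then (2) adds unseen words that occur in
    at least `min_support` of the kept patterns.
    """
    corpus = [s.split() for s in sentences]
    lexicon = set(seeds)  # copy so the caller's seed set is not modified
    for _ in range(max_iter):
        # Step 1: pattern -> set of lexicon words observed in it.
        pattern_words = {}
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if tok in lexicon:
                    pattern_words.setdefault(context(tokens, i), set()).add(tok)
        ranked = sorted(pattern_words, key=lambda p: (-len(pattern_words[p]), p))
        patterns = set(ranked[:top_patterns])

        # Step 2: candidate word -> set of kept patterns it appears in.
        support = {}
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                p = context(tokens, i)
                if tok not in lexicon and p in patterns:
                    support.setdefault(tok, set()).add(p)
        candidates = sorted((w for w in support if len(support[w]) >= min_support),
                            key=lambda w: (-len(support[w]), w))
        if not candidates:
            break  # no candidate passes the filter: stop expanding
        lexicon.update(candidates[:top_words])
    return lexicon


reviews = [
    "the price is too high",
    "the cost is too high",
    "the expense is reasonable",
    "low price and good quality",
    "low cost and good quality",
    "low expense and fast delivery",
    "the screen is bright",
    "the screen is too small",
]
print(sorted(bootstrap_expand(reviews, {"price"})))  # ['cost', 'expense', 'price']
```

Starting from the single seed "price", the `min_support` filter is what keeps "screen" out: it only fills the generic "the _ is" slot, while "cost" and "expense" also fill "low _ and". This is a common guard against semantic drift in bootstrapping; a real system would typically use richer patterns and confidence scores.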
Keywords/Search Tags: Product Attributes, Words Clustering, Synonym Expansion, APCWR, AWE-Bootstrapping, Data Building