The Study Of Feature Extraction And Clustering On Chinese Websites Product Reviews Based On The Improved Pruning Algorithm

Posted on: 2018-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Ling
GTID: 2348330569986545  Subject: Management Science and Engineering
Abstract/Summary:
With the arrival of the big data era, e-commerce sites collect more and more consumer comments on online goods. Deep mining of online product reviews not only helps consumers make reliable decisions, but also provides valuable feedback that online merchants can use to improve product quality. However, because product reviews are so numerous and their content is unstructured, it is difficult to dig out the information that consumers and enterprises care about most. How to support sound decision-making and fine-grained management over massive volumes of user reviews has become a research focus in the field of opinion mining. Focusing on Chinese web product reviews, building on related research at home and abroad, and drawing on widely used technologies such as natural language processing, opinion mining, and data mining, this thesis studies methods for extracting, filtering, and clustering the product features mentioned in Chinese online product reviews. The main contents of this thesis are:
1. Based on the theory of association rules and focusing on Chinese web product reviews, this thesis proposes an improved Apriori algorithm to extract the candidate product feature set, and then preliminarily filters the candidate set according to single-word rules and redundancy pruning rules.
2. The candidate product features are then filtered by removing frequent nouns that are not features and by applying a PMI (pointwise mutual information) threshold, which yields the final product features.
3. A new semantic similarity measure between features is proposed by combining HowNet-based word similarity with similarity based on the co-occurrence information between features and opinion words. To describe the product characteristics well, an improved K-means clustering algorithm is proposed to cluster semantically similar features.
To verify the validity of the proposed method, experiments on a set of Chinese web product reviews are conducted from three aspects: product feature extraction, semantic similarity calculation, and product feature clustering. The experimental results show that the method proposed in this thesis is effective. Finally, conclusions and further work are discussed.
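The PMI threshold filtering in step 2 can be illustrated with a minimal sketch. The abstract does not say how PMI is estimated or which term each candidate is compared against, so everything here is an assumption made for illustration: probabilities are estimated from review-level co-occurrence, each candidate is scored against a single seed term (for example the product name), and the names `pmi`, `filter_by_pmi`, `seed`, and `threshold` are hypothetical rather than taken from the thesis.

```python
import math


def pmi(term_a, term_b, documents):
    """Pointwise mutual information of two terms.

    Assumption: probabilities are estimated from document-level
    co-occurrence, where each document is a set of tokens from one review.
    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        # The terms never co-occur, so treat the association as minimal.
        return float("-inf")
    return math.log2((count_ab * n) / (count_a * count_b))


def filter_by_pmi(candidates, seed, documents, threshold):
    """Keep candidates whose PMI with the seed term reaches the threshold.

    Hypothetical helper: the seed term and threshold are illustrative inputs.
    """
    return [c for c in candidates if pmi(c, seed, documents) >= threshold]


# Toy, already-segmented reviews (English tokens stand in for Chinese words).
reviews = [
    {"phone", "screen", "clear"},
    {"phone", "battery", "durable"},
    {"phone", "screen", "bright"},
    {"delivery", "fast"},
    {"delivery", "friend", "gift"},
]

kept = filter_by_pmi(["screen", "battery", "friend"], "phone", reviews, 0.0)
print(kept)
```

In this toy data "friend" never co-occurs with the seed term and is dropped, while "screen" and "battery" are kept. A real pipeline would first run Chinese word segmentation and part-of-speech tagging to obtain the token sets, and the threshold would need to be tuned on the review corpus.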
Keywords/Search Tags:feature extraction, feature, clustering, similarity, review mining