
Research On Data Processing Technology For Commodity Comment Text

Posted on: 2019-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Xiao
GTID: 2428330602460384
Subject: Engineering
Abstract/Summary:
Online shopping has become a mainstream shopping channel that brings convenience to people's daily lives. Unfortunately, the wide variety of products and consumers' lack of first-hand experience still make purchasing decisions difficult. User comments on e-commerce websites record the subjective impressions of customers who have already bought and used a product, so they can serve as a reference for potential customers. However, as the number of comments grows and websites display reviews poorly, users find it hard to obtain the information that is valuable to them.

First, this thesis collects comment texts from shopping websites with a web crawler and preprocesses them with data cleaning, subjective clause extraction, Chinese word segmentation, and stop-word removal to improve the quality of the usable data. Second, existing algorithms extract text feature words incompletely and lose much of their semantic information; this thesis proposes a solution to both problems and clusters the feature words with an improved clustering algorithm. Finally, it uses the feature-word clustering results together with emotion labels to build a system of commodity feature and emotion dimensions. The thesis examines what consumers pay attention to and how they feel, from the perspectives of product features and emotions. This dimension system provides an objective and effective reference standard for users' purchasing decisions, and experiments verify the feasibility and effectiveness of the method.

The research work in this thesis makes three main contributions:

(1) It combines syntactic analysis with word frequency and part of speech to expand the set of low-frequency commodity feature words, improving the accuracy and coverage of the extracted feature words. To describe the semantics of commodity feature words and their relations with surrounding words accurately, it introduces the Word2vec model into feature-word vectorization, reducing the loss of semantic information.

(2) It improves the representative-point selection rule of the traditional CURE (Clustering Using Representatives) clustering algorithm, which raises CURE's accuracy on short text, and applies the improved algorithm to cluster feature words. In experiments, the improved algorithm performs better than the traditional CURE algorithm.

(3) Building on the above work, it takes the feature-word clustering results and combines them with word emotion labels to construct a more realistic and effective system of product feature and emotion dimensions. It then calculates the weight of each feature dimension of the commodity and performs a quantitative emotion analysis. The resulting analysis has high reference value.
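Contribution (1) combines word frequency, part of speech, and syntactic analysis to recover low-frequency feature words, but the abstract does not give the exact rules. The following is only a minimal Python sketch under stated assumptions: comments are already segmented and POS-tagged by a Chinese segmenter (not shown), nouns are tagged `n` and adjectives `a`, the stop-word set is a toy placeholder, and "a noun immediately followed by an adjective once stop words are removed" stands in, crudely, for a real dependency parse. The function name `extract_feature_words` and the `min_freq` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical toy stop-word list; a real system would load a full Chinese stop-word file.
STOP_WORDS = {"的", "了", "很", "也", "就"}


def extract_feature_words(tagged_comments, min_freq=2):
    """tagged_comments: list of comments, each a list of (word, pos_tag) pairs.

    Returns frequent nouns plus low-frequency nouns that appear in a
    noun-adjective pattern (an assumed proxy for syntactic analysis)."""
    noun_counts = Counter()
    pattern_nouns = set()
    for comment in tagged_comments:
        # Drop stop words so that e.g. "屏幕 很 清晰" becomes "屏幕 清晰".
        tokens = [(w, t) for w, t in comment if w not in STOP_WORDS]
        for idx, (word, tag) in enumerate(tokens):
            if tag != "n":
                continue
            noun_counts[word] += 1
            # Noun directly followed by an adjective: likely a product feature being evaluated.
            if idx + 1 < len(tokens) and tokens[idx + 1][1] == "a":
                pattern_nouns.add(word)
    frequent = {w for w, n in noun_counts.items() if n >= min_freq}
    # Low-frequency nouns caught by the pattern expand the frequency-based set.
    return frequent | pattern_nouns
```

With a threshold of 2, a noun such as 电池 (battery) that appears only once is still kept when it is followed by an opinion adjective, which is the kind of low-frequency expansion the contribution describes.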
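The traditional CURE algorithm that contribution (2) starts from represents each cluster by a few well-scattered points shrunk toward the cluster centroid, and repeatedly merges the two clusters whose representatives are closest. The thesis's improved representative-point selection rule is not described in the abstract, so the Python sketch below implements only the standard rule (farthest-point scattering, then shrinking by a factor `alpha`). It is a naive version, roughly cubic in the number of points, without CURE's random sampling, partitioning, or heap/k-d tree. The inputs would be feature-word vectors such as Word2vec embeddings; the parameters `c` and `alpha` and their defaults are illustrative assumptions.

```python
import math
from itertools import combinations


def _centroid(points):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def _representatives(points, c, alpha):
    """Standard CURE rule: pick up to c well-scattered points, then shrink
    each one toward the centroid by the factor alpha."""
    mean = _centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(range(len(points)), key=lambda i: math.dist(points[i], mean))]
    while len(chosen) < min(c, len(points)):
        rest = [i for i in range(len(points)) if i not in chosen]
        # Next: the point whose nearest already-chosen point is farthest away.
        chosen.append(
            max(rest, key=lambda i: min(math.dist(points[i], points[j]) for j in chosen))
        )
    return [tuple(x + alpha * (m - x) for x, m in zip(points[i], mean)) for i in chosen]


def cure(points, k, c=4, alpha=0.5):
    """Agglomerative CURE (k >= 1): start from singletons and merge the pair of
    clusters whose representative points are closest until k clusters remain."""
    clusters = [[p] for p in points]
    reps = [_representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(math.dist(a, b) for a in reps[ij[0]] for b in reps[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
        reps[i] = _representatives(merged, c, alpha)
    return clusters
```

For example, `cure([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two groups of three points. Shrinking the representatives toward the centroid is what lets CURE follow non-spherical clusters while damping the effect of outliers; a modified selection rule, as in the thesis, would replace `_representatives`.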
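Contribution (3) computes the weight of each feature dimension and quantifies emotion, but the abstract does not give the formulas. As a hedged illustration only, the Python sketch below assumes that a dimension's weight is its share of all feature-word mentions, and that its emotion score is (positive - negative) / (positive + negative) over the polar mentions. The mapping from words to dimensions (standing in for the clustering output), the +1/-1/0 polarity labels, and the function name `dimension_report` are all hypothetical.

```python
from collections import defaultdict


def dimension_report(mentions, word_to_dim):
    """mentions: list of (feature_word, polarity) pairs, polarity in {+1, -1, 0}.
    word_to_dim: mapping from feature word to its dimension label.

    Returns {dimension: (weight, emotion_score)} under the assumed formulas."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, positive, negative]
    for word, polarity in mentions:
        dim = word_to_dim.get(word)
        if dim is None:
            continue  # word was not assigned to any dimension
        tally = counts[dim]
        tally[0] += 1
        if polarity > 0:
            tally[1] += 1
        elif polarity < 0:
            tally[2] += 1
    total = sum(tally[0] for tally in counts.values())
    report = {}
    for dim, (n, pos, neg) in counts.items():
        polar = pos + neg
        # Weight: share of mentions; score in [-1, 1], 0.0 if no polar mentions.
        report[dim] = (n / total, (pos - neg) / polar if polar else 0.0)
    return report
```

With three "display" mentions (two positive, one negative) and one positive "battery" mention, this yields a weight of 0.75 and a score of about 0.33 for display, and a weight of 0.25 and a score of 1.0 for battery.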
Keywords/Search Tags: Data collection, Comment analysis, Text mining, Topic words clustering, Feature dimension