
Research On Commodity Garbage Comments Recognition Based On Deep Learning Hybrid Model

Posted on: 2020-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
GTID: 2428330590451109
Subject: Software engineering
Abstract/Summary:
With the continued development of Internet applications, online shopping has become a mainstream trend. E-commerce applications generate massive amounts of comment data every day, and these product comments have become the primary basis on which users select products. However, because of the openness of the network and users' freedom of speech, some users post spam comments after a purchase. These comments degrade the user experience, hinder the maintenance and improvement of the system, and waste information resources. To address these problems and explore the value contained in commodity comment information, this thesis combines data crawling, model construction, and experimental comparison. The main work of the thesis is as follows:

(1) Crawl the target product information and comments from the Jingdong Mall website. Based on the Scrapy framework, the crawler parses web pages using XPath expressions. A multi-threading model is used to speed up crawling, and an improved web crawling strategy is used to obtain more valuable commodity comments. The data is finally stored in MongoDB in preparation for training the classification model.

(2) In view of the shortcomings of traditional machine learning in classifying comment texts, deep learning reduces the need for manual intervention and can automatically acquire the structural features in the data, which greatly saves manpower and time. This thesis therefore combines the strength of CNN in identifying local features with the ability of LSTM to exploit text sequences, adds an attention mechanism, and proposes a CLSTM hybrid model algorithm. The model aims to maximize the extraction of context information and to classify commodity spam comments efficiently.

(3) To test the classification performance of the CLSTM hybrid model classifier, a comparative experiment is designed against the traditional machine learning model SVM and the single deep learning model LSTM. Three data sets of different commodity types are selected for model training. In the comparative experiment, the classifier trained with the hybrid model achieves higher accuracy than SVM and LSTM; the reported accuracies are 85.5%, 84.8%, and 85.0%, respectively. These results support the superior classification performance of the hybrid model.
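
The abstract does not state which form of attention the CLSTM model uses. As a rough illustration only, the sketch below assumes a simple dot-product attention that pools a sequence of per-token feature vectors (such as LSTM outputs) into a single vector for classification. The names `attention_pool`, `hidden_states`, and `query` are hypothetical and do not come from the thesis.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-token vectors into one context vector.

    hidden_states: list of equal-length float lists, one per token
                   (assumed here to stand in for LSTM outputs).
    query:         float list of the same length (in a trained model this
                   would be a learned parameter; here it is supplied).
    Returns (context_vector, attention_weights).
    """
    # Score each token by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the token vectors.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    context, weights = attention_pool(states, [1.0, 1.0])
    print("weights:", weights)
    print("context:", context)
```

In this toy input the third token aligns best with the query, so it receives the largest weight. In a hybrid model of the kind described, the pooled vector would then be passed to a classification layer.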
Keywords/Search Tags:Product Comments, Text Classification, Web Crawler, Attention Mechanism, CLSTM