The Research And Application Of The Detection System For Commodity Spam Reviews

Posted on: 2018-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: S H Tang
GTID: 2348330515451686
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the e-commerce industry has boomed, and a huge number of commodity reviews are now produced online. The quality of these reviews is uneven. Although they bring great value to businesses and consumers, they also pose many challenges. Researchers have found that commodity reviews in many fields contain a large number of spam reviews. Some have nothing to do with the product and therefore have no reference value. Others contain excessive praise or disparagement that is untrue and may even mislead consumers. It is therefore necessary to filter valuable, high-quality reviews out of the massive volume of commodity review data. Because online reviews are far too numerous to be processed manually, computer science and technology have become the primary means of solving this problem.

At present, research on recognizing spam reviews has gradually matured both in China and abroad, and a set of effective research methods has taken shape. Owing to the rapid development of machine learning, current solutions are based on data mining and machine learning techniques. In the traditional approach, researchers manually design features that are effective for identifying commodity spam reviews, then train a machine learning classifier on a training set that they have collected and labeled by hand. The result is a model that can identify commodity spam reviews. Although this approach achieves its goal, it has a flaw: the hand-crafted features are based only on the surface characteristics of the reviews and do not reach their semantic level, so traditional methods cannot extract the features hidden at that level.

Google released a tool named Word2Vec in 2013. It represents words at a deep semantic level: training it on a corpus from a specific scenario yields high-dimensional word vectors for that scenario, and these vectors carry deep semantic information. This paper therefore studies how to use this technology to extract semantic-level features from commodity reviews, and proposes three new feature extraction methods for commodity reviews: WV-1, WV-2 and WV-3. WV-1 is constructed by accumulating the word vectors of the words in a review. Compared with the traditional features of commodity reviews, WV-1 performs far better at recognizing commodity spam reviews. WV-2 combines the WV-1 features with the traditional features so that the two complement each other, further improving the model's effectiveness over WV-1. WV-3 improves on WV-2 by adding word-weight information, and it also performs very well. All three methods successfully capture semantic-level information in commodity reviews, and on the same classifier they perform far better than traditional feature extraction methods.

In the last chapter, the paper summarizes the software design process of a system for identifying commodity spam reviews based on the above theory. The system successfully applies the new theory to a real-world scenario, further confirming the feasibility and validity of the paper's core theory.
Keywords/Search Tags: the identification of commodity spam reviews, Word2Vec, data mining, machine learning, word vector
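The abstract describes WV-1, WV-2 and WV-3 only at a high level. As an illustration, the following is a minimal Python sketch of how such features could be assembled, under several assumptions that do not come from the thesis: word vectors are supplied as a plain dictionary (in practice they would come from a trained Word2Vec model), "accumulation" is taken to mean element-wise summation, the "traditional" features are replaced by two hypothetical surface features (token count and share of "!" tokens), and the "word weight" in WV-3 is assumed to be a smoothed IDF score. The names wv1, wv2, wv3, surface_features and idf_weights are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Embeddings = Dict[str, Vector]


def wv1(tokens: Sequence[str], emb: Embeddings, dim: int) -> Vector:
    """WV-1 (sketch): accumulate (sum) the vectors of all in-vocabulary tokens."""
    total = [0.0] * dim
    for tok in tokens:
        vec = emb.get(tok)
        if vec is None:
            continue  # out-of-vocabulary words contribute nothing
        for i, x in enumerate(vec):
            total[i] += x
    return total


def surface_features(tokens: Sequence[str]) -> Vector:
    """Hypothetical stand-in for the 'traditional' features:
    token count and the share of '!' tokens."""
    n = len(tokens)
    exclaims = sum(1 for t in tokens if t == "!")
    return [float(n), exclaims / n if n else 0.0]


def wv2(tokens: Sequence[str], emb: Embeddings, dim: int) -> Vector:
    """WV-2 (sketch): concatenate WV-1 with the surface features."""
    return wv1(tokens, emb, dim) + surface_features(tokens)


def idf_weights(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Assumed 'word weight': smoothed IDF, log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df: Dict[str, int] = {}
    for doc in corpus:
        for tok in set(doc):  # count each word once per document
            df[tok] = df.get(tok, 0) + 1
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def wv3(tokens: Sequence[str], emb: Embeddings, dim: int,
        weights: Dict[str, float]) -> Vector:
    """WV-3 (sketch): like WV-2, but each word vector is scaled by its
    weight before accumulation. Words without a weight get 1.0 (assumption)."""
    total = [0.0] * dim
    for tok in tokens:
        vec = emb.get(tok)
        if vec is None:
            continue
        w = weights.get(tok, 1.0)
        for i, x in enumerate(vec):
            total[i] += w * x
    return total + surface_features(tokens)


if __name__ == "__main__":
    emb = {"good": [1.0, 0.0], "cheap": [0.0, 1.0], "buy": [0.5, 0.5]}
    review = ["good", "good", "buy", "unknown"]
    print(wv1(review, emb, 2))  # [2.5, 0.5]
    print(wv2(review, emb, 2))  # [2.5, 0.5, 4.0, 0.0]
    weights = idf_weights([["good", "buy"], ["cheap", "buy"]])
    print(wv3(["buy"], emb, 2, weights))  # [0.5, 0.5, 1.0, 0.0]
```

The resulting vectors would then be fed to whichever classifier is being trained. Because WV-2 and WV-3 only change or append columns relative to WV-1, the same classifier can be compared across all three feature sets, which matches the kind of comparison the abstract describes.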