Research On Spam Detection Based On Heterogeneous Ensemble Learning

Posted on:2020-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Liu

Full Text:PDF

GTID:2428330599460276

Subject:Computer Science and Technology

Abstract/Summary:


Because online shoppers cannot inspect goods in person, they can only learn about a product from the information on the e-commerce platform, and they increasingly rely on review comments. Many merchants have found that positive reviews can bring large returns, while negative reviews can cost competitors money or even drive them out of business, so review spam has persisted. Because of the need to prevent vicious competition among sellers, keep trading on e-commerce platforms fair, and protect consumers' rights and interests, spam detection has long been an active research topic. This thesis studies spam detection in depth. The main work is as follows. First, because the Word2vec model does not capture word-pair information, a Bigram-Word2vec model is proposed. The model first uses a Bigram model to identify word pairs in English text; the processed text is then fed into the Word2vec model to train the word vectors. Second, the quality of the word vectors trained by the Bigram-Word2vec model varies with the number of word pairs. To further optimize the Bigram-Word2vec model, this thesis trains word vectors with multiple sets of values to find the best word vectors. Third, to address the reliance on a single machine learning model in traditional spam detection, this thesis applies heterogeneous ensemble learning to spam detection. In combining multiple heterogeneous models, two problems arise: the hard voting method can produce a tie, and it is unclear how the weights in the soft voting method should be set. Two solutions are proposed: two-class weighted hard voting and weighted soft voting. Finally, this thesis uses several text feature extraction methods to extract features from Amazon datasets and then combines multiple models to classify the text. To explain why the classification results are unsatisfactory, the concept of the "repetition rate of words" is proposed. The proposed methods are also verified on the dataset.
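
The abstract names three steps without giving their details: merging word pairs before Word2vec training, weighted hard voting, and weighted soft voting. The following is a minimal Python sketch of what such steps could look like, not the thesis's actual method. Everything specific in it is an assumption made for illustration: the function names, the `min_count` threshold, the greedy left-to-right merge, the use of per-model weights such as validation accuracy, and the 0.5 decision threshold.

```python
from collections import Counter


def merge_bigrams(sentences, min_count=2):
    """Join adjacent word pairs that occur at least min_count times.

    Hypothetical stand-in for the bigram step: frequent pairs become
    single tokens such as "not_good" before word vectors are trained.
    """
    pair_counts = Counter(
        (a, b) for s in sentences for a, b in zip(s, s[1:])
    )
    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            # Greedy left-to-right merge (an assumption of this sketch).
            if i + 1 < len(s) and pair_counts[(s[i], s[i + 1])] >= min_count:
                out.append(s[i] + "_" + s[i + 1])
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged


def weighted_hard_vote(labels, weights):
    """Two-class hard voting where each model's vote counts by its weight.

    labels: 0/1 prediction per model (1 = spam).
    weights: one weight per model, e.g. validation accuracy (assumed).
    An exact weighted tie falls back to 0 (not spam) in this sketch.
    """
    score = {0: 0.0, 1: 0.0}
    for label, w in zip(labels, weights):
        score[label] += w
    return 1 if score[1] > score[0] else 0


def weighted_soft_vote(probs, weights):
    """Weighted average of each model's predicted spam probability.

    Returns (label, averaged probability); the label is 1 when the
    average reaches the assumed 0.5 threshold.
    """
    total = sum(weights)
    p = sum(pi * w for pi, w in zip(probs, weights)) / total
    return (1 if p >= 0.5 else 0), p


if __name__ == "__main__":
    reviews = [
        ["not", "good", "product"],
        ["not", "good", "at", "all"],
        ["good", "product"],
    ]
    print(merge_bigrams(reviews))
    # Four models split 2-2; the weights decide the outcome.
    print(weighted_hard_vote([1, 1, 0, 0], [0.9, 0.6, 0.8, 0.8]))
    print(weighted_soft_vote([0.9, 0.4], [3, 1]))
```

With an even number of models, unweighted hard voting can split evenly between the two classes. Weighting each vote makes an exact tie unlikely, which is one plausible reading of why the abstract pairs weights with the two-class case.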

Keywords/Search Tags:

machine learning, heterogeneous ensemble learning, voting, spam detection, Word2vec


Related items

1	The Research And Application Of The Detection System For Commodity Spam Reviews
2	Research On Opinion Spam Detection Based On Deep Learning
3	Email Classification Based On Word2vec
4	Research On Multi-view Learning For Web Spam Detection
5	SVM-Based Novel Method Of Online Spam Filtering
6	Content-based Anti-Spam Filtering
7	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And SVM
8	Machine learning for image spam detection: From server to client solution
9	A Study On Optimization Of Pre-trained Chinese Word Embedding In Transfer Learning