
A Study On The Method Of Detecting Fakers For Online Forums

Posted on: 2018-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: P W Yin
GTID: 2428330590492281
Subject: Computer technology
Abstract/Summary:
As the Internet reaches into every aspect of daily life, people can learn new things more easily than ever, and information from the Internet constantly shapes how they understand the world. This information is not always correct, however. A large share of it is deliberately fake, posted to serve particular interests. As the Internet has developed, posting fake information has grown into a black-market industry. Fakers continually damage the health of the Internet, distort people's understanding of things, and sometimes cause serious social harm. Detecting fake information and fakers is therefore critical to maintaining the health of the Internet and even of society. In recent years many researchers have studied how to detect spam reviews and fakers on microblogs, Twitter, and Facebook, but little work addresses online forums. This paper therefore focuses on detecting fakers on online forums.

Detection proceeds in two steps. In step one, a machine learning classification model detects junior fakers. The model performs very well: accuracy is about 98.1%, recall about 99.1%, and precision about 97.2%. In step two, starting from the detected junior fakers and the user network, algorithms such as PageRank produce three user ranks that are used to detect more fakers, especially senior fakers. In our experiment on more than 5,000 users, step two detects 78 additional fakers, 15 of whom are senior fakers.

The method works as follows. First, we analyze topic sentiment with a dictionary to obtain a topic sentiment vector for every review. From this vector we derive sentiment features such as "Biggest positive/negative topic sentiment". In step one, these sentiment features, basic user features, and time-window features form the input to the machine learning models. In step two, the topic sentiment vectors are the input to a sentiment distance calculation, and the sentiment distance determines whether two users who posted two reviews stand in a support, contrary, or neutral relation. From these user relations we construct a user support network, a user contrary network, and a faker support network, and use algorithms such as PageRank to compute a user rank on each of the three networks. Finally, we apply K-Means clustering to the three user ranks to detect more fakers, especially senior fakers. The method draws on several web mining techniques, including word segmentation, sentiment classification, feature analysis, machine learning classification, PageRank, and data clustering, and it achieves good performance in detecting fakers.
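The second step can be illustrated with a minimal sketch. The abstract does not specify the sentiment distance function, the relation thresholds, the edge direction, or the PageRank parameters, so everything below is an assumption made for illustration: cosine similarity stands in for the sentiment distance, a threshold of 0.5 separates support, contrary, and neutral relations, an edge points from the replying user to the user replied to, and the damping factor is the conventional 0.85. The names `cosine_similarity`, `relation`, `build_network`, and `pagerank` are hypothetical and not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two topic sentiment vectors (assumed distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relation(u, v, threshold=0.5):
    """Classify two reviews as support, contrary, or neutral (assumed thresholds)."""
    sim = cosine_similarity(u, v)
    if sim >= threshold:
        return "support"
    if sim <= -threshold:
        return "contrary"
    return "neutral"


def build_network(replies, kind, threshold=0.5):
    """Build a directed user network that keeps only edges of the given relation.

    replies: iterable of (replier, replier_vector, target, target_vector).
    Returns a dict mapping each user to the set of users they point to.
    """
    edges = {}
    for src, src_vec, dst, dst_vec in replies:
        edges.setdefault(src, set())
        edges.setdefault(dst, set())
        if src != dst and relation(src_vec, dst_vec, threshold) == kind:
            edges[src].add(dst)
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling users spread their rank evenly."""
    nodes = list(edges)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not edges[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in edges[src]:
                new_rank[dst] += damping * rank[src] / len(edges[src])
        rank = new_rank
    return rank
```

Under these assumptions, calling `build_network(replies, "support")` and `build_network(replies, "contrary")` yields two of the three networks, and `pagerank` on each gives one rank per user. The faker support network would be built the same way but restricted to edges involving the junior fakers found in step one, and the resulting rank triples would then be passed to a clustering step such as K-Means.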
Keywords/Search Tags:data mining, opinion mining, sentiment classification, machine learning, PageRank, data cluster