
Feature Selection And Detection Performance Improvement For Mining Fraud Web

Posted on: 2019-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wang
Full Text: PDF
GTID: 2348330569488913
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the information age, the Internet brings people great convenience but also serious security problems. Among these problems, Web spam, with Web fraud at its core, is rampant. Fraudulent Web pages use various camouflage methods to deceive search engines into raising their page rankings, in order to promote advertisements, illegal pyramid selling, and other schemes. In the contest between Web spam and anti-spam, efficiently detecting spam pages and building a harmonious, secure Internet environment has become urgent. Research on Web spam detection faces two challenges. On the one hand, the high dimensionality and redundancy of basic Web page features increase the computational cost of detection and thus reduce its efficiency. On the other hand, sensitive data may be exposed while fraudulent Web pages are being detected. To meet these challenges, this thesis proposes several feature selection algorithms. In addition, it presents a feature selection algorithm that considers both data privacy protection and detection performance, together with an efficient, secure Web spam detection model.

The thesis first studies basic Web features and the corresponding detection tasks, focusing on the optimal selection of basic Web page features. Based on an analysis of several feature selection algorithms, it proposes an improved feature selection algorithm based on information gain and a genetic algorithm (IFS-BIGGA), which generates an optimal minimum feature subset (OMFS). To analyze and compare the effectiveness of IFS-BIGGA, three feature selection algorithms based on a random forest and a neighborhood rough set model are implemented. The experimental results show that IFS-BIGGA is superior to the other feature selection algorithms.

Considering the importance of data privacy preservation in Web spam page mining, a cascade feature selection algorithm based on privacy protection (PPGAFS) is implemented. It adds privacy degrees and confidence degrees, building on the conditional entropy of IFS-BIGGA. PPGAFS resolves the conflict between improving detection performance and protecting data privacy in Web spam page mining. Based on PPGAFS, this thesis proposes an efficient and secure Web spam detection model (WSDM) with four main stages: data discretization, data balancing, feature selection, and classification. To verify the effectiveness of PPGAFS and WSDM, several sets of comparative experiments are carried out on the WEBSPAM-UK2007 dataset. The experimental results show that the proposed WSDM outperforms other recent detection schemes, protecting data privacy while improving Web spam detection performance.
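
The abstract does not give the details of IFS-BIGGA, so the following is only a minimal sketch of the generic information-gain ranking that such a method could start from, assuming features have already been discretized. The function names, the toy feature names, and the toy data are hypothetical and are not taken from the thesis; the genetic-algorithm search and the privacy and confidence degrees of PPGAFS are not modeled here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discretized feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each group weighted by its share of samples.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: four pages, two discretized features.
labels = ["spam", "spam", "normal", "normal"]
columns = {
    "f_informative": [1, 1, 0, 0],  # separates the two classes perfectly
    "f_noise": [0, 1, 0, 1],        # carries no information about the class
}
ranking = rank_features(columns, labels)
```

In a pipeline like the one described above, a ranking of this kind would typically be used to discard low-gain features before a more expensive subset search; how the thesis combines the two steps is not stated in the abstract.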
Keywords/Search Tags: Web Spam Detection, Feature Reduction, Feature Selection, Data Mining, Privacy Preservation, Detection Model