
Research On Web Spam Intelligent Detection Method Based On Deep Learning

Posted on: 2019-08-02; Degree: Master; Type: Thesis
Country: China; Candidate: X Q Nie; Full Text: PDF
GTID: 2428330548470539; Subject: Engineering
Abstract/Summary:
Web page cheating refers to the use of a series of cheating techniques by website designers to raise a website's rank in search engines to a position that is not commensurate with its quality. Websites that use such techniques are called web spam. As the Internet has developed, more and more web spam has appeared on the web. As an anti-cheating technology, web spam detection has received great attention and has become a focus of search engine companies.

This paper proposes an intelligent web spam detection method that applies a classification model based on deep belief networks (DBN). First, by analyzing the features of normal web pages and web spam, the paper constructs a web spam identification index system. It then applies a series of preprocessing operations, including dimension reduction, to the identification indexes. Finally, a DBN classification model is used to detect web spam. The specific research content covers the following aspects:

1. To enrich the types of web page features and identify web spam more accurately, content, link, quality and hidden features are extracted to establish the web spam identification index system. Because the indexes are strongly correlated and high-dimensional, a stacked denoising autoencoder (SDAE) neural network, an improved form of the denoising autoencoder (DAE), is used to reduce the dimension of the index system.

2. To address the extreme imbalance between the numbers of normal web pages and web spam pages, the SMOTE technique is used to balance the sample dataset before classification, so that the classification result is not dominated by the majority class. A DBN is used as the classifier, with the processed sample set as its input, to obtain the experimental results. Experiments verify the effectiveness of the classifier.

3. Based on the browser/server (B/S) model, an intelligent detection system is designed and implemented in Java. It consists of four modules: user login, sample database, system training and system testing. A series of tests on the system verifies its feasibility for detecting real web pages.
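The SMOTE balancing step described in point 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `smote`, the toy feature values, and the choice of spam as the minority class are all assumptions made for illustration. The sketch shows only the core idea of SMOTE, which is to create each synthetic minority sample by interpolating between an existing minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 8 normal pages and 3 spam pages, two made-up features each.
normal = [[0.1 * i, 0.2 * i] for i in range(8)]
spam = [[5.0, 5.0], [5.5, 4.5], [6.0, 5.5]]

# Oversample the minority class until both classes have the same size.
balanced_spam = spam + smote(spam, n_synthetic=len(normal) - len(spam), k=2)
print(len(normal), len(balanced_spam))
```

Because every synthetic point lies on a line segment between two real minority samples, the oversampled class stays inside the region already occupied by that class. In practice only the training split would be balanced this way, so that the test set keeps the original class distribution.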
Keywords/Search Tags:web spam detection, feature reduction, deep belief network, stacked denoising autoencoder neural networks, synthetic minority over-sampling technique