Active Intelligent Detection Of False Webpages Based On Web Crawler

Posted on: 2016-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Ding
GTID: 2308330470471950
Subject: Software engineering
Abstract/Summary:
Phishing is a type of attack in which large numbers of deceptive emails, purporting to come from banks or other well-known institutions, lure recipients into giving up personal information. The most common form lures the recipient to a false webpage that imitates a site familiar to the target and then steals the victim's personal information. As false webpages have become an increasingly serious problem, anti-phishing technologies such as false webpage detection have attracted widespread attention. This thesis presents an active intelligent detection system for false webpages. The system first obtains webpages similar to known targets, then extracts webpage features and reduces their dimensionality with an autoencoder, and finally classifies unknown webpages with a BVM model. First, because passive detection suffers from lag, the thesis adopts an active model that uses edit distance to compute the similarity between the target and the seeds. Second, because the detection result depends largely on webpage feature extraction, the thesis enriches the feature types by analyzing the HTML source and the DOM, covering both document features and topology features, and then reduces the dimensionality of these features with the autoencoder. Third, the thesis constructs an intelligent false-webpage classifier using BVM, a machine learning algorithm, and describes the detection steps and the experiments based on it. The experiments show that the proposed BVM-based detection method achieves high detection precision with shorter running time. Finally, the thesis designs and implements an active intelligent false-webpage detection system based on a web crawler, using the B/S (browser/server) architecture and the Java programming language, and presents the system architecture and some of its functional pages.
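The edit-distance step mentioned in the abstract can be illustrated with a minimal Java sketch. The abstract does not say which strings are compared or how the distance is normalized, so the following are assumptions for illustration only: the comparison is made on domain-name strings, the distance is normalized by the longer string's length, and a candidate is flagged when its similarity to any seed reaches a hypothetical threshold of 0.8. The class name, method names, and example domains are all invented here and are not taken from the thesis.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names, threshold, and domains are assumptions.
final class EditDistanceSimilarity {

    // Levenshtein distance computed with two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Flags a candidate that closely resembles a seed without being that seed.
    static boolean isSuspicious(String candidate, List<String> seeds, double threshold) {
        for (String seed : seeds) {
            if (!candidate.equals(seed) && similarity(candidate, seed) >= threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical seed domains standing in for protected target sites.
        List<String> seeds = Arrays.asList("examplebank.com", "shop.example.org");
        for (String c : Arrays.asList("examp1ebank.com", "examplebank.com", "weather.example")) {
            System.out.println(c + " -> suspicious: " + isSuspicious(c, seeds, 0.8));
        }
    }
}
```

In an active detection setting, candidates that pass such a filter would be handed to the crawler for feature extraction and classification. The threshold trades recall against crawl volume and would need tuning on real data.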
Keywords/Search Tags: False Webpage Detection, Active Detection, Feature Extraction, Autoencoder, BVM