Fraudulent URL Detection Based On Big Data

Posted on:2019-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y R Huang

Full Text:PDF

GTID:2428330590992394

Subject:Electronics and Communications Engineering

Abstract/Summary:

Phishing is a deception technique that combines social engineering with sophisticated attack methods to obtain sensitive personal information, such as passwords, account details and credit card details, by masquerading as a trustworthy person or business in an electronic communication. To address the limitations of existing anti-phishing solutions, this thesis proposes a fraudulent URL detection scheme based on big data. The main research content and contributions are as follows:

1. The thesis first discusses the definition, working principle, common attack methods and types of phishing websites. It then reviews the current mainstream anti-phishing technologies and summarizes the advantages and disadvantages of each.
2. The thesis presents a detection algorithm based on multiple website features. The algorithm extracts features from a website's URL and page content into a feature vector, then uses Random Forest, Logistic Regression and Support Vector Machine classifiers to label each website as phishing or legitimate.
3. The thesis presents a character-level recurrent neural network for phishing detection. The input to the algorithm is the preprocessed URL string. The algorithm first uses the Skip-gram model of Word2Vec to convert every character in the URL into an embedding vector, then uses a bidirectional long short-term memory (Bi-LSTM) network to encode the URL text, and finally applies an activation function to classify the website as phishing or not.
4. Finally, these results are implemented with Spark MLlib and Keras as a real-time fraudulent URL detection system, whose average throughput reaches 1000 URLs per minute.

Experiments show that the Bi-LSTM makes effective use of the semantic information in URLs and outperforms the traditional methods, and that the proposed algorithm achieves an average precision of 98% on a dataset downloaded from the PhishTank and DMOZ sites.
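The multi-feature algorithm in item 2 starts by turning a URL into a numeric feature vector. The abstract does not list the actual features, so the sketch below uses a handful of common lexical URL features chosen purely for illustration; the names `url_features`, `IPV4_HOST` and `BAIT_WORDS` are hypothetical. Page-content features (for example, counts of forms or external links) would be appended to the same vector in the same way.

```python
import re
from urllib.parse import urlparse

# Hypothetical lexical features for illustration only; the thesis abstract does
# not list the exact URL and page-content features it uses.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
BAIT_WORDS = ("login", "verify", "account", "secure", "update")


def url_features(url: str) -> list[float]:
    """Map a URL string to a fixed-length numeric feature vector."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return [
        float(len(url)),                               # overall URL length
        float(host.count(".")),                        # dots in the host name (subdomain depth)
        float("@" in url),                             # '@' can disguise the real destination
        float(IPV4_HOST.match(host) is not None),      # raw IPv4 address used as the host
        float(parsed.scheme == "https"),               # HTTPS scheme present
        float(url.count("-")),                         # hyphens, common in look-alike domains
        float(sum(w in lowered for w in BAIT_WORDS)),  # count of phishing-bait keywords
    ]


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/secure-login/verify@bank"))
```

Features such as the raw length are on very different scales from the binary flags, so they would normally be normalized before being passed to a linear classifier such as Logistic Regression or an SVM.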
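Of the three classifiers named in item 2, Logistic Regression is the simplest to show end to end. The following dependency-free sketch fits it by full-batch gradient descent on the mean log-loss. It is not the thesis implementation, which the abstract places on Spark MLlib (whose `pyspark.ml.classification` module provides `LogisticRegression`, `RandomForestClassifier` and `LinearSVC`). The toy binary features and labels are invented, and the default learning rate assumes features already scaled to roughly [0, 1].

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (phishing) if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


if __name__ == "__main__":
    # Toy binary features: [contains '@', IP-address host, uses HTTPS]; labels 1 = phishing.
    X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic_regression(X, y)
    print([predict(w, b, x) for x in X])
```

On this toy data the learned weight for the HTTPS flag becomes negative, meaning that feature pushes predictions toward "legitimate", while the '@' and IP-host weights become positive.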
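Item 3 states that the character-level network takes a "preprocessed URL string" as input, but the abstract does not describe the preprocessing. One common approach, shown here only as an assumption, maps each character to an integer id and truncates or pads every URL to a fixed length so that URLs can be batched. The vocabulary, the reserved ids 0 (padding) and 1 (unknown character), `max_len=100` and the name `encode_url` are all hypothetical.

```python
import string

# Hypothetical character vocabulary: ASCII letters, digits and punctuation.
# Index 0 is reserved for padding and index 1 for out-of-vocabulary characters.
VOCAB = string.ascii_letters + string.digits + string.punctuation
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}
PAD_ID, OOV_ID = 0, 1


def encode_url(url: str, max_len: int = 100) -> list[int]:
    """Truncate or right-pad a URL to max_len and map each character to an integer id."""
    ids = [CHAR_TO_ID.get(ch, OOV_ID) for ch in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_url("http://a.b", max_len=12))
```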
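In the Skip-gram model used in item 3, each character is trained to predict the characters around it. The sketch below shows only the data-preparation step, generating the (center, context) pairs that Skip-gram learns from. The window size and the function name `skipgram_pairs` are illustrative choices, not values from the thesis. Learning the embeddings themselves would typically be left to a library, for example gensim's `Word2Vec` with `sg=1`, given each URL as a list of characters.

```python
def skipgram_pairs(text: str, window: int = 2) -> list[tuple[str, str]]:
    """Return the (center, context) character pairs that Skip-gram is trained on.

    Every character acts as a center; each character within `window`
    positions on either side is one of its context targets.
    """
    pairs = []
    for i, center in enumerate(text):
        start, stop = max(0, i - window), min(len(text), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, text[j]))
    return pairs


if __name__ == "__main__":
    print(skipgram_pairs("abc", window=1))
    # [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```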
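The encoding step in item 3 runs one LSTM over the character embeddings from left to right and another from right to left, then concatenates their final hidden states. The abstract says only that "an activation function" produces the classification. This sketch assumes a single sigmoid output unit, which is the usual choice for binary classification. It is a dependency-free forward pass with random weights and no training, intended to show the data flow rather than the thesis's Keras model. In Keras the same structure would typically be expressed with `Bidirectional(LSTM(units))` followed by `Dense(1, activation="sigmoid")`. The names `LSTM` and `bilstm_phishing_score` and all dimensions are hypothetical.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LSTM:
    """A single-direction LSTM layer with small random weights (training not shown)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate acts on [x_t ; h_{t-1}]
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in GATES}
        self.b = {g: [0.0] * hidden_dim for g in GATES}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        i = [sigmoid(v) for v in self._affine("input", z)]
        f = [sigmoid(v) for v in self._affine("forget", z)]
        o = [sigmoid(v) for v in self._affine("output", z)]
        g = [math.tanh(v) for v in self._affine("candidate", z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]               # new hidden state
        return h, c

    def run(self, xs):
        """Process the sequence in order and return the final hidden state."""
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


def bilstm_phishing_score(xs, fwd, bwd, w_out, b_out):
    """Concatenate forward and backward encodings, then apply a sigmoid output unit."""
    encoding = fwd.run(xs) + bwd.run(xs[::-1])
    return sigmoid(sum(w * v for w, v in zip(w_out, encoding)) + b_out)


if __name__ == "__main__":
    rng = random.Random(0)
    emb_dim, hidden_dim = 4, 3
    # Stand-in character embeddings for a 6-character URL.
    seq = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(6)]
    fwd, bwd = LSTM(emb_dim, hidden_dim, rng), LSTM(emb_dim, hidden_dim, rng)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
    print(bilstm_phishing_score(seq, fwd, bwd, w_out, 0.0))
```

Reading the URL in both directions lets the encoding of each position depend on characters both before and after it. This is the property the abstract credits for the Bi-LSTM's better use of semantic information.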
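Converted to per-second and per-URL terms, the reported average throughput of 1000 URLs per minute works out as follows. The 60 ms figure is the mean time available per URL only if URLs are processed one after another. A deployment that checks URLs in parallel, as Spark is designed to do, could spend longer on each URL and still sustain the same throughput.

```latex
\[
\frac{1000\ \text{URLs}}{60\ \text{s}} \approx 16.7\ \text{URLs/s},
\qquad
\frac{60\ \text{s}}{1000\ \text{URLs}} = 60\ \text{ms per URL}.
\]
```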
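The headline result is stated as precision, TP / (TP + FP): the share of URLs flagged as phishing that really are phishing. Precision alone does not penalize phishing URLs the detector misses (false negatives), so recall, TP / (TP + FN), is usually read alongside it. The sketch below computes both. The toy labels are invented and are not the thesis data. Returning 0.0 when the denominator is zero is a convention chosen here.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of URLs flagged as phishing that really are phishing: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred, positive=1):
    """Fraction of phishing URLs that were flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]  # toy labels, 1 = phishing (not thesis data)
    y_pred = [1, 0, 0, 1, 0, 0]
    print(precision(y_true, y_pred), recall(y_true, y_pred))
```

Here precision is 1 / (1 + 1) = 0.5 and recall is 1 / (1 + 2), about 0.33. A detector can therefore reach high precision while still missing many phishing URLs.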

Keywords/Search Tags:

Phishing, Feature Extraction, Machine Learning, Recurrent Neural Network, Long Short-Term Memory

Related items

1	Phishing Websites Detection Using Selected Features Classification And Bidirectional Long Short-Term Memory Neural Networks
2	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
3	Research On The Trend Feature Extraction Of Securities Data Based On Machine Learning
4	Design Of A Blind Equalizer Based On Long Short-term Memory Neural Network
5	Online Handwritten Math Expression Label Recognition Based On Long Short Term Memory Recurrent Neural Network
6	Research On Phishing Website Detection Technology In Dual-structural Network
7	The Research On SQL Injection Detection Technology Based On Naive Bayes And LSTM Recurrent Neural Network
8	Research On Chinese Text Classification Method Based On Long And Short Term Memory Network
9	Improved Recurrent Neural Network Method And Its Application
10	Research And Implementation Of Deep Learning Method For Malicious Counterfeit URL Detection