
Phishing Websites Detection Using Selected Features Classification And Bidirectional Long Short-Term Memory Neural Networks

Posted on: 2019-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Du
GTID: 2348330569488948
Subject: Software engineering
Abstract/Summary:
Advances in Internet technology have made daily life more convenient, but they have also introduced information security problems. Phishing is a typical attack that deceives users into revealing sensitive information. Because it is highly profitable, it occurs frequently; it erodes trust among Internet users and hampers the growth of online commerce. Detecting phishing websites accurately and efficiently has therefore become a focus of network information security research.

Machine-learning-based detection is the most active area of phishing research. Its key elements are the construction of features and the choice of classification algorithm. This thesis first studies the features associated with phishing websites in depth. We carry out a statistical analysis of twenty thousand URL samples (half positive, half negative), covering not only common URL and HTML features but also WHOIS and Alexa information. Feature selection algorithms are used to construct an efficient feature combination, and several machine learning algorithms are then used to classify the websites so that their results can be compared. Experiments show that the random forest algorithm distinguishes phishing websites better than the other algorithms compared.

Phishing websites are short-lived and take varied forms, and hand-crafted feature extraction always depends on human prior knowledge. In the approach described above, some commonly used URL features cannot effectively distinguish new phishing websites, and detection based on multi-feature fusion is inefficient. We therefore propose a detection mechanism in which a neural network learns from the URL sequence. A bidirectional LSTM can learn sequential features and long-term dependencies, and so capture the implicit dependencies within a URL sequence, which makes it applicable to phishing website detection. In this thesis, URL text is transformed into word vectors, and the training data with positive and negative labels are fed into the neural network model. The classification model is trained with the backpropagation algorithm. To verify its effectiveness, cross-validation experiments are carried out. The experimental results show that the proposed method achieves higher accuracy and recall and effectively reduces the false positive and false negative rates.
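
The two input representations described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the feature names, the tokenization rule, and the helper functions (`lexical_features`, `tokenize`, `build_vocab`, `encode_url`) are all assumptions made for illustration. The first function computes a few hand-crafted URL features of the kind a random forest might consume; the others turn a URL into a fixed-length sequence of token ids, which an embedding layer (not shown) could map to word vectors before a bidirectional LSTM.

```python
import re
from urllib.parse import urlsplit

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def lexical_features(url):
    """Hypothetical hand-crafted URL features (illustrative, not the thesis's set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        # Crude IPv4 check: four dot-separated groups made only of digits.
        "has_ip_host": host.count(".") == 3 and host.replace(".", "").isdigit(),
        "uses_https": parts.scheme == "https",
    }


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", url.lower())


def build_vocab(urls):
    """Assign ids to tokens in order of first appearance, starting after PAD/UNK."""
    vocab = {}
    for url in urls:
        for token in tokenize(url):
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode_url(url, vocab, max_len=16):
    """Map a URL to exactly max_len token ids, truncating or right-padding."""
    ids = [vocab.get(token, UNK) for token in tokenize(url)[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    train_urls = ["http://a.com/login"]
    vocab = build_vocab(train_urls)
    print(lexical_features("http://192.168.0.1/login.php"))
    print(encode_url("http://b.com/login", vocab, max_len=6))
```

Padding to a fixed length is one common way to batch variable-length URLs for a recurrent network; unseen tokens share a single id so that new phishing URLs can still be encoded.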
Keywords/Search Tags: Phishing Websites Detection, Multiple Features, Random Forest, Word Vector, Long Short-Term Memory neural network