Research On The Target Identification Of Phishing Based On URL

Posted on: 2020-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wang
GTID: 2428330590952093
Subject: Information security
Abstract/Summary:
Phishing websites are one of the main forms of network attack. To safeguard information security, phishing detection technologies are constantly being improved. However, there is a lack of dedicated research on, and solutions for, identifying the attack targets of phishing websites. Determining the attack target is valuable: it allows users and the impersonated websites to be warned so that they can take precautions in advance, and it helps guide future research on phishing websites. Researchers mainly detect phishing websites based on URL and webpage features. The identification of phishing targets is treated only as an additional function of phishing detection; it is a preliminary identification based on the same URL and webpage features, its accuracy is low, and the complexity of feature extraction is high. At the same time, attackers constantly update their methods to evade the technologies used to identify phishing websites. To cope with phishers' detection evasion strategies and to accurately identify the target they intend to attack, this paper studies phishing target identification. The main contents of this paper are as follows:

(1) A target identification algorithm based on URL similarity is proposed. By analyzing the similarity of domain name strings in URLs, the edit distance calculation is improved so that it can identify phishing websites that evade detection by padding the domain name with filler characters. In addition, the number of edit distance calculations is reduced by first determining candidate targets, which improves the overall efficiency of the algorithm.

(2) A phishing identification algorithm based on the linguistic features of URLs is proposed. For websites whose targets cannot be identified directly from the URL, domain name features that classify effectively are selected by analyzing the linguistic features contained in URLs, and a decision tree classification model is built from these features to perform phishing identification. In addition, the decision tree is improved by reducing and simplifying the calculation of the information gain ratio, which makes building the tree more efficient.

(3) A target identification algorithm based on search engines is proposed. For websites judged to be phishing, effective search keywords are selected by analyzing the characteristics of each HTML tag. The retrieval process is improved by querying three search engines and decoding keywords, which eliminates the misjudgments of a single search engine and identifies phishers' strategy of evading detection through Unicode encoding, thereby complementing and correcting the identification results for phishing websites.

(4) Parallelization schemes for the target identification algorithm and the phishing website identification algorithm are designed on MapReduce to improve the efficiency of determining candidate targets, calculating edit distance, building the decision tree, and identifying attack targets through search engines.
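
As an illustration of contribution (1), the following Python sketch shows one way a filler-tolerant edit distance and a candidate-target filter could fit together. It is not the algorithm from the thesis, whose details the abstract does not give. The filler character set, the reduced deletion cost of 0.25, the bigram-overlap filter, the distance threshold, and all function names are assumptions made for this example.

```python
# Illustrative sketch only; every constant and helper below is hypothetical.
FILLERS = set("-_0123456789")  # assumed padding characters


def edit_distance(a: str, b: str, filler_cost: float = 1.0) -> float:
    """Levenshtein distance from a to b.

    Deleting a filler character from `a` costs `filler_cost`; every other
    operation costs 1. With filler_cost=1.0 this is the standard distance.
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for ca in a:
        del_cost = filler_cost if ca in FILLERS else 1.0
        cur = [prev[0] + del_cost]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + del_cost,        # delete ca from a
                cur[j - 1] + 1.0,          # insert cb into a
                prev[j - 1] + (ca != cb),  # match (0) or substitute (1)
            ))
        prev = cur
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def candidates(label: str, targets: list, min_shared: int = 2) -> list:
    """Keep only targets sharing enough bigrams with the de-padded label,
    so the more expensive edit distance runs on fewer pairs."""
    core = "".join(c for c in label if c not in FILLERS)
    grams = bigrams(core)
    return [t for t in targets if len(grams & bigrams(t)) >= min_shared]


def identify_target(label: str, targets: list, max_distance: float = 1.0):
    """Return the closest candidate within max_distance, or None."""
    best, best_d = None, max_distance
    for t in candidates(label, targets):
        d = edit_distance(label, t, filler_cost=0.25)
        if d <= best_d:
            best, best_d = t, d
    return best
```

For example, with the hypothetical target list `["paypal", "apple", "amazon"]`, the padded label `pay-pal-` is matched to `paypal`, while an unrelated label such as `example` passes the candidate filter for `apple` but is rejected by the distance threshold.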
Keywords/Search Tags: phishing website, URL, target identification, edit distance, decision tree, search engine