Phishing Detection Based On Semantic Features And Self-Supervised Model

Posted on:2023-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:X X Quan

Full Text:PDF

GTID:2558307070483884

Subject:Engineering

Abstract/Summary:

Phishing is a typical cybercrime with a serious impact on finance, politics, e-commerce and other fields. Cybercriminals obtain sensitive information by crafting phishing URLs that look like legitimate ones and tricking victims into clicking them. Compared with web-page matching and visual similarity matching, URL-based detection has lower cost and better detection performance. Most phishing detection work either applies machine learning to hand-crafted URL features, or trains a deep learning model for classification on labeled samples. The former depends heavily on the chosen URL features and performs poorly. The latter requires balanced samples; otherwise it achieves high accuracy but poor recall. To solve these problems, this thesis proposes two phishing URL detection methods:

(1) We propose a phishing URL detection method based on semantic features. First, we build a database of basic morphemes and split each URL into a set of tokens at its delimiters. Then we extract 10 semantic features using word segmentation and a basic vocabulary database. These are combined with 16 character features from existing studies and passed to machine learning classifiers for phishing URL detection. Unlike methods that rely on character features alone, the proposed method extracts both character-level and word-level features, and the results show that it reaches 96.59% accuracy.

(2) To address the problem that legitimate URLs outnumber phishing URLs, we propose a phishing URL detection method based on a self-supervised model, PDSS (Phishing Detection based on Seq2seq model). First, an encoder extracts semantic features and nonlinear features from legitimate URLs, and a seq2seq model based on an LSTM network is used to predict legitimate URLs. We then compute the reconstruction loss between the reconstructed URL and the original URL, set a threshold on it, and use that threshold to decide whether a test URL is a phishing URL. Compared with traditional deep learning, the proposed method does not need phishing URLs for training, so it can be applied when negative samples are lacking. Experiments show that the proposed method reaches a precision of 99.68% and a recall of 98.11%.
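The word-level feature extraction in method (1) can be illustrated with a minimal sketch. The abstract does not list the 10 semantic features or the contents of the morpheme database, so everything named here is an assumption for illustration: the tiny `VOCAB` set, the delimiter set, the greedy longest-match segmentation, and the three example features are hypothetical stand-ins, not the thesis's actual design.

```python
import re

# Hypothetical stand-in for the basic morpheme / vocabulary database.
VOCAB = {"secure", "login", "pay", "pal", "account", "update", "bank", "com"}

def split_url(url):
    """Split a URL into non-empty lowercase tokens at common delimiters."""
    return [t for t in re.split(r"[:/?#&=._\-@%+]+", url.lower()) if t]

def segment(token, vocab=VOCAB):
    """Greedy longest-match word segmentation (an assumed technique).

    Characters that start no vocabulary word are emitted one at a time.
    """
    words, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in vocab:
                words.append(token[i:j])
                i = j
                break
        else:
            words.append(token[i])
            i += 1
    return words

def word_features(url, vocab=VOCAB):
    """Compute a few illustrative word-level features for one URL."""
    tokens = split_url(url)
    words = [w for t in tokens for w in segment(t, vocab)]
    known = [w for w in words if w in vocab]
    total_chars = sum(len(t) for t in tokens)
    return {
        "num_tokens": len(tokens),
        "num_known_words": len(known),
        "known_char_ratio": sum(len(w) for w in known) / max(1, total_chars),
    }

if __name__ == "__main__":
    print(word_features("http://secure-login.paypal.com/update"))
```

In a full pipeline, a feature dictionary like this would be concatenated with the character features and fed to a conventional classifier; which classifier the thesis uses is not stated in the abstract.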
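The thresholding step in method (2) can be sketched as follows, leaving the LSTM seq2seq model itself out. The abstract does not say which loss is used or how the threshold is chosen, so both are assumptions here: `reconstruction_loss` is a mean negative log-likelihood over characters, `fit_threshold` uses a mean-plus-k-standard-deviations rule on losses from legitimate URLs, and the numbers in the usage section are made-up placeholders rather than results from the thesis.

```python
import math
import statistics

def reconstruction_loss(original, char_probs, eps=1e-9):
    """Mean negative log-likelihood of the original URL's characters.

    char_probs[t] is a dict mapping characters to the probability the
    (hypothetical) decoder assigns at position t.
    """
    total = 0.0
    for ch, probs in zip(original, char_probs):
        total += -math.log(max(probs.get(ch, 0.0), eps))
    return total / max(1, len(original))

def fit_threshold(legit_losses, k=3.0):
    """Assumed rule: mean + k * std of losses on legitimate URLs only."""
    mean = statistics.fmean(legit_losses)
    std = statistics.pstdev(legit_losses)
    return mean + k * std

def is_phishing(loss, threshold):
    """Flag a URL whose reconstruction loss exceeds the threshold."""
    return loss > threshold

if __name__ == "__main__":
    # Placeholder losses for held-out legitimate URLs (illustrative only).
    legit_losses = [0.10, 0.12, 0.08, 0.11, 0.09]
    threshold = fit_threshold(legit_losses)
    for loss in (0.11, 0.50):
        print(loss, is_phishing(loss, threshold))
```

Because the threshold is fitted on legitimate URLs alone, no phishing URLs are needed at training time, which is the property the abstract highlights. A percentile of the legitimate losses would be an equally plausible rule.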

Keywords/Search Tags:

Phishing URL Detection, Semantic Features, Autoencoder, Self-supervised Model

Related items

1	Detection Of Phishing Emails Based On The Lexical Features
2	Research On Phishing Detection Mechanism By Integrating New URL Features
3	Research On Phishing Detection Based On The Link Features Of Website
4	Research And Implementation On Joint Features And Intelligent Detection Algorithms Of Phishing Webpages
5	The Research On Phishing Clustering Algorithm Based On Fusion Of Multi-Features
6	Phishing Detection Technology Based On URL And Web Page Features
7	Research On The Large Scale Anti-Phishing Detection Engine
8	Design And Implementation Of Phishing Attack Detection System For QR Code Scanning
9	A Phishing Website Detection Method Based On Stacking Model
10	Research On Shilling Attack Detection Method In Recommender Systems Based On Variational Autoencoder And Supervised Prototype Network