Design And Implementation Of Near Real-Time Malicious URL Detection And Analysis System

Posted on: 2019-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2428330572955935
Subject: Engineering
Abstract/Summary:
The rapid development of computer and Internet technology has changed people's lives. Online social networking, e-commerce, and Internet finance have become part of daily life. At the same time, malicious links (malicious URLs) lead users to malicious or offensive websites, link them to malicious files, or direct them to phishing websites that steal information such as account credentials, infringing on users' personal information. Malicious URLs carry considerable risk and can result in the loss of users' property. How to detect these malicious URLs effectively in real time and take timely protective measures, such as raising alarms and blocking access, so as to reduce the threat they pose to massive numbers of network users, remains an open problem in network security.

In this thesis, an online learning algorithm is used to train the malicious URL detection model, taking full advantage of the efficiency with which online learning updates a model. The online learning algorithm also makes it possible to process unbounded data with limited computing resources. The stream computing framework Flink consumes, in real time, the URLs from network flow data held in the messaging system Kafka, and the detection model trained by the online learning algorithm detects malicious URLs in near real time. In addition, the search engine Elasticsearch is used to build a system for retrieving and analyzing large-scale network stream data. The main work of this thesis is as follows.

1. Training a URL detection model on offline batch data takes a long time, so the model cannot be updated promptly and the URL classification model is inefficient. In addition, the sample data volume is too large for the model to be trained with limited computing resources. This thesis therefore trains the URL classification model with an online learning algorithm, which can update the model promptly from a single sample and can process data with limited computing resources. Network flow data is collected in real time through Flume. Flume acts as the producer of URL data for Kafka, and Flink acts as the consumer. The online learning algorithm takes URLs as input and predicts each URL's label, so that URLs can be classified as malicious or benign.

2. To make retrieval and analysis over billions of records efficient, this thesis uses Logstash to index all types of network stream data obtained by parsing network packets. Logstash builds the index for the billion-record data set and transfers it to the distributed search engine Elasticsearch. On top of Elasticsearch, a search system based on the B/S (browser/server) architecture was built. Using malicious URLs as search keys, DNS, IP, and other information related to the detected malicious URLs can be retrieved from billions of records within seconds. The search system improves the efficiency of analyzing massive network flow data.
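
To illustrate the idea of updating a URL classifier from one sample at a time with bounded memory, the following is a minimal sketch, not the thesis's implementation. The abstract does not name the online learning algorithm or the URL features, so everything here is an assumption: the sketch uses online logistic regression with stochastic gradient updates over hashed URL tokens, and the class name `OnlineURLClassifier`, its parameters, and the toy URLs are all hypothetical. In the pipeline described above the stream would arrive from Kafka via Flink; here a plain Python list stands in for it.

```python
import math
import re
import zlib


class OnlineURLClassifier:
    """Online logistic regression over hashed URL tokens (illustrative only)."""

    def __init__(self, n_buckets=2 ** 18, learning_rate=0.5):
        self.n_buckets = n_buckets          # fixed feature space, so memory stays bounded
        self.learning_rate = learning_rate
        self.weights = {}                   # sparse weights: bucket index -> weight
        self.bias = 0.0

    def features(self, url):
        # Split the URL on non-alphanumeric characters and hash each token
        # into one of n_buckets indices (the "hashing trick").
        tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
        return {zlib.crc32(t.encode("utf-8")) % self.n_buckets for t in tokens}

    def predict_proba(self, url):
        # Estimated probability that the URL is malicious (label 1).
        z = self.bias + sum(self.weights.get(i, 0.0) for i in self.features(url))
        z = max(-30.0, min(30.0, z))        # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, url):
        return 1 if self.predict_proba(url) >= 0.5 else 0

    def update(self, url, label):
        # One gradient step on the logistic loss for a single sample.
        step = self.learning_rate * (self.predict_proba(url) - label)
        self.bias -= step
        for i in self.features(url):
            self.weights[i] = self.weights.get(i, 0.0) - step


# Hypothetical labeled stream: 1 = malicious, 0 = benign.
TOY_STREAM = [
    ("http://secure-login.example-bank.xyz/verify/account", 1),
    ("https://www.example.org/docs/index.html", 0),
    ("http://free-prize.example.xyz/claim/login", 1),
    ("https://news.example.com/articles/today", 0),
]

model = OnlineURLClassifier()
for _ in range(20):                         # replay the toy stream a few times
    for url, label in TOY_STREAM:
        model.update(url, label)            # the model changes after every sample
```

Each call to `update` touches only the weights of the tokens present in one URL, which is why a model of this kind can follow an unbounded stream without retraining on the full data set.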
Keywords/Search Tags:Malicious URL, streaming computing, online learning, real-time computing, ElasticSearch