Research On Malicious IP Classification Algorithm Based On Big Data Platform

Posted on: 2020-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: L H Xue | Full Text: PDF
GTID: 2428330578976868 | Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, online sales in the railway ticketing system now far exceed sales through station windows, national outlets and other channels, and online ticketing has become the public's primary way to purchase tickets. Meanwhile, driven by profit, the Internet ticketing industry faces the threat of gray-market activity. The research found that during peak sales periods, such as the Spring Festival travel rush and holidays, malicious ticket grabbing occurs frequently and seriously degrades the ticket-buying experience of normal users. To intercept and process malicious requests in real time, a risk control system based on a big data platform was developed. In the strategy analysis section of that system, the source of the current request IP cannot be distinguished effectively, which affects the selection of policy thresholds and creates a risk of mistakenly blocking legitimate users. In addition, the ticketing system receives tens of millions of visits every day and generates massive data sets, so classifying the data efficiently becomes the key problem. To solve these problems, the thesis draws on data mining techniques and proposes a malicious IP classification algorithm based on a big data platform. The main contributions of the thesis are as follows:

1. On the classification of malicious IPs, the thesis introduces common classification algorithms. Simulation experiments are used to analyze the advantages and disadvantages of the existing algorithms, and the random forest (RF) algorithm, which better suits the current application scenario, is selected. To improve classification accuracy, a random forest based malicious IP classification algorithm, IPRF, is proposed. IPRF mainly improves the feature selection step: it combines Bagging and Forest-RI to increase the randomness of sample and feature selection, and it introduces a weight calculation based on out-of-bag (OOB) estimation when constructing the classifier. Comparison experiments on five data sets show that IPRF effectively improves classification accuracy and classifier performance.

2. On the efficiency of data classification, a parallelization approach based on the MapReduce framework is proposed to handle the massive amount of data. Combined with IPRF, it yields the malicious IP classification algorithm based on a big data platform, whose parallelization process is studied and designed. On the big data platform, comparison experiments on three data sets of different sizes verify the feasibility of the algorithm: it shortens execution time and effectively improves data processing efficiency.

Implementing the malicious IP classification algorithm on the big data platform makes the strategy analysis of the risk control system more complete and helps avoid mistakenly blocking normal users. This in turn allows policy thresholds to be chosen more reasonably, improves the risk control system, and strengthens the identification of abnormal ticket-buying behavior.
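
The abstract does not give IPRF's formulas, so the following is only a minimal Python sketch of the general ideas it names, not the thesis's implementation. It assumes Bagging means one bootstrap sample per tree, Forest-RI means a random feature subset at each split, and the OOB-based weight is simply each tree's accuracy on its out-of-bag samples. All function names, the two features (request rate, distinct accounts) and the synthetic data are hypothetical. The `map` call only stands in for a MapReduce map phase, relying on the fact that each tree is trained independently.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, n_feats, depth=0, max_depth=5):
    """Grow a small CART-style tree: a label (leaf) or (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None
    # Forest-RI (as assumed here): only a random subset of features is tried per node.
    for f in rng.sample(range(len(X[0])), n_feats):
        vals = sorted(set(row[f] for row in X))
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2.0  # candidate threshold: midpoint of neighbouring values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the sampled features
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, n_feats, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, n_feats, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_one(X, y, seed, n_feats):
    """Train one tree on a bootstrap sample; weight it by its out-of-bag accuracy."""
    rng = random.Random(seed)
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # Bagging: sample with replacement
    oob = sorted(set(range(n)) - set(idx))       # rows this tree never saw
    tree = build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_feats)
    if oob:
        weight = sum(predict_tree(tree, X[i]) == y[i] for i in oob) / len(oob)
    else:
        weight = 1.0  # assumption: fall back to an unweighted vote if nothing is out-of-bag
    return tree, weight

def train_forest(X, y, n_trees, n_feats, seed=0):
    """Each tree depends only on its seed, so this map could be distributed across workers."""
    return list(map(lambda s: train_one(X, y, s, n_feats), range(seed, seed + n_trees)))

def predict(forest, row):
    """Weighted majority vote: each tree votes with its OOB accuracy."""
    votes = Counter()
    for tree, weight in forest:
        votes[predict_tree(tree, row)] += weight
    return votes.most_common(1)[0][0]

# Hypothetical synthetic data: [requests per minute, distinct accounts]; 1 = malicious IP.
data_rng = random.Random(42)
X, y = [], []
for _ in range(100):
    X.append([data_rng.uniform(1, 20), data_rng.uniform(1, 3)]); y.append(0)
    X.append([data_rng.uniform(40, 100), data_rng.uniform(5, 20)]); y.append(1)

forest = train_forest(X, y, n_trees=15, n_feats=1)
print(predict(forest, [80.0, 12.0]), predict(forest, [5.0, 2.0]))
```

Weighting votes by OOB accuracy lets trees that generalize poorly count for less without holding out a separate validation set. In a real MapReduce job, the reduce phase would presumably collect the trained trees and their weights, but the abstract does not specify that design.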
Keywords/Search Tags:Random forest, Big data platform, Parallelization, Malicious IP classification