
Design And Implementation Of Collection And Identification System For Gambling Sites

Posted on: 2020-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Lan | Full Text: PDF
GTID: 2428330575998498 | Subject: Software engineering
Abstract/Summary:
With the rapid growth of the Internet and the increasing number of Internet users, malicious websites pose numerous security threats. These include a range of illegal websites involving gambling, reactionary organizations, counterfeiting, phishing, and fraud. The traditional countermeasure is blacklist-based blocking. However, as malicious websites adopt new network technologies, traditional methods struggle to cope. At the same time, the flexibility that these technologies give malicious websites inevitably introduces features that distinguish them from normal websites. As a core defense technology in network security, malicious website identification can effectively identify and prevent a range of security threats and keep network use safe and healthy.

This thesis compares and analyzes the research results and current state of malicious website identification in China and abroad, and selects the gambling website identification techniques suited to this system. The system consists of four main modules: data acquisition, data preprocessing, data storage, and data identification. It is a research project built on the company's data analysis system. Focusing on gambling websites, the thesis analyzes the main functional modules in detail and describes their use cases. On the premise that the monitored websites are monitored in real time, it designs, module by module, the overall flow of gambling website data from collection to use, so that identification data can be produced in real time.

To ease maintenance and secondary development, and following the team's mainstream technical route, the gambling website identification system is implemented on the Linux operating system using the Python language and the Scrapy framework, with the machine learning algorithms selected according to their accuracy. Together these implement web page data collection, preprocessing, storage, and identification. The system can analyze newly emerging website data, identify it with a certain accuracy, persist the identification results, and store them in storage systems such as ElasticSearch, providing a high-quality, comprehensive data source for all downstream services, where it is used together with a range of security devices for malicious website analysis. The gambling website identification system has been completed and is running as a core component of the company's data analysis system, providing services for security protection while its identification accuracy continues to improve.
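The four modules named in the abstract (acquisition, preprocessing, storage, identification) can be pictured as a linear pipeline. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: every function and class name is hypothetical, acquisition takes already-fetched HTML instead of running a Scrapy spider, a plain dict stands in for ElasticSearch, and a keyword-ratio score with an arbitrary threshold stands in for the machine learning classifier, whose features and algorithm the abstract does not specify.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
import re

# Hypothetical keyword list; the abstract does not describe the real features.
GAMBLING_TERMS = {"casino", "betting", "jackpot", "poker", "roulette"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


@dataclass
class Record:
    url: str
    tokens: list
    score: float
    is_gambling: bool


def acquire(pages):
    """Acquisition stand-in: yields (url, html) pairs a crawler would fetch."""
    yield from pages.items()


def preprocess(html):
    """Preprocessing: strip markup, lowercase, split into alphabetic tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return re.findall(r"[a-z]+", text)


def identify(tokens, threshold=0.2):
    """Identification stand-in: fraction of tokens that are gambling terms."""
    if not tokens:
        return 0.0, False
    hits = sum(1 for t in tokens if t in GAMBLING_TERMS)
    score = hits / len(tokens)
    return score, score >= threshold


def run_pipeline(pages, store):
    """Runs each page through all stages and persists the result in `store`."""
    for url, html in acquire(pages):
        tokens = preprocess(html)
        score, flag = identify(tokens)
        store[url] = Record(url, tokens, score, flag)
    return store


if __name__ == "__main__":
    demo_pages = {
        "http://example.test/a": "<h1>Casino jackpot</h1><p>Poker betting</p>",
        "http://example.test/b": "<p>Weather forecast for the weekend</p>",
    }
    for record in run_pipeline(demo_pages, {}).values():
        print(record.url, round(record.score, 2), record.is_gambling)
```

Keeping each stage a separate function mirrors the modular split the abstract describes, so a real crawler, classifier, or storage backend could replace its stand-in without touching the other stages.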
Keywords/Search Tags: Gambling website, Scrapy framework, ElasticSearch, Machine learning algorithm, Identification