
Design And Implementation Of Web Component Automated Detection System Based On Web Crawler

Posted on: 2022-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
GTID: 2518306338968459
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, open source components are widely used to build Web sites. These components may themselves contain vulnerabilities and defects that attackers can easily exploit, so accurately identifying the Web components of a target site improves the efficiency of security testing and is important for ensuring site security. At present, Web components are mainly identified by applying fixed rules to Web source code and response messages. However, this information is easily hidden or modified, which lowers recognition accuracy and places higher demands on the completeness of the Web fingerprint database. To address these problems, this thesis presents a Web component detection method and, on this basis, designs and implements an automated Web component fingerprint detection system. The main work includes:

(1) For Web server type recognition, a model based on a machine learning algorithm is proposed. A multi-class classifier based on the random forest algorithm is built on the relative order of the header fields in the response message and the contents of related fields, and it reaches a recognition accuracy of 97.73% for Web server type. For CMS type recognition, a crawler-based method is proposed that obtains the path information of static files from multiple pages of the target site. The CMS is identified by extracting the critical path information, and the recognition accuracy is higher than that of existing detection tools. For host port fingerprint identification, NMAP is integrated into the system for automatic port scanning.

(2) Based on the above detection methods, an automated Web component detection system is designed according to the actual requirements of security testing. The system consists of five parts: a crawler scanning module, a task scheduling module, a system storage module, a user interaction module, and a Web component fingerprint detection module. An implementation scheme is given for each sub-module. Quartz and a Redis message queue are used to decouple the system, and the workflow of the whole system is designed.

(3) Based on the above design, the automated Web component detection system is implemented. The system uses a Redis cluster, Nginx, master-slave MySQL, and other technologies to ensure high availability and easy scalability, and it is offered to users through a Web interface. Finally, target site samples are selected and 12 tasks are created to test the main functions of the system. The tests show that the system can automatically identify the Web components of the target sites.
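The header-order features used for Web server recognition in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the tracked header list, the function names, and the two sample responses are assumptions made here for illustration. Each pair of tracked headers is encoded as 1, -1, or 0 depending on which header appears first in the response, or whether one of them is missing.

```python
from itertools import combinations

# Hypothetical header vocabulary; the abstract does not list the fields used.
TRACKED_HEADERS = ("date", "server", "content-type",
                   "content-length", "connection", "etag")

# Illustrative raw responses (assumed samples, not data from the thesis).
NGINX_LIKE = ("HTTP/1.1 200 OK\r\n"
              "Server: nginx\r\n"
              "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
              "Content-Type: text/html\r\n"
              "\r\n<html></html>")
APACHE_LIKE = ("HTTP/1.1 200 OK\r\n"
               "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
               "Server: Apache\r\n"
               "Content-Type: text/html\r\n"
               "\r\n<html></html>")


def parse_header_names(raw_response):
    """Return lower-cased header names in the order they appear."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    names = []
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def order_features(names):
    """Encode the relative order of every pair of tracked headers.

    1 means the first header of the pair precedes the second,
    -1 means the opposite, and 0 means at least one is absent.
    """
    position = {}
    for index, name in enumerate(names):
        position.setdefault(name, index)  # keep the first occurrence
    features = []
    for first, second in combinations(TRACKED_HEADERS, 2):
        if first in position and second in position:
            features.append(1 if position[first] < position[second] else -1)
        else:
            features.append(0)
    return features


if __name__ == "__main__":
    print(order_features(parse_header_names(NGINX_LIKE)))
    print(order_features(parse_header_names(APACHE_LIKE)))
```

Vectors of this kind, combined with features derived from field contents, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. The abstract does not say which library or feature encoding the thesis actually uses.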
Keywords/Search Tags:Web component, Web server, CMS, port fingerprint, task scheduling