Web technology is developing rapidly as the Internet becomes ubiquitous, and the fast growth of the mobile Internet in particular means that Web technology now accompanies every Internet user. However, Web technology is a double-edged sword: it brings convenience to Internet users but also introduces serious security risks. In recent years, web application security incidents have occurred frequently, and web application vulnerabilities seriously threaten both the security of web applications and the privacy of their users. SQL injection and XSS have remained prominent in every edition of the OWASP Top Ten released since 2007 (both ranked in the top three in the 2007, 2010 and 2013 editions), and they are among the most common and dangerous types of web application vulnerability. The detection and repair of Web application vulnerabilities is therefore essential to Web application security, and the research carried out in this paper has significant practical value.

The basic premise of web application vulnerability detection is to examine the security of a web application from an attacker's perspective: the detector sends a series of network requests to the server and analyzes the server's responses to determine whether the application under test contains a given type of vulnerability, so that the corresponding security issues can be repaired promptly, program security improved, and user privacy better protected. Building on an in-depth study of web crawler technology, this paper optimizes the crawler strategy, clusters similar pages, designs a more comprehensive and richer attack vector library, and implements a prototype crawler-based Web application vulnerability detection system. The main work and contributions are as follows:

(1) To address the problems of existing web crawler search strategies, which can become trapped in loops and in which too many pages are
crawled, this paper jointly considers the number of new links a page contains and the relevance of its sibling nodes, and proposes a breadth-first search strategy based on neighboring sibling nodes that improves crawler efficiency.

(2) Based on a similarity analysis of page DOM structures, a weight-based method for computing the structural similarity of pages is proposed. Hierarchical clustering is then used to group structurally similar pages, and the URL of one page from each cluster is selected for vulnerability detection. Experiments show that these methods and the clustering step greatly improve the efficiency of vulnerability detection.

(3) Because second-order vulnerabilities are more concealed and harder to discover than first-order ones, the attack principle of Web second-order vulnerabilities is studied in depth, and a second-order vulnerability detection method based on taint marking and tracking is proposed. The method first marks the requested URL, then tracks and searches for pages that may store the taint, then injects an attack vector into the marked URL and requests it, and finally performs vulnerability detection on the pages that may store the taint. Experiments show that this method effectively improves the efficiency of second-order vulnerability detection.

(4) To address the limited size of existing attack vector libraries and the weak attack capability of their vectors, the grammatical rules and composition of SQL injection and XSS attack vectors are studied in depth, and a rule-based attack vector generation method and an attack vector factor mutation method are proposed. Experiments show that these methods enrich and expand the attack vector library while also effectively improving the attack capability of the vectors.

(5) Based on the detection methods studied above, and following the principle of "high cohesion, low coupling", we designed
and implemented a crawler-based web application vulnerability detection system. Functionally, the system is divided into four main modules: a main control module, a web crawler module, an attack vector generation module and a vulnerability detection module. Experiments show that the system achieves higher performance and accuracy than the mainstream vulnerability detection tools AWVS and AppScan.
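The detection principle described above (send requests, then analyze the responses) can be illustrated with a minimal sketch. It assumes two simple first-order checks, a reflected-XSS probe and an error-based SQL injection probe, and it takes a `fetch` callable (URL to response body) instead of doing real HTTP so the logic can be tested offline. The names `inject`, `probe`, `XSS_PROBE` and `SQL_ERROR_SIGNATURES` are hypothetical and are not the paper's implementation; the signature list is deliberately tiny.

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, deliberately tiny list of database error fragments; a real
# scanner needs a much larger, DBMS-specific signature set.
SQL_ERROR_SIGNATURES = (
    "you have an error in your sql syntax",   # MySQL
    "unclosed quotation mark",                # SQL Server
    "ora-01756",                              # Oracle: quoted string not terminated
)
XSS_PROBE = "<script>alert(1)</script>"


def inject(url: str, param: str, payload: str) -> str:
    """Return `url` with query parameter `param` set to `payload`."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = payload
    return urlunsplit(parts._replace(query=urlencode(query)))


def probe(url: str, param: str, fetch: Callable[[str], str]) -> list[str]:
    """Send attack requests for one parameter and classify the responses.

    `fetch` maps a URL to a response body (hypothetical interface), so the
    detection logic can be exercised without a network.
    """
    findings = []
    # Reflected XSS: the probe comes back verbatim, i.e. without HTML encoding.
    if XSS_PROBE in fetch(inject(url, param, XSS_PROBE)):
        findings.append("xss")
    # Error-based SQL injection: a lone quote breaks the query and leaks an error.
    body = fetch(inject(url, param, "'")).lower()
    if any(sig in body for sig in SQL_ERROR_SIGNATURES):
        findings.append("sqli")
    return findings
```

In practice `fetch` would wrap an HTTP client; keeping it injectable also makes it easy to add rate limiting or authentication without touching the detection logic.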
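Contribution (1) does not spell out the exact scoring rule, so the following is only a minimal sketch of one plausible reading, under stated assumptions. A global `seen` set stops link loops. Siblings are treated as "neighboring and related" when they share a parent and a URL pattern (digits collapsed, an assumption), and once `max_barren_siblings` such siblings in a row yield no new URLs, the rest of that group is skipped. `sibling_aware_bfs`, `url_pattern` and both parameters are hypothetical names.

```python
import re
from collections import deque
from typing import Callable, Iterable


def url_pattern(url: str) -> str:
    """Collapse digit runs so /cal/1 and /cal/2 share one pattern (assumption)."""
    return re.sub(r"\d+", "{n}", url)


def sibling_aware_bfs(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 1000,
    max_barren_siblings: int = 3,
) -> list[str]:
    """Breadth-first crawl that stops expanding a group of neighbouring
    siblings (same parent, same URL pattern) once `max_barren_siblings`
    of them in a row have contributed no new URLs."""
    seen = {seed}                   # every URL ever queued: breaks link loops
    order: list[str] = []           # pages actually fetched, in BFS order
    queue = deque([(seed, None)])   # (url, parent url)
    barren: dict[tuple, int] = {}   # (parent, pattern) -> barren-sibling streak
    while queue and len(order) < max_pages:
        url, parent = queue.popleft()
        group = (parent, url_pattern(url))
        if parent is not None and barren.get(group, 0) >= max_barren_siblings:
            continue                # similar siblings already proved redundant
        order.append(url)
        new_links = []
        for link in get_links(url):
            if link not in seen:    # count only pages not seen before
                seen.add(link)
                new_links.append(link)
        queue.extend((link, url) for link in new_links)
        if parent is not None:
            barren[group] = 0 if new_links else barren.get(group, 0) + 1
    return order
```

With a calendar-style trap such as `/cal/1` ... `/cal/5` that only link back to the home page, the crawl fetches the first three and skips the rest, while an unrelated sibling like `/b` is still expanded.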
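For contribution (2), a minimal sketch of weighted DOM-structure similarity followed by hierarchical clustering is shown below. The paper's weight distribution is not given here, so this sketch assumes each page is reduced to its set of root-to-node tag paths, a path at depth d weighs 1/d (shallower layout nodes count more), similarity is a weighted Jaccard ratio, and clustering is average-linkage agglomerative merging with a hypothetical threshold of 0.8. All function names are illustrative.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collect the set of root-to-node tag paths, e.g. 'html/body/div'."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.paths: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in VOID_TAGS:            # void elements never get an end tag
            self.stack.pop()

    def handle_endtag(self, tag):
        if tag in self.stack:           # tolerate unclosed inner tags
            while self.stack.pop() != tag:
                pass


def tag_paths(doc: str) -> set[str]:
    collector = TagPathCollector()
    collector.feed(doc)
    collector.close()
    return collector.paths


def structure_similarity(a: set[str], b: set[str]) -> float:
    """Weighted Jaccard: a path at depth d weighs 1/d (hypothetical weights)."""
    def weight(path: str) -> float:
        return 1.0 / (path.count("/") + 1)
    union = a | b
    if not union:
        return 1.0
    return sum(map(weight, a & b)) / sum(map(weight, union))


def cluster_pages(pages: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Average-linkage agglomerative clustering on structural similarity."""
    paths = {url: tag_paths(doc) for url, doc in pages.items()}
    clusters = [[url] for url in pages]

    def linkage(c1: list[str], c2: list[str]) -> float:
        total = sum(structure_similarity(paths[x], paths[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if best < threshold:            # no pair is similar enough to merge
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters
```

Taking `[c[0] for c in cluster_pages(pages)]` then yields one representative URL per structural template, which is the set passed on to vulnerability detection.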
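The four steps of the taint-based second-order method in contribution (3) can be sketched as follows, assuming the attacker-controlled inputs and the crawled page list are already known. Each input is first submitted with a unique random token (mark), every crawled page is searched for that token (track), the attack vector is injected only into inputs whose token resurfaced (inject), and only those sink pages are checked (detect). The `submit`/`fetch` callables and all function names are hypothetical stand-ins for the paper's HTTP layer.

```python
import secrets
from typing import Callable, Iterable

Submit = Callable[[str, str, str], None]   # (form url, parameter, value)
Fetch = Callable[[str], str]               # page url -> response body


def locate_taint_sinks(inputs: Iterable[tuple[str, str]], pages: list[str],
                       submit: Submit, fetch: Fetch) -> dict[tuple[str, str], list[str]]:
    """Mark every input with a unique harmless token, then find pages echoing it."""
    sinks = {}
    for url, param in inputs:
        token = "taint" + secrets.token_hex(8)   # random, so it will not occur naturally
        submit(url, param, token)                # step 1: mark the request
        sinks[(url, param)] = [p for p in pages if token in fetch(p)]  # step 2: track
    return sinks


def detect_second_order_xss(inputs: Iterable[tuple[str, str]], pages: list[str],
                            submit: Submit, fetch: Fetch,
                            payload: str = "<script>alert(1)</script>") -> list[tuple[str, str, str]]:
    """Inject only into marked inputs that have sinks, then check only those sinks."""
    findings = []
    for (url, param), sink_pages in locate_taint_sinks(inputs, pages, submit, fetch).items():
        if not sink_pages:
            continue                             # value is never re-displayed
        submit(url, param, payload)              # step 3: inject the attack vector
        findings += [(url, param, p) for p in sink_pages if payload in fetch(p)]  # step 4
    return findings
```

The efficiency gain comes from step 2: payloads are sent only to inputs that demonstrably persist, and only the pages that display them are re-checked, instead of re-crawling the whole site after every injection.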
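Contribution (4) combines rule-based generation with factor mutation. A minimal SQL injection sketch, assuming a hypothetical three-factor grammar (context closer + core + comment terminator), is shown below: vectors are the Cartesian product of the factor lists, and mutations such as alternating case or replacing spaces with inline comments are applied to the core factor only, so the closer and terminator stay syntactically valid. The factor lists and mutator names are illustrative assumptions; an XSS library could be built the same way from tag, event-handler and payload factors.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical grammar: vector = closer + core + terminator.
CLOSERS = ["", "'", '"', ")", "')"]          # escape the original SQL context
CORES = [" OR 1=1", " AND 1=2", " UNION SELECT NULL"]
TERMINATORS = ["-- ", "#"]                   # comment out the rest of the query


# Factor mutations, applied to the core only.
def alternate_case(s: str) -> str:
    return "".join(c.upper() if i % 2 else c.lower() for i, c in enumerate(s))


def inline_comments(s: str) -> str:
    return s.replace(" ", "/**/")            # e.g. slips past naive space filters


def generate_vectors(closers: Sequence[str], cores: Sequence[str],
                     terminators: Sequence[str],
                     mutators: Sequence[Callable[[str], str]] = ()) -> list[str]:
    """Expand the grammar, adding one mutated variant of the core per mutator."""
    vectors = []
    for closer, core, term in product(closers, cores, terminators):
        for variant in [core, *(m(core) for m in mutators)]:
            vectors.append(closer + variant + term)
    return list(dict.fromkeys(vectors))      # de-duplicate, keep order
```

For example, `generate_vectors(["'"], [" OR 1=1"], ["-- "], [alternate_case, inline_comments])` yields `' OR 1=1-- `, `' Or 1=1-- ` and `'/**/OR/**/1=1-- `; the full grammar above expands to 30 base vectors, or 90 with both mutators.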
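The "high cohesion, low coupling" split into four modules in contribution (5) can be sketched with structural interfaces: the main control module depends only on small `Protocol` types for the crawler, attack vector generation and detection modules, so any one of them can be replaced without touching the others. The interface and method names (`crawl`, `vectors`, `is_vulnerable`, `scan`) are hypothetical and only illustrate the decomposition.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class Crawler(Protocol):
    def crawl(self, seed: str) -> Iterable[str]: ...


class VectorGenerator(Protocol):
    def vectors(self) -> Iterable[str]: ...


class Detector(Protocol):
    def is_vulnerable(self, url: str, vector: str) -> bool: ...


@dataclass
class MainController:
    """Main control module: wires the others together through interfaces only."""
    crawler: Crawler
    generator: VectorGenerator
    detector: Detector

    def scan(self, seed: str) -> list[tuple[str, str]]:
        vectors = list(self.generator.vectors())        # build the library once
        return [(url, v)
                for url in self.crawler.crawl(seed)      # web crawler module
                for v in vectors                         # attack vector module
                if self.detector.is_vulnerable(url, v)]  # detection module
```

Because `Protocol` checks are structural, stub modules like those used in unit tests need no inheritance, which keeps each module independently testable.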