Research On Detection Of Cross-site Scripting Vulnerabilities

Posted on: 2018-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J K Pan
Full Text: PDF
GTID: 1368330623450453
Subject: Software engineering
Abstract/Summary:
With the development of the Internet and Web 2.0, Web applications have become increasingly common and are now deeply embedded in people's lives. While they bring convenience, Web applications also carry substantial risks: a variety of Web vulnerabilities threaten the security of Web applications and user data. Among them, cross-site scripting ranks third in the OWASP Web vulnerability list because of its prevalence, its many forms, and the serious harm it causes. Once exploited by an attacker, a cross-site scripting vulnerability can lead to serious consequences such as leakage of user privacy, user session hijacking, phishing, and malicious code embedding. Over the years, academia and industry have devoted considerable effort to research on detecting and preventing cross-site scripting and have made substantial progress. However, cross-site scripting takes many forms, which makes it hard to prevent, and program analysis techniques for Web applications remain very limited. Cross-site scripting vulnerability detection therefore still faces many challenges. This work approaches the problem from two directions, white-box detection and black-box detection, and in each direction takes input point selection and exploit generation as entry points. The main work and contributions of this dissertation are summarized as follows.

First, a new kind of vulnerability targeting browser extensions, DOM-sourced cross-site scripting, is proposed; it introduces the DOM as a new attack surface. To detect such vulnerabilities, a detection framework combining static and dynamic analysis is proposed. The framework uses lightweight static analysis techniques such as a text filter and an AST parser for preliminary filtering. A shadow DOM is proposed to support the structured DOM document, extending existing dynamic symbolic execution techniques. The framework is able to generate a DOM document that exploits the vulnerability. In real-world user scripts for the browser extension Greasemonkey, 58 DOM-sourced cross-site scripting vulnerabilities were detected, potentially affecting 676,174 users.

Second, a regular-expression-enhanced method for cross-site scripting vulnerability detection is proposed to address the regular expressions that are widespread in Web applications. The technique extends the reduction rules to support a series of advanced regular expression features such as boundaries, back references, and assertions. Introducing more expressive regular expression primitives simplifies the description of regular expressions. "Lazy generation", "on-demand unfolding", and a series of other optimizations improve the efficiency of solving regular-expression-related constraints.

Third, to mitigate the impact of URL rewriting and HTML sanitization, a taint inference technique for cross-site scripting based on gene sequence alignment from bioinformatics is proposed. A local sequence alignment algorithm addresses the problem introduced by URL rewriting on Web servers, and a gap penalty mechanism alleviates the impact of HTML sanitization in Web applications. Compared with the state of the art, the accuracy and precision of taint inference in these two scenarios are effectively improved.

Fourth, existing black-box cross-site scripting vulnerability scanners depend on human knowledge, and existing machine learning detection techniques based on binary classification produce false positives. To address both problems, a novel detection technique based on the sequence-to-sequence model is proposed, drawing on generative intelligent question-answering (QA) systems in the field of natural language processing. By modelling vulnerability detection as a sequence-to-sequence problem, verifiable attack payloads are generated, eliminating false positives. With an attention-based encoder-decoder framework in which both the encoder and the decoder are data-driven recurrent neural networks with long short-term memory, the dependence on human knowledge is removed and different attack payloads are generated for different contexts. The approach effectively improves the detection rate and efficiency of cross-site scripting vulnerability detection.
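The third contribution above, taint inference by local sequence alignment with a gap penalty, can be illustrated with a small sketch. The abstract does not give the scoring scheme or the decision rule, so everything here is an assumption made for illustration: the code uses the textbook Smith-Waterman algorithm with a linear gap penalty, and the names `local_align` and `reflects`, the score constants, and the 0.8 threshold are hypothetical rather than taken from the dissertation.

```python
# Illustrative sketch only: Smith-Waterman local alignment used to guess
# whether a request parameter value is reflected in an HTTP response.
# Scores and threshold are assumed values, not the dissertation's.
MATCH, MISMATCH, GAP = 2, -1, -1


def local_align(query, text):
    """Return (best_score, start, end); text[start:end] is the region of
    `text` that best aligns locally with some region of `query`."""
    rows, cols = len(query) + 1, len(text) + 1
    score = [[0] * cols for _ in range(rows)]
    # origin[i][j]: index in `text` where the alignment ending at (i, j) starts
    origin = [list(range(cols)) for _ in range(rows)]
    best, best_start, best_end = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == text[j - 1]
            diag = score[i - 1][j - 1] + (MATCH if same else MISMATCH)
            up = score[i - 1][j] + GAP    # query char absent from text (e.g. sanitized away)
            left = score[i][j - 1] + GAP  # extra char present only in text
            s = max(0, diag, up, left)
            score[i][j] = s
            if s == 0:
                origin[i][j] = j          # a new alignment may start after here
            elif s == diag:
                origin[i][j] = origin[i - 1][j - 1]
            elif s == up:
                origin[i][j] = origin[i - 1][j]
            else:
                origin[i][j] = origin[i][j - 1]
            if s > best:
                best, best_start, best_end = s, origin[i][j], j
    return best, best_start, best_end


def reflects(param_value, response, threshold=0.8):
    """Infer taint: True if the alignment score reaches `threshold` of the
    score that a perfect, unmodified reflection would obtain."""
    if not param_value:
        return False
    score, _, _ = local_align(param_value, response)
    return score / (MATCH * len(param_value)) >= threshold
```

A call such as `reflects(injected_value, response_body)` would then flag an input point as tainted even when the server has dropped a few characters from the value. Because the alignment is local, text surrounding the reflection in the response does not lower the score, and because removed characters cost only a gap penalty, a partially sanitized reflection can still clear the threshold.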
Keywords/Search Tags: cross-site scripting, Web security, dynamic symbolic execution, constraint solving, document object model, regular expression, gene sequence alignment, sequence-to-sequence model