
Research On The Semantic Annotation Of Software Vulnerability Source Codes

Posted on: 2019-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2428330545457854
Subject: Computer software and theory
Abstract/Summary:
In recent years, software technology has developed rapidly alongside information technology, and the number and scale of software systems keep growing. The number of software vulnerabilities is growing with them. According to statistics from the United States, programmers leave on average one software vulnerability in every 1,000 to 1,500 lines of code. Compared with the rapid development of software technology, vulnerability detection technology has developed slowly: it still relies on traditional static, dynamic, or combined static and dynamic detection methods.

As the amount of software grows, so does the number of vulnerabilities discovered. Countries with developed information industries have built their own vulnerability databases, as have some enterprises and organizations, and most of these databases are compatible with one another. How to use these databases in vulnerability detection remains an open problem. Analyzing the vulnerability data with big data technology is one option. However, the data is stored in the databases as plain text, which a computer cannot interpret directly. This thesis therefore preprocesses the vulnerability data so that a computer can interpret it, and it uses semantic annotation technology to do so.

Semantic annotation is usually applied in fields such as image semantics and the Semantic Web; we found no existing research that applies it to vulnerability source code. In the Semantic Web, semantic annotation transforms unstructured web documents into structured RDF documents, which are easy for computers to process. Combined with ontology technology, this allows a computer to understand the web document. By analogy, semantic annotation of vulnerability source code transforms a plain-text source code file into a structured source code file. In this thesis, the structured file is an XML document.

The main work of this thesis is as follows:

(1) Identification of the entities to be annotated. The entities to be annotated must be found first. This thesis studies vulnerability source code in detail and determines the categories of information it contains. Vulnerability information is divided into two parts: the vulnerability description and the vulnerability source code. Entities in each part must be extracted separately. The most important task is identifying entities in the vulnerability source code, which this thesis extracts from the abstract syntax tree of that code.

(2) Label design. An appropriate tag is assigned to each identified entity. This thesis studies the programming language of the vulnerability source code in detail and categorizes each element of a program. Each category of element has a corresponding tag, and for every tag its child tags and attributes must be determined.

(3) Semantic representation. In the Semantic Web, semantics are mainly represented by an ontology. In image semantic annotation, the semantics are the categories of the images. In this thesis, the meaning of the tags serves as the semantic representation for vulnerability semantic annotation.

Finally, we conducted an experiment that mines vulnerability patterns from the annotated documents. The mined pattern was the same as the actual vulnerability pattern, which indicates that the semantic annotation method is effective.
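The annotation pipeline described above (extract entities from an abstract syntax tree, map each entity category to a tag, and emit XML) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the programming language of the vulnerability source code or list its tag set, so this sketch assumes Python source parsed with the standard `ast` module as a stand-in, and the tag names (`function`, `assignment`, `call`, `branch`), the `vulnerability`/`description`/`source` wrapper elements, and the `SAMPLE` snippet are all hypothetical.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical mapping from AST node category to XML tag.
# The thesis does not publish its actual tag set; these names are illustrative.
TAG_FOR_NODE = {
    ast.FunctionDef: "function",
    ast.Assign: "assignment",
    ast.Call: "call",
    ast.If: "branch",
}


def annotate(node: ast.AST, parent: ET.Element) -> None:
    """Recursively mirror selected AST nodes as XML child elements."""
    for child in ast.iter_child_nodes(node):
        tag = TAG_FOR_NODE.get(type(child))
        if tag is None:
            # Not an entity we label: keep descending without adding an element.
            annotate(child, parent)
            continue
        element = ET.SubElement(parent, tag, line=str(child.lineno))
        if isinstance(child, ast.FunctionDef):
            element.set("name", child.name)
        elif isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
            element.set("name", child.func.id)
        annotate(child, element)


def annotate_source(source: str, description: str) -> ET.Element:
    """Build one XML record holding the description and the annotated code."""
    root = ET.Element("vulnerability")
    ET.SubElement(root, "description").text = description
    code = ET.SubElement(root, "source")
    annotate(ast.parse(source), code)
    return root


# Hypothetical input: a function that writes without a real bounds check.
SAMPLE = '''
def copy(buf, data):
    n = len(data)
    if n > 0:
        write(buf, data)
'''

if __name__ == "__main__":
    doc = annotate_source(SAMPLE, "Missing bounds check before write")
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping the description and the annotated source as sibling elements follows the two-part split of vulnerability information described in step (1). Nesting each tagged element under the element of its enclosing construct preserves the tree structure that a later pattern-mining step could query, for example by searching for a `call` element inside a `branch` element.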
Keywords/Search Tags: Semantic annotation, vulnerability source code, Abstract Syntax Tree, XML