
Research on Software Vulnerabilities Oriented Mining Methods

Posted on: 2015-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Ceng
Full Text: PDF
GTID: 1268330422970706
Subject: Computer application technology
Abstract/Summary:
The global spread and development of the Internet have tied computer networks closely to people's daily lives, and information security has gradually become a core issue of information technology. Software vulnerability detection is an important component of information security, because software vulnerabilities can cause serious harm. A malicious attacker can exploit software vulnerabilities to access unauthorized resources, destroy sensitive data, and even threaten the entire information security system. Addressing the mechanisms of software security vulnerabilities and the deficiencies of existing vulnerability detection technology, this dissertation combines vulnerability detection with data mining technology and designs mining algorithms suited to software security vulnerability detection analysis. Potentially useful information is extracted from a large amount of data to form a vulnerability information knowledge base. The knowledge base effectively describes the properties of the data and of the object sets, guided association rules are generated, and the quality of software vulnerability detection is improved.

First, association rule mining and cluster analysis are applied to software vulnerability detection analysis. To address the high false positive and non-response rates of existing software security vulnerability detection analysis models, a software vulnerability analysis model based on data mining technology is proposed. The model consists of two modules: construction of the vulnerability database and matching of suspected software vulnerabilities. The collected vulnerability information is preprocessed to form a vulnerability sequence database, and a frequent sequential pattern mining algorithm then mines frequent sequences from that database.
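The abstract does not say, at this point, which frequent sequential pattern mining algorithm is used. As a rough illustration of what "mining frequent sequences from a sequence database" means, the following is a minimal sketch of a generic PrefixSpan-style miner over sequences of single items. It is a stand-in, not the dissertation's algorithm; the function name `frequent_sequences`, the `min_support` parameter, and the toy event names are assumptions made for the example.

```python
from collections import defaultdict


def frequent_sequences(database, min_support):
    """Return {pattern (tuple): support} for all frequent subsequences.

    `database` is a list of sequences (lists of hashable items). The support
    of a pattern is the number of sequences containing it as a subsequence.
    Generic PrefixSpan-style sketch, not the dissertation's own algorithm.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence_id, start) pairs: for each supporting
        # sequence, the position just after the earliest match of `prefix`.
        occurrences = defaultdict(dict)
        for sid, start in projected:
            seq = database[sid]
            for pos in range(start, len(seq)):
                # Keep only the earliest occurrence of each item per sequence.
                occurrences[seq[pos]].setdefault(sid, pos + 1)
        for item, by_sid in occurrences.items():
            if len(by_sid) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(by_sid)
                grow(pattern, list(by_sid.items()))

    grow((), [(sid, 0) for sid in range(len(database))])
    return results


# Hypothetical toy database of event sequences (illustrative only).
toy_db = [["alloc", "copy", "free"], ["alloc", "free"], ["copy", "free"]]
patterns = frequent_sequences(toy_db, min_support=2)
```

With this toy input, `("alloc", "free")` and `("copy", "free")` each appear in two sequences and are reported as frequent, while `("alloc", "copy")` appears in only one and is dropped.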
On this basis, cluster analysis with a similarity measure is used to cluster the frequent sequences and form the vulnerability information knowledge base. The matching process first uses cluster analysis to determine which cluster the suspected vulnerability sequences in the software under test belong to, then matches those sequences against the sequences in that cluster, and finally outputs the software vulnerability detection report.

Secondly, considering the requirements of the software vulnerability model and the different characteristics of the data sources, three sequential pattern mining algorithms for constructing the vulnerability database are proposed. The main-memory-index-based time-interval weighted sequential pattern algorithm suits sequential patterns with time constraints. It loads the sequence database into memory without generating candidate sets or constructing projected databases, takes the interval factors fully into account, and reflects the significance of the weights well. The prefix-sequence-tree-based closed sequential pattern mining algorithm for data streams applies to data that arrives as a stream. It uses the sliding window technique and mines in segments within each window, and it introduces a new data structure that effectively overcomes the excessive depth of the prefix tree and reduces the tree's search time. The pattern-attenuation-based incremental weighted sequential pattern algorithm for data streams applies to data that must be weighted according to different demands. It adopts a double weighting scheme: the weight of each item is set by the user, and the attenuation of each sequence is calculated from the time constraints.

Thirdly, the vulnerability database stores a large amount of frequent sequence information. The number of sequences is still large after frequent sequential pattern mining, which incurs considerable time overhead when matching suspected vulnerabilities.
To shrink the search space, an attribute-relevant subspace clustering algorithm based on DBSCAN is used to determine which cluster a vulnerability sequence belongs to. The square error of the attribute values is calculated to determine the active attributes, according to which the dimensionality is reduced. A threshold weighting technique is defined, and the DBSCAN algorithm is called to generate one-dimensional clusters on each active attribute. Clusters are then merged according to cluster similarity to obtain the subspace clusters. A sequence under test is first assigned to its cluster, which gives a rough search range and avoids unnecessary search time.

Finally, a matching algorithm based on a prefix index set with wildcards matches the suspected vulnerability sequences in the software under test against the knowledge base. Large numbers of candidate sets and projected databases are avoided, which reduces wasted running time. The algorithm is based on a memory index strategy: according to the wildcard constraints, it defines a new set of prefix indexes and uses incremental mining and index lookups. The prefix index set with wildcard constraints directly gives the positions at which a sequence occurs. The whole mining process adopts a divide-and-conquer approach. To meet different needs, the number and position of the wildcards can be adjusted to handle marked and unmarked sequences respectively, which avoids the shortcomings of plain sequential patterns.
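The abstract gives neither the layout of the prefix index nor the exact wildcard semantics. The following is a minimal sketch of the general idea of index-based matching with wildcards, assuming that a wildcard stands for one arbitrary item and that at most `max_gap` wildcards may sit between consecutive pattern items. The names `build_position_index`, `match_with_wildcards`, and `max_gap`, and the toy sequences, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_position_index(sequences):
    """Map each item to {sequence_id: [positions]} (a simple in-memory index)."""
    index = defaultdict(lambda: defaultdict(list))
    for sid, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            index[item][sid].append(pos)
    return index


def match_with_wildcards(index, pattern, max_gap):
    """Return {sequence_id: [end positions]} for occurrences of `pattern`.

    Between consecutive pattern items, 0 to `max_gap` arbitrary items
    (wildcards) may be skipped. Only index lookups are used, so no
    candidate sets or projected databases are built.
    """
    if not pattern:
        return {}
    # Occurrences of the first item seed the candidate end positions.
    current = {sid: set(ps) for sid, ps in index.get(pattern[0], {}).items()}
    for item in pattern[1:]:
        postings = index.get(item, {})
        extended = {}
        for sid, ends in current.items():
            # Keep positions of `item` that follow a previous end position
            # within the allowed wildcard gap.
            hits = {p for p in postings.get(sid, ())
                    if any(0 < p - e <= max_gap + 1 for e in ends)}
            if hits:
                extended[sid] = hits
        current = extended
    return {sid: sorted(ends) for sid, ends in current.items()}


# Hypothetical knowledge-base sequences (illustrative only).
kb = [["open", "read", "check", "write"],
      ["open", "write"],
      ["read", "open", "x", "y", "write"]]
kb_index = build_position_index(kb)
print(match_with_wildcards(kb_index, ["open", "write"], max_gap=1))  # {1: [1]}
```

Raising `max_gap` to 2 in this toy example lets the pattern match all three sequences, which shows how adjusting the number of wildcards widens or narrows the match.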
Keywords/Search Tags: software vulnerabilities, data mining, data streams, frequent sequence, clustering technology