
Detection Of Software Vulnerability Based On Sequence Clustering

Posted on: 2013-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Wang
Full Text: PDF
GTID: 1228330392454808
Subject: Computer application technology
Abstract/Summary:
With the widespread use of software, the number of software vulnerabilities has increased dramatically. More and more software becomes the target of hackers and virus attacks, so software vulnerabilities have become a serious threat to system security. If software vulnerabilities can be detected effectively, software security can be greatly improved. To reduce the false positive and false negative rates of software vulnerability detection, this dissertation applies sequence clustering.

Firstly, a k-means sequence clustering algorithm based on entropy is proposed for detecting software vulnerabilities. Sequence entropy is introduced to build the sequence pattern library, and a global and local similarity measure (GLSM) is defined that considers both the items contained in a sequence and the order in which they occur. The sequence with the smallest entropy is selected as the center of each cluster, and the clusters are then formed by assigning each unselected sequence to the cluster center with which it has the largest similarity. To determine the type of a vulnerability, the GLSM between the sequence under analysis and each cluster center is computed, and the sequence is assigned to the most similar cluster. Its type is taken to be that of the sequence with the maximum GLSM similarity.

Secondly, a weighted k-means sequence clustering algorithm is proposed for detecting software vulnerabilities. A novel sequence similarity based on the vector space model is designed, and a self-adaptive sequence weight is defined to measure how strongly each sequence influences the selection of cluster centers. The vulnerability sequences are clustered with the weighted k-means algorithm to build the sequence pattern library. The vulnerability type of the software under analysis is then matched using the distance between its sequence and each cluster center.

Thirdly, a rapid DBSCAN sequence clustering algorithm is proposed for detecting software vulnerabilities. This method defines the rd-entropy and the s-order. To reduce the time cost of region queries, the sequence with the minimum rd-entropy is selected as the cluster center, which decreases the number of sub-cluster merges. Because similarity measured by distance is very sensitive to noise, the variation of the s-order is used. To detect software vulnerabilities, the vulnerability sequence pattern database is built with the rapid DBSCAN clustering algorithm, and the type of the vulnerability under analysis is then matched by the variation of the s-order.

Lastly, to detect unknown vulnerabilities efficiently, an incremental rapid DBSCAN clustering algorithm is proposed. The sequence pattern library is built by rapid DBSCAN, and the sequence under analysis is clustered with the incremental rapid DBSCAN algorithm. For the sequence under analysis, the affected set and the seed set are computed, the extended clusters of the seeds are built, and the insert operation is completed. The vulnerability type of the sequence is determined by the relationship between the extended cluster and the sequence pattern library.

The experimental results show that the sequence clustering algorithms in this dissertation improve the accuracy of the similarity measures and the quality of the clustering. The improved k-means, weighted k-means, rapid DBSCAN and incremental rapid DBSCAN algorithms have better performance and good scalability. Meanwhile, sequence clustering reduces the matching scope, and the false positive and false negative rates of software vulnerability detection decrease, so software security is improved.
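
The first method (entropy-based center selection, then matching by similarity to each cluster center) can be illustrated with a minimal Python sketch. The abstract does not give the formulas for sequence entropy or GLSM, so everything below is an assumption made for illustration: entropy is taken as the Shannon entropy of item frequencies, the similarity is a stand-in that blends item overlap (Jaccard) with order agreement (longest common subsequence), and the function names, the `alpha` weight and the example call sequences are hypothetical. It is not the dissertation's GLSM.

```python
from collections import Counter
from math import log2


def sequence_entropy(seq):
    """Shannon entropy (bits) of item frequencies; an assumed definition."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b, alpha=0.5):
    """Stand-in similarity for non-empty sequences (not the author's GLSM).

    item_part rewards shared items, order_part rewards shared ordering.
    """
    set_a, set_b = set(a), set(b)
    item_part = len(set_a & set_b) / len(set_a | set_b)
    order_part = lcs_length(a, b) / max(len(a), len(b))
    return alpha * item_part + (1 - alpha) * order_part


def pick_center(cluster):
    """Choose the sequence with the smallest entropy as the cluster center."""
    return min(cluster, key=sequence_entropy)


def classify(seq, centers):
    """Return the label of the cluster center most similar to seq."""
    return max(centers, key=lambda label: similarity(seq, centers[label]))


# Hypothetical pattern library: one center sequence per vulnerability type.
centers = {
    "buffer_overflow": ["read", "alloc", "strcpy"],
    "use_after_free": ["alloc", "free", "use"],
}
print(classify(["read", "alloc", "memset", "strcpy"], centers))  # buffer_overflow
```

Comparing a sequence only against the cluster centers, rather than against every stored sequence, is what reduces the matching scope mentioned in the abstract.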
Keywords/Search Tags: Data mining, Software vulnerability, Clustering, Sequence, Similarity, Entropy