A framework for mining significant subgraphs and its application in malware analysis

Posted on:2015-04-19

Degree:Ph.D

Type:Dissertation

University:The Pennsylvania State University

Candidate:Palahan, Sirinda

GTID:1478390017992476

Subject:Computer Science

Abstract/Summary:

The growth of graph data has encouraged research in graph mining algorithms, especially subgraph pattern mining from graph databases. Discovered patterns can help researchers understand inherent properties and characteristics of large and complex graphs. Frequent subgraph mining has been applied successfully in many areas, such as mining network motifs in complex networks, identifying malicious behaviors, and mining biochemical structures. However, the high frequency of a subgraph does not always indicate that the subgraph is statistically significant. In this dissertation, we propose a framework for mining statistically significant subgraphs. Our framework is based on a new method for measuring the statistical significance of subgraphs. Given a training set of graphs from two classes (e.g., positive and negative graphs), our method uses the class labels provided in the training data to calculate p-values. The p-values reflect how significant the subgraphs are in one class with respect to a null distribution. Our method can assign p-values to subgraphs of new graph instances even if those subgraphs have not appeared in the training data.

We apply our framework to malware analysis, where we extract malicious behaviors from malware executables and calculate their p-values. We focus on this problem because malware is still a serious threat to our society. Traditionally, analysis of malicious software is only a semi-automated process, often requiring a skilled human analyst. As new malware appears at an increasingly alarming rate, now over 100,000 new variants each day, there is a need for automated techniques for identifying suspicious behavior in programs. The contribution of this dissertation is two-fold. (1) We propose a framework for extracting statistically significant subgraphs and apply the framework to identify significant behaviors from malware samples. (2) We develop a methodology for evaluating the quality of significant malware behaviors. The experimental results showed that our framework was able to identify behaviors that are both statistically significant and malicious according to a description by a malware expert. The results also showed that our framework may be able to detect behaviors not previously seen in the training dataset.
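The distinction between a frequent subgraph and a statistically significant one can be illustrated with a small sketch. The abstract does not spell out how the dissertation computes its p-values, so the code below substitutes a standard hypergeometric (one-sided Fisher-style) test on class labels; it is not the author's method, and it cannot score subgraphs absent from the training data. The function name `hypergeom_p_value` and its parameters are hypothetical, chosen only for this illustration.

```python
from math import comb


def hypergeom_p_value(n_pos: int, n_neg: int, k_pos: int, k_neg: int) -> float:
    """Upper-tail hypergeometric p-value for one subgraph.

    ASSUMPTION: this is a generic class-label test, not the dissertation's method.
    n_pos, n_neg: number of positive / negative training graphs.
    k_pos, k_neg: how many of each class contain the subgraph.
    Null hypothesis: the graphs containing the subgraph are a random
    draw from the whole training set, regardless of class.
    """
    total = n_pos + n_neg
    support = k_pos + k_neg  # graphs that contain the subgraph
    denom = comb(total, support)
    upper = min(support, n_pos)
    # Probability of seeing k_pos or more positives among the `support` graphs.
    # math.comb returns 0 when the second argument exceeds the first,
    # so infeasible splits contribute nothing to the sum.
    tail = sum(
        comb(n_pos, i) * comb(n_neg, support - i)
        for i in range(k_pos, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Frequent in both classes: high support, but not discriminative.
    print(hypergeom_p_value(n_pos=50, n_neg=50, k_pos=45, k_neg=45))
    # Rarer overall, but concentrated in the positive class.
    print(hypergeom_p_value(n_pos=50, n_neg=50, k_pos=20, k_neg=0))
```

In this toy setting, the first subgraph occurs in 90 of 100 graphs yet receives a large p-value, while the second occurs in only 20 graphs but receives a very small one, which mirrors the abstract's point that high frequency alone does not imply significance.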

Keywords/Search Tags:

Framework, Mining, Malware, Graph, Behaviors, Data, Training

Related items

1	Developing Profiles of Malware and User Behaviors Using Graph-Mining and Machine Learning Techniques
2	Research And Implementation Of Graph Mining Platform Based On Parallel Iterative Framework
3	Research On Key Technologies Of Malware Behavior Mining
4	Research On Malware Dynamic Behaviors Knowledge Graph Embedding
5	Android Malware Detection Based On Semantic Attributes
6	Research Of Android Malware Detection Technology Based On Graph Kernel Method
7	Research On Technology Of Intelligent Terminal Malware Detection Based On Anomaly Network Behaviors
8	Design And Implementation Of Data Mining System For Mobile Malware
9	Optimizations For Data Path In Parallel And Distributed Neural Network Training
10	Detecting Malware Domains On DNS Traffic