
Research Of Top-k Important Nodes Mining Algorithms In Software Network

Posted on: 2017-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Li
Full Text: PDF
GTID: 2308330503982377
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computers and networks, software systems are growing in scale and run in increasingly complex environments, which makes the systems themselves exhibit more complex features. Modeling a software system as a complex network and mining its high-impact nodes from a data mining perspective helps in understanding the software's topology, supports software testing and maintenance, and aids in preventing vulnerabilities and errors.

In our research, we obtain function call sequences by tracing the software while it is actually executing, build a software network by integrating these sequences, and mine important nodes in that network from different points of view. The main work is summarized as follows.

Firstly, methods for mapping a software system to its corresponding software network are studied. After analyzing the shortcomings of existing research, a new modeling process based on dynamic call sequences among software functions is proposed. The detailed modeling process is given, and the basic metrics involved are analyzed.

Secondly, a node importance metric based on information entropy is defined according to the information flow characteristics of software systems. The software network is stored in memory as a sparse graph, the call sequences from the root node to each leaf node are obtained with a depth-first strategy, and from these the accessible set of each source node is derived. The information entropy is then computed, and an important-node mining algorithm is proposed that identifies the Top-k nodes most active in information dissemination.

Thirdly, according to the propagation characteristics of cascading failures among software system nodes, another node importance metric for the software network is put forward. A node's fault probability value records the fault probability arising from the node itself and from its associated nodes that can propagate faults, and a related algorithm is proposed to compute this metric. The resulting Top-k nodes deserve more attention because of their high fault probability.

Finally, data from real open-source software is used as the experimental data. The experiments are conducted on the Windows platform in C++, and the proposed methods are compared with traditional measures to verify their feasibility and accuracy.
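The modeling and entropy steps described above can be illustrated with a small sketch. It is written in Python for brevity (the thesis experiments use C++), and it is not the thesis's actual algorithm: the abstract does not give the metric's formula, so the score below is an assumed stand-in, namely the Shannon entropy of call counts over the set of functions reachable from a node. The function names (build_network, reachable, entropy_score, top_k) and the sample call paths are hypothetical.

```python
import math
from collections import defaultdict


def build_network(call_paths):
    """Merge root-to-leaf call paths into a directed graph plus call counts."""
    edges = defaultdict(set)   # caller -> set of callees (sparse adjacency)
    calls = defaultdict(int)   # function -> number of appearances in the paths
    for path in call_paths:
        for name in path:
            calls[name] += 1
        for caller, callee in zip(path, path[1:]):
            edges[caller].add(callee)
    return edges, calls


def reachable(edges, source):
    """Iterative depth-first search: nodes reachable from source, excluding it."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt != source:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def entropy_score(edges, calls, source):
    """ASSUMED metric: entropy (bits) of call counts over the reachable set."""
    targets = reachable(edges, source)
    total = sum(calls[t] for t in targets)
    if total == 0:
        return 0.0  # leaf node: nothing is reachable
    return -sum((calls[t] / total) * math.log2(calls[t] / total) for t in targets)


def top_k(call_paths, k):
    """Rank functions by the assumed entropy score; ties broken by name."""
    edges, calls = build_network(call_paths)
    scores = {f: entropy_score(edges, calls, f) for f in list(calls)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical traces: each list is one root-to-leaf call sequence.
paths = [
    ["main", "parse", "read"],
    ["main", "parse", "tokenize"],
    ["main", "report", "write"],
]
for name, score in top_k(paths, 2):
    print(name, round(score, 3))
```

Under this assumed score, a function ranks highly when it can reach many other functions whose activity is spread evenly, which is one plausible reading of "active in information dissemination". A different entropy definition would change the ranking.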
Keywords/Search Tags: software network, function call, path sequences, information entropy, fault propagation, important nodes