Research On Key Class Identification In Software System Based On Graph Neural Network

Posted on: 2021-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Chen
GTID: 2518306539957969
Subject: Systems analysis and integration
Abstract/Summary:
As software systems grow in scale, their complexity increases and the workload of software development and maintenance becomes heavier. A series of studies in recent years has shown that software systems contain some key classes, which are usually located at the core of the software system topology. If a key class fails, it damages the connectivity of the software system's topology network, so defects in these classes bring great security risks to the system. In addition, identifying the key classes of a software system is important for engineers who need to understand or maintain an unfamiliar system.

The rise of complex network theory has opened up a new way to study the characteristics of large-scale software systems, and the ranking of node importance in complex networks provides a new perspective for identifying key classes in software systems. Many related methods have been proposed. Existing key class identification methods based on software networks mainly focus on improving the software network modeling strategy, or on improving hand-crafted node importance metrics in terms of network function (such as invulnerability and propagation influence) and topological structure (such as clustering coefficient, degree correlation, and centrality). In recent years, deep learning paradigms such as the convolutional neural network (CNN), long short-term memory (LSTM), and the auto-encoder have greatly reduced the reliance on manual feature engineering for extracting informative features. Following the same idea, graph neural networks process graph data automatically and have been applied successfully to social networks and information networks. However, we found no research that uses graph neural networks to identify the key classes of software systems. Therefore, this paper attempts to adopt a graph neural network to learn the importance of nodes in a software network and thereby identify the key classes in a software system.

This paper takes a graph neural network framework applied in other fields and adapts it to the key class identification task for software systems. First, the dependencies between classes are obtained by parsing the software source code, and a weighted software network model is constructed. Second, the nodes in the software network are mapped to low-dimensional embedding vectors using network embedding learning. After that, we construct a ranking model based on a graph neural network to identify the key nodes. More specifically, the encoder is designed in a neighborhood-aggregation fashion, and the decoder is designed as a multi-layer perceptron (MLP). The encoder leverages the network structure to encode each node into a representation vector that captures the important structural information of the node. The decoder transforms the representation vector of each node into a scalar, and a pairwise ranking loss is then used to train the model to learn the order of the nodes. Finally, we use three metrics to evaluate the identification accuracy of key classes from the perspective of network robustness.

To evaluate the proposed method, we first trained the model on five artificial complex networks and then performed validation experiments on four open-source Java software systems, two of which have labelled key class information. We compared the method with 5 commonly used methods and 3 existing studies. The experimental results indicate that, compared with betweenness centrality, k-core, closeness centrality, node contraction, and PageRank, the proposed method is better at identifying key classes from the perspective of network robustness. In addition, on the two existing public annotated datasets, the recall and precision of our method for the first 15% of the key classes are better, with an increase of more than 10% and a maximum of 80%. The research results can be used to guide the search for key classes during software development, and provide software engineers with a ranked list of classes as a starting point and as evidence for code understanding.
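The encoder, decoder, and pairwise ranking loss described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not give the aggregation rule, the MLP shape, or the exact form of the loss, so the sketch assumes a weighted-mean aggregation without learned weights, a two-layer perceptron, and a logistic pairwise loss. All function names, the toy graph, the weights, and the `target` importance values are hypothetical.

```python
import math

def aggregate(adj, feats):
    """Assumed encoder step: weighted mean of a node's own vector and its
    neighbours' vectors, with dependency weights taken from adj."""
    n, dim = len(feats), len(feats[0])
    out = []
    for i in range(n):
        # Add a self-loop of weight 1 so the node keeps its own information.
        weights = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        out.append([sum(weights[j] * feats[j][d] for j in range(n)) / total
                    for d in range(dim)])
    return out

def decode(reps, w_hidden, w_out):
    """Assumed decoder: a two-layer perceptron (ReLU hidden layer) that maps
    each representation vector to one scalar score."""
    scores = []
    for vec in reps:
        hidden = [max(0.0, sum(v * w for v, w in zip(vec, unit)))
                  for unit in w_hidden]
        scores.append(sum(h * w for h, w in zip(hidden, w_out)))
    return scores

def pairwise_ranking_loss(scores, target):
    """Assumed logistic pairwise loss: for every pair (i, j) where node i
    should rank above node j, penalise a small or negative score gap."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if target[i] > target[j]:
                losses.append(math.log1p(math.exp(-(scores[i] - scores[j]))))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical 4-class weighted software network and 2-d node embeddings.
adj = [[0, 2, 1, 0],
       [2, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 1, 0, 0]]
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.1]]
w_hidden = [[1.0, -0.5], [0.3, 0.8], [-0.2, 0.4]]  # three hidden units
w_out = [0.7, -0.1, 0.5]

scores = decode(aggregate(adj, feats), w_hidden, w_out)
target = [3.0, 2.0, 1.0, 0.0]  # hypothetical ground-truth importance
print(pairwise_ranking_loss(scores, target))
```

Because the loss depends only on score differences between pairs, the model is trained to reproduce the order of the nodes rather than absolute importance values, which matches the ranking goal stated in the abstract.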
Keywords/Search Tags: Software network, key class, network embedding, graph neural network, learning to rank