Research On Program Comprehension Technique Based On Topic Model

Posted on: 2017-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
GTID: 2308330488495179
Subject: Computer application technology
Abstract/Summary:
Software products are maintained and evolved as system requirements change to meet users' needs. During software maintenance and evolution, one of the important tasks developers face is understanding a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Program comprehension aims to obtain enough information from the software system to help developers understand the target software. The main contributions are as follows:

First, packages in a system differ in size. Developers can easily comprehend small packages, but large packages are difficult to understand. We focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster them, separating each large package into small clusters that are easier for developers to comprehend. Moreover, each cluster is labeled with a topic, which helps developers understand the cluster. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than a clustering approach based on Latent Semantic Indexing (LSI). In addition, we find that the topic labeling each cluster is useful for program comprehension.

Second, in practice programmers usually first get a general view of the features in a software system and then find the interesting or necessary files from which to start the understanding process. Given a target system, developers may therefore need a general view of it. The traditional view of a system is a package-class structure, which is difficult to understand, especially for large systems. We focus on understanding the system from both a feature view and a file structure view, and propose an approach that generates a feature tree based on hierarchical Latent Dirichlet Allocation (hLDA). The feature tree includes two hierarchies: the feature hierarchy and the file structure hierarchy. The feature hierarchy shows the features from the abstract level to the detailed level, while the file structure hierarchy shows the classes from whole to part. Empirical studies on two real-world software projects demonstrate the effectiveness of our approach. The results show that the feature tree provides a view of both the features and the files; moreover, our approach clusters the classes in a package better (in terms of recall) than another clustering approach, namely hierarchical clustering.

Third, features and their relations can help developers obtain full information about the software system at hand, so building a network based on features can make program comprehension easier and faster. We propose a novel technique that uses the relational topic model (RTM) to model all code documents (at class level) in the software system as a program network. The program network is then visualized to help developers understand the whole software. The advantage of RTM is that it takes into account both the structural and the textual information in the software system, which enables developers to understand both the syntactic dependencies and the semantic functional relationships in the program. We develop a tool in Java and R to create the program network. The program network is accurate enough to model the relations among different classes; moreover, it can recommend relevant classes for a given class, which helps developers understand a local part of the program.
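The first contribution (LDA-based clustering of a large package, with a topic label per cluster) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each class has already been reduced to a list of identifier tokens, uses a small collapsed Gibbs sampler for LDA, and assigns each class to its dominant topic. The class names, tokens, hyperparameters (`alpha`, `beta`, `iters`) and the helper names `lda_gibbs` and `cluster_classes` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # token counts per topic
    assign = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def cluster_classes(names, docs, num_topics, top_words=3):
    """Group classes by dominant topic and label each group with top topic words."""
    doc_topic, topic_word = lda_gibbs(docs, num_topics)
    clusters = defaultdict(list)
    for d, name in enumerate(names):
        dominant = max(range(num_topics), key=lambda k: doc_topic[d][k])
        clusters[dominant].append(name)
    labels = {
        k: [w for w, c in topic_word[k].most_common(top_words) if c > 0]
        for k in clusters
    }
    return dict(clusters), labels

# Hypothetical classes of one large package, reduced to identifier tokens.
class_names = ["HttpClient", "HttpRequest", "JsonParser", "JsonWriter"]
class_docs = [
    ["http", "connect", "socket", "request", "send", "http"],
    ["http", "request", "header", "url", "send"],
    ["json", "parse", "token", "read", "json"],
    ["json", "write", "token", "serialize"],
]

clusters, labels = cluster_classes(class_names, class_docs, num_topics=2)
for topic, members in clusters.items():
    print(topic, labels[topic], members)
```

Assigning each class to a single dominant topic is one simple way to turn per-document topic counts into clusters; the number of topics is fixed by hand here, and the result depends on the random seed.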
Keywords/Search Tags: Software maintenance, software evolution, topic models, program comprehension, clustering