Research On The Key Technologies Of Massive Open Source Resources Location For Software Reuse

Posted on: 2015-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Wang
GTID: 1108330509961081
Subject: Computer Science and Technology
Abstract/Summary:
Improving development efficiency and software quality has always been a central challenge in software engineering, and software reuse is an important strategy for meeting it. With the rapid development of Internet technologies and open source software (OSS), more and more developers are involved in OSS development. They share their innovations and experience with each other and release their projects on the Internet. As a result, massive amounts of open source software and knowledge are hosted in diverse open source communities, and the open source domain across the whole Internet has become a resource repository that provides abundant reusable resources. However, these resources are of various types and are highly distributed throughout the Internet while being tightly interconnected, which makes it very challenging to accurately locate the resources desired for reuse. This thesis focuses on open source software location and open source knowledge location, and conducts extensive research on both.

Massive open source software is often published in different open source communities. For open source software location, we analyze the functional topics of software based on their online attributes and build structured relations among them to improve the accuracy of resource location. Specifically, this thesis studies two main questions on open source software location: hierarchical categorization and automatic tag recommendation for massive open source software.

Open source communities also contain abundant knowledge resources about open source software, and these resources are tightly interconnected. For open source knowledge location, we analyze the semantic relations among these knowledge resources and build correlations between them across different communities based on their textual and temporal attributes.
Specifically, this thesis studies two further questions around this problem: locating related issues for answering users' questions, and locating related posts for issue resolution. We focus on the four key questions mentioned above and study them as follows.

Firstly, to address the problem of classification efficiency and granularity over Internet-scale open source software, we propose a novel approach for hierarchical OSS categorization based on the aggregation of online attributes. Unlike traditional approaches, which classify software using source code or API calls, we leverage the online attributes of OSS in open source communities, including descriptions and tags, to mine latent information about functional and technical features and thereby categorize OSS efficiently. We first build a hierarchical, multi-granularity category structure containing 4 levels and 123 categories, based on the categorization system widely used in SourceForge and other communities. We then employ the SVM algorithm to construct the categorization model from OSS online attributes aggregated across multiple communities, and categorize open source software hierarchically. By using OSS online attributes, our approach greatly improves categorization efficiency, which makes it possible to categorize Internet-scale open source software. The fine-grained categories in our categorization system are of great help for locating massive OSS.

Secondly, in open source communities, software tags are annotated manually, and a large proportion of open source software has no tags. To address this problem, we propose an automatic tag recommendation approach based on a semantic graph (TRG) for open source software. This approach makes recommendations based on OSS online descriptions and tags.
It consists of three main stages: online attribute extraction, semantic graph construction, and tag recommendation. When extracting software online attributes, TRG focuses mainly on software descriptions and tags, and builds correlations between them through the open source software. When constructing the semantic graph, we employ a probabilistic topic model to analyze the semantic associations between description words and tags, and establish the corresponding “description word-tag” graph. When recommending tags, we use the semantic graph to compute the correlations between the software and all tags based on the software description, and recommend highly related tags for the software. The experimental results suggest that our approach recommends tags accurately. Meanwhile, TRG is highly efficient and can recommend tags in real time by using online attributes.

Thirdly, to help solve users’ questions in Stack Overflow (SO), we propose an automatic location approach based on semantic and temporal attributes, which builds associations between SO questions and related Android issues to help answer questions in SO. We first make full use of the multi-granularity texts of Android issues and SO posts to analyze the semantic correlations between them, from which we obtain issue candidates according to correlation strength. Then, based on the intuition that “there should be temporal locality among the correlated issues and posts”, we optimize the ranking of issue candidates by exploring the submission and feedback times of issues and posts.
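The temporal re-ranking idea can be illustrated with a small Python sketch. It is not the model used in the thesis: the exponential decay, the linear mix with the semantic score, and all names (`IssueCandidate`, `temporal_weight`, `rerank`, `half_life_days`, `alpha`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IssueCandidate:
    issue_id: str            # hypothetical identifier
    semantic_score: float    # assumed to lie in [0, 1], from the text-matching step
    submitted_at: datetime   # assumed: time the issue was submitted


def temporal_weight(post_time, issue_time, half_life_days=30.0):
    """Assumed decay: the weight halves for every half_life_days of time gap."""
    gap_days = abs((post_time - issue_time).total_seconds()) / 86400.0
    return 0.5 ** (gap_days / half_life_days)


def rerank(post_time, candidates, alpha=0.7):
    """Sort candidates by alpha * semantic + (1 - alpha) * temporal, best first."""
    def combined(candidate):
        temporal = temporal_weight(post_time, candidate.submitted_at)
        return alpha * candidate.semantic_score + (1 - alpha) * temporal
    return sorted(candidates, key=combined, reverse=True)
```

With these assumed settings, a candidate submitted close in time to the post can overtake a candidate with a slightly higher semantic score but a much larger time gap, which is the effect the temporal-locality intuition aims for.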
Extensive experiments suggest that, compared with the coarse-grained texts of titles and tags, fine-grained texts achieve higher location accuracy. Furthermore, our approach effectively captures the temporal correlations between issues and posts and further improves location accuracy.

Lastly, we study the problem of employing the knowledge in SO to help resolve Android issues, and propose a correlated post location framework, Cross Link. Cross Link explores the internal links in SO and clusters posts based on semantic and network distance, so as to assemble highly related posts first. Cross Link then analyzes the semantic and temporal correlations between post clusters and Android issues, and on that basis recommends related post clusters for Android issues. This framework brings the large number of developers in SO and their technical expertise to bear on Android issues, which provides more comprehensive information and improves issue resolution efficiency. The experiments indicate that our approach greatly improves location accuracy in comparison with state-of-the-art approaches. Meanwhile, the post clusters provide much richer and more comprehensive information, which is more effective in helping issue resolution.
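Clustering posts by a combination of semantic and network distance can be sketched in Python as follows. This is an illustrative assumption, not the Cross Link algorithm itself: here semantic distance is Jaccard distance over word sets, network distance is 0 for directly linked posts and 1 otherwise, and posts are merged with a simple threshold rule. The names `jaccard_distance`, `cluster_posts`, `beta`, and `threshold` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| over two word sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_posts(tokens, links, beta=0.5, threshold=0.5):
    """Group posts whose combined distance is at most `threshold`.

    tokens: dict mapping post id -> set of words (assumed preprocessing)
    links:  set of frozenset({id_a, id_b}) for directly linked posts
    """
    parent = {post: post for post in tokens}

    def find(post):
        # Union-find root lookup with path halving.
        while parent[post] != post:
            parent[post] = parent[parent[post]]
            post = parent[post]
        return post

    for a, b in combinations(sorted(tokens), 2):
        network = 0.0 if frozenset((a, b)) in links else 1.0
        distance = beta * jaccard_distance(tokens[a], tokens[b]) + (1 - beta) * network
        if distance <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for post in tokens:
        clusters.setdefault(find(post), set()).add(post)
    return sorted(clusters.values(), key=lambda cluster: sorted(cluster))
```

Under these assumed weights, two posts are grouped when they are both linked and textually similar, and also in the boundary cases where they are linked with no shared words or unlinked with identical word sets; a stricter `threshold` would require both signals.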
Keywords: Open Source Software, Open Source Community, Software Reuse, Knowledge Resources, Hierarchical Categorization, Tag Recommendation, Correlation Analysis