
Research On Code Segment Search Method For Open Source Ecology

Posted on: 2022-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2518306602490574
Subject: Computer Science and Technology
Abstract/Summary:
With the development of technologies such as the Internet and artificial intelligence, and the emergence of open source software and communities, it has become increasingly important to exploit the knowledge contained in large code resources to develop programs intelligently. Intelligent software development relies mainly on techniques such as machine learning, neural networks, and data mining: computers continuously learn and summarize the code knowledge and collective wisdom accumulated by earlier developers, and then help software developers construct and analyze code intelligently. Code search is a foundation of this process and has important research significance: given the query words entered by a developer, it retrieves suitable code fragments for subsequent use. However, with the exponential growth of code in open source communities, the accuracy of code search results is difficult to guarantee. The complex and open nature of the open source environment makes it hard both to measure open source code accurately and to match it accurately against the short query text a developer enters. The main goal of this thesis is to analyze the characteristics of code in a typical open source community and to measure its functionality and quality accurately, so as to find code that meets the query requirements. In addition, the query terms entered by developers need to be expanded effectively to improve the accuracy of search results.

The main work of this thesis consists of the following three parts.

First, to make full use of the open source characteristics of open source code and improve the efficiency of code search, this thesis analyzes the features of existing open source software code and combines some existing code features into a code feature system. For different types of code features, it designs targeted extraction methods based on abstract syntax trees, LDA (Latent Dirichlet Allocation) topic models, and information collection. It then designs a code search method based on Word2Vec: after text preprocessing, the code snippet library is matched against the developer's query text, and the snippets most relevant to the query are returned to the user.

Second, to make full use of the knowledge and wisdom contained in the open source community and improve the quality of code search, this thesis designs a "three expansion and one filtration" method that expands the query terms so that they express the query task more comprehensively. First, the user's recent query records are collected to expand the initial text of the current query. Next, the initial query is submitted to the open source question-and-answer community Stack Overflow; its top-ranked high-quality answers are taken as pseudo-relevance feedback documents, and key terms extracted from them serve as further expansions of the original query. Then, two well-known general knowledge graphs, WordNet and MCG (Microsoft Concept Graph), are used to generate synonyms and related words of the original query text at the semantic level. Finally, the generated expansion terms are filtered according to their relevance to the query task or to the software engineering (code development) domain, yielding an effective set of query expansion terms that assists the code search process.

Third, to verify the effectiveness of the proposed methods, this thesis designs and implements a code search tool that integrates them, and designs experiments to evaluate the performance of the code search method oriented to the open source ecosystem. The experimental results show that the proposed code search method, which incorporates the characteristics of open source community code, improves performance to some degree over existing code search methods, and that the proposed semantic query expansion method further improves the quality of code search.
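The abstract does not give implementation details, so the following is only a minimal sketch of how the "three expansion and one filtration" idea and a Word2Vec-style ranking could fit together. It assumes a tiny hand-written embedding table (`VECTORS`) in place of a trained Word2Vec model, a plain dictionary in place of WordNet/MCG lookups, and pre-tokenized Stack Overflow answers as the pseudo-relevance feedback documents. The names `expand_query`, `search`, `mean_vector`, the 0.9 similarity threshold, and the use of cosine similarity to the query's mean vector as the filtration criterion are all hypothetical illustrations, not the thesis's actual method.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
VECTORS = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "open": [0.85, 0.15, 0.05],
    "load": [0.88, 0.12, 0.02],
    "sort": [0.1, 0.9, 0.1],
    "list": [0.2, 0.8, 0.2],
    "banana": [0.0, 0.1, 0.99],
}


def mean_vector(tokens):
    """Average the vectors of known tokens; None if no token is in VECTORS."""
    known = [VECTORS[t] for t in tokens if t in VECTORS]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_query(query, history, feedback_docs, synonyms, threshold=0.9, feedback_k=2):
    """Collect candidates from three sources, then filter them by similarity."""
    candidates = []
    # Expansion 1: terms from the user's recent queries.
    for past_query in history:
        candidates.extend(past_query)
    # Expansion 2: pseudo-relevance feedback, taking the most frequent
    # non-query terms in top-ranked answers (pre-tokenized here).
    counts = Counter(t for doc in feedback_docs for t in doc if t not in query)
    candidates.extend(term for term, _ in counts.most_common(feedback_k))
    # Expansion 3: synonyms / related words from a knowledge-graph stand-in.
    for term in query:
        candidates.extend(synonyms.get(term, []))
    # Filtration: keep candidates whose vector is close to the query's mean vector.
    query_vec = mean_vector(query)
    expanded = list(query)
    if query_vec is None:
        return expanded
    for term in candidates:
        if term in expanded or term not in VECTORS:
            continue
        if cosine(VECTORS[term], query_vec) >= threshold:
            expanded.append(term)
    return expanded


def search(query, snippets, top_n=3):
    """Rank (name, tokens) snippets by cosine similarity to the query."""
    query_vec = mean_vector(query)
    if query_vec is None:
        return []
    scored = []
    for name, tokens in snippets:
        snippet_vec = mean_vector(tokens)
        if snippet_vec is not None:
            scored.append((cosine(query_vec, snippet_vec), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical snippet library: (snippet name, preprocessed tokens).
SNIPPETS = [
    ("read_lines", ["open", "file", "read"]),
    ("sort_items", ["sort", "list"]),
    ("peel", ["banana"]),
]
expanded = expand_query(
    ["read", "file"],
    history=[["open", "file"], ["sort", "list"]],
    feedback_docs=[["load", "file", "banana"], ["load", "text"]],
    synonyms={"read": ["load"], "file": ["banana"]},
)
print(expanded)  # ['read', 'file', 'open', 'load']
print(search(expanded, SNIPPETS, top_n=2))  # ['read_lines', 'sort_items']
```

In this toy run the filtration step keeps "open" and "load" but drops "sort", "list", and the off-topic "banana", because their vectors lie far from the query's mean vector. Averaging word vectors is simply one common, easy-to-read way to compare a query with a snippet; a real system would load a trained model and might weight terms or use other relevance signals.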
Keywords/Search Tags: Intelligent Software Development, Code Search, Query Expansion, Word2Vec, Knowledge Graph