Research On Code Search Technique Based On Multidimensional Labelling Information

Posted on: 2021-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: L X Zhang
GTID: 2518306557989379
Subject: Computer technology
Abstract/Summary:
Over more than 50 years of software engineering, a large body of high-quality code has accumulated across the web, personal computers, and code repositories. As software development moves toward intelligence and automation, this code has great reuse value: reusing existing code lowers the difficulty of development and improves efficiency. Researchers have proposed a variety of code search techniques to help developers find the code snippets they need in large static code bases. Existing techniques have shortcomings, however. Code snippets are not described from multiple angles, which makes the labelling information inaccurate, and the mapping between code labels and queries is poorly designed, which makes the search algorithm inaccurate. Both problems reduce the effectiveness of code search results.

To address these problems, this thesis proposes a code search technique based on multidimensional labelling information, called MLCS for short. MLCS extracts multidimensional labelling information from code text information, code element roles, and program structure, and combines natural language features with programming language features to match queries accurately from multiple perspectives, helping developers find the desired code snippets. The thesis improves on traditional code labelling and search algorithms to raise the accuracy of code search results. First, text information, code element roles, and program structure information are extracted from the source code as the data source. The multidimensional labelling information is then obtained through denoising and fusion operations, and queries are reconstructed with synonyms to better capture search intent, which improves the effectiveness of the search technique.

To verify the effectiveness of the technique, this thesis selects 10 popular open-source projects from GitHub as experimental subjects. These projects cover data storage, development frameworks, IO operations, and other fields, and contain about 29 million lines of code in total. The evaluation uses accuracy and mean reciprocal rank as metrics to compare the search performance of MLCS and WordNet-based code search on the same data set, and also compares the search execution time of the two techniques. The experimental data show that the accuracy of MLCS is 30% higher than that of WordNet-based code search and that its mean reciprocal rank on the Top-10 results is 22% higher. At the same time, the execution time of MLCS keeps a linear growth trend similar to that of WordNet-based code search. By relying on multidimensional labelling information, the proposed technique achieves high search accuracy and can help developers find the code snippets they expect.
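
The two evaluation metrics named above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the cutoff parameter `k`, and the reading of "accuracy" as the fraction of queries with at least one relevant snippet in the top `k` results are all assumptions made for illustration. Mean reciprocal rank follows its standard definition, the average over queries of 1/rank of the first relevant result, counted as 0 when none appears within the cutoff.

```python
from typing import Hashable, Optional, Sequence, Set


def first_relevant_rank(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> Optional[int]:
    """Return the 1-based rank of the first relevant item in the top k, or None."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return rank
    return None


def mean_reciprocal_rank(
    results: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
    k: int = 10,
) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none in top k)."""
    total = 0.0
    for ranked, relevant in zip(results, relevant_sets):
        rank = first_relevant_rank(ranked, relevant, k)
        if rank is not None:
            total += 1.0 / rank
    return total / len(results)


def top_k_accuracy(
    results: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
    k: int = 10,
) -> float:
    """Assumed definition: fraction of queries with a relevant result in the top k."""
    hits = sum(
        1
        for ranked, relevant in zip(results, relevant_sets)
        if first_relevant_rank(ranked, relevant, k) is not None
    )
    return hits / len(results)


if __name__ == "__main__":
    # Hypothetical ranked snippet ids for three queries, with their relevant ids.
    results = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    relevant_sets = [{"b"}, {"x"}, {"missing"}]
    print(mean_reciprocal_rank(results, relevant_sets, k=10))
    print(top_k_accuracy(results, relevant_sets, k=10))
```

Both functions take one ranked result list and one set of relevant snippet identifiers per query, so two search techniques can be compared by running them on the same queries and passing each technique's rankings in turn.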
Keywords/Search Tags: Code Search, Code Labelling, Query Analysis, Multidimensional Labelling Information