Research On Code Retrieval Technology Based On Extended Query And Natural Language Processing

Posted on: 2022-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: G C Wang  Full Text: PDF
GTID: 2518306557968359  Subject: Software engineering
Abstract/Summary:
Existing code retrieval research focuses on combining neural network models with community Q&A data to jointly model user queries and code fragments; far less work combines neural network models with the pull request (PR) information in code repositories. Code feature extraction typically pairs a neural network model with community Q&A and preprocessed open-source code data to model queries and code fragments jointly. However, such models rely heavily on finely annotated data sets, and their results when trained on large-scale, coarsely annotated data sets are mediocre. To address these problems, this thesis proposes a code retrieval technique based on query expansion and natural language processing. It applies the Porter stemming algorithm to reduce the words in PR information to their stems, trains a CBOW model on the resulting text, and uses word vectors to represent the semantics of PR descriptions and diff code, so that similar PR information lies close together in cosine distance. The query is thereby expanded with semantically similar natural-language terms, which can improve the accuracy of retrieving code examples from natural-language queries. Next, a BERT-based pre-trained model that combines programming language and natural language is proposed. Through masked language modeling, replaced token detection, and fine-tuning, it learns complex features of PR information, such as syntax and semantics, reducing the dependence on finely annotated data sets. The methods are then evaluated on open-source code projects using the average precision of the top 10 results and the mean reciprocal rank. The results show that the average precision of the top 10 results reaches 58.17% and the mean reciprocal rank reaches 61.11%. In other words, about six of the code examples on the first page of search results are relevant to the query, and the first correct result usually appears in the first or second position. Combining a neural network model with PR information from the code repository can therefore improve the accuracy of code retrieval to a certain extent. Finally, to verify the effectiveness of the two code retrieval methods from a practical standpoint, a usable code retrieval system is developed on top of the code retrieval research framework, building on the results above.
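The two reported metrics can be illustrated with a minimal sketch. It assumes that the "average accuracy of the top 10 documents" is precision at 10 averaged over queries, and that the reciprocal ranking is the standard mean reciprocal rank (MRR); the abstract does not define either precisely. Under that reading, 58.17% of ten results is about 5.8, hence roughly six relevant examples per page, and an MRR of 61.11% corresponds to a harmonic-mean first-hit rank of about 1/0.6111 ≈ 1.64. The relevance judgments below are hypothetical toy data, not the thesis data, and the function names are illustrative.

```python
from typing import List, Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(relevance[:k]) / k


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical judgments for three queries (True = result is relevant).
judgments = [
    [True, False, True, True, False, True, True, False, True, False],   # first hit at rank 1
    [False, True, True, False, True, False, True, True, False, True],   # first hit at rank 2
    [False, False, True, True, True, False, False, True, True, False],  # first hit at rank 3
]

mean_p_at_10 = mean([precision_at_k(j, 10) for j in judgments])
mrr = mean([reciprocal_rank(j) for j in judgments])
print(f"P@10 = {mean_p_at_10:.4f}, MRR = {mrr:.4f}")
```

In this toy set the first relevant result sits at rank 1, 2, or 3, which shows how an MRR near 0.61 arises when the first correct result usually appears in the first or second position.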
Keywords/Search Tags: code retrieval, information retrieval, BERT, natural language processing