| Early researchers matched search queries against the textual content and structural information of code in order to perform code search. To address the mismatch between the high-level intent expressed in a natural language query and the low-level implementation of source code in a code base, recent work has proposed mapping code function descriptions and code content into the same high-dimensional vector space through a deep neural network model and matching them there. Existing approaches, however, typically target a single search, assuming that developers can accurately describe the functionality of the code and thus search for exactly matching code. As software systems grow in size and complexity, developers facing new development requirements find it difficult to give an accurate description of the code function in one attempt, so multiple code searches are required to refine the description and, at the same time, to find better-matching code snippets. This paper makes the following contributions to address these problems: (1) The code search dataset query2code was collected and constructed from GitHub open source code repositories and used for the experiments. Its attributes include code comments (i.e., code function descriptions), code snippets, function names, and API call sequences. (2) A two-stage code search method is proposed. The method uses the query2code dataset as its basic data and divides code search into two stages. The first stage takes the code function description as input and outputs highly accurate code snippets and meta information that help users improve their code function descriptions. The second stage takes code function descriptions, function names, and API call sequences as inputs to search for more precisely matching code snippets. (3) According to the characteristics of the different search stages, the mainstream models are improved and new code search models based on deep learning technology are proposed. The first-stage code search adopts CodeSearchNet, which has a simpler structure, and changes the encoder from RNN and bag-of-words to a bidirectional LSTM structure. The second-stage code search builds on the more complex structure of DeepCS, encodes multi-dimensional code features, and adds an attention mechanism to more fully mine the dependencies among these features. In addition, the embedding layers of both models are initialized with CodeBERT pre-trained vectors to introduce external semantic information. (4) The proposed models are compared with the mainstream models on the query2code dataset collected in this paper. The results show that the cdSearch1 model improves accuracy by 5.79% over CodeSearchNet, and that the cdSearch2 model improves mean average precision (MAP) and normalized discounted cumulative gain (NDCG) by 2.6% and 2.4%, respectively, over DeepCS, which verifies the effectiveness of the proposed models. (5) Based on the two-stage search method and the cdSearch1 and cdSearch2 models, combined with web technology, a code search system is designed and implemented, aiming to provide more effective code search services for developers. |
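
For readers unfamiliar with the two ranking metrics reported in item (4), the following is a minimal sketch of how MAP and NDCG are commonly computed. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names are hypothetical, each ranked list is assumed to contain every relevant item for its query, and the gain is taken directly as the graded relevance label. The abstract does not state a cutoff or gain definition, so none is applied here.

```python
import math


def average_precision(relevance):
    """Average precision for one query.

    `relevance` is the ranked result list as 0/1 labels (1 = relevant).
    Assumption: every relevant item for the query appears in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance):
    """MAP: the mean of per-query average precision."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """NDCG: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```

Both metrics reward rankings that place relevant snippets near the top; NDCG additionally accepts graded relevance, whereas average precision as written here treats relevance as binary.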