
Research On Text-oriented Code Search

Posted on: 2022-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: J H Shuai
Full Text: PDF
GTID: 2518306536474044
Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of the Internet industry and the rise of open source software and open source communities, a large number of code repositories have appeared online, especially on frequently visited hosting sites such as GitHub, which contain a large amount of code and project resources that developers can reuse. When developers need to complete a programming task but do not know which functions to call, code search and reuse offers a shortcut that improves the efficiency of software development. How to help developers effectively retrieve code relevant to a given programming task from a candidate code base has therefore become an important research topic in software engineering.

Early code search models treat code as plain text and use information retrieval techniques to match the keywords of a natural language query against the text of candidate code. With this type of model, matching code is likely to be found only when the query and the code share a certain number of keywords. Because queries and code are written in heterogeneous languages, search results are often unsatisfactory. The joint embedding models proposed in recent years replace keyword matching with vector similarity, but they ignore the semantic relationship between code and query. To explore this relationship, this thesis introduces the co-attention mechanism into the code search model. Building on co-attention, it further applies a two-stage attention mechanism to code and query, which improves the efficiency and quality of representation learning and thereby the search performance of the model.

The main work of this thesis is as follows:
(1) It presents the research background and significance of code search, reviews research progress, analyzes the limitations of existing code search models, and proposes algorithmic ideas for improving search performance.
(2) It proposes CARLCS-CNN, a code search model based on the co-attention mechanism. Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks produce the embedded representations of code and queries, and the co-attention mechanism captures the semantic correlation between them. This model alleviates the keyword mismatch problem to a certain extent.
(3) Building on the co-attention mechanism, it proposes TabCS, a code search model based on a two-stage attention mechanism. The two-stage attention mechanism forms the main body of the model and replaces the traditional deep learning embedding structure. The attention mechanism is used to filter feature words and semantic keywords, and the two-stage weight distribution supports accurate semantic mining and interaction between code and queries.
(4) Extensive experiments were carried out with CARLCS-CNN and TabCS on three large-scale open source datasets: DeepCS, DeepCom, and CodeSearchNet. Analysis of the experimental results demonstrates the effectiveness of the co-attention mechanism and the two-stage attention mechanism proposed in this thesis.
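
The co-attention (common attention) idea in item (2) can be illustrated with a small sketch. The abstract does not give the formulas, so the code below assumes one common formulation: a tanh-squashed correlation matrix between code-token and query-token vectors, max-pooling over rows and columns, and a softmax that turns the pooled scores into attention weights. The function names (`co_attention`, `match_score`) and the toy three-dimensional vectors are hypothetical, and the learned projection, CNN, and LSTM encoders of a real model are omitted. It is a sketch under these assumptions, not the thesis implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def weighted_sum(vectors, weights):
    """Attention-weighted sum of token vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for vec, w in zip(vectors, weights))
            for i in range(dim)]


def co_attention(code_vecs, query_vecs):
    """Assumed co-attention: each side is weighted by its best match on the other."""
    # Correlation matrix: one row per code token, one column per query token.
    corr = [[math.tanh(dot(c, q)) for q in query_vecs] for c in code_vecs]
    # Max-pool rows (per code token) and columns (per query token).
    code_attn = softmax([max(row) for row in corr])
    query_attn = softmax([max(col) for col in zip(*corr)])
    code_repr = weighted_sum(code_vecs, code_attn)
    query_repr = weighted_sum(query_vecs, query_attn)
    return code_repr, query_repr, code_attn, query_attn


def match_score(code_vecs, query_vecs):
    """Cosine similarity between the co-attended code and query vectors."""
    code_repr, query_repr, _, _ = co_attention(code_vecs, query_vecs)
    return cosine(code_repr, query_repr)


# Toy, hand-made embeddings (hypothetical): two query tokens, two snippets.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
related_code = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.0], [0.0, 0.0, 1.0]]
unrelated_code = [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]

print("related:  ", round(match_score(related_code, query), 3))
print("unrelated:", round(match_score(unrelated_code, query), 3))
```

In this toy setup the code token that matches no query token receives the smallest attention weight, so the related snippet scores higher than the unrelated one even though no literal keywords are compared.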
Keywords/Search Tags: Code Search, Deep Learning, Attention Mechanism, Representation Learning