Research On Text-oriented Code Search

Posted on:2022-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:J H Shuai

Full Text:PDF

GTID:2518306536474044

Subject:Engineering

Abstract/Summary:

In recent years, with the rapid development of the Internet industry and the rise of open source software and open source communities, a large number of code repositories have appeared on the Internet. Frequently accessed hosting sites such as GitHub in particular contain large amounts of code and project resources that developers can reuse. When developers need to complete a programming task but do not know which functions to call, code search and reuse offers a shortcut that improves the efficiency of software development. How to effectively help developers find code relevant to a given programming task in a candidate code base has therefore become an important research area in software engineering.

Early code search models treat code as plain text and use information retrieval techniques to match the keywords of a natural language query against the text of candidate code. With this type of model, matching code is likely to be found only when the query and the code share a certain number of keywords. Because queries and code are written in heterogeneous languages, the search results are often unsatisfactory. The joint embedding models that have appeared in recent years replace keyword matching with vector similarity, but they ignore the semantic relationship between the code and the query. To explore this relationship, this thesis introduces the co-attention mechanism into the code search model. Building on the co-attention mechanism, it also implements a two-stage attention mechanism for code and query, which further improves search performance by improving the efficiency and quality of representation learning. The main work of this thesis is as follows:

(1) It sets out the research background and significance of code search, reviews research progress, analyzes the limitations of existing code search models, and proposes algorithmic ideas for improving search performance.

(2) It proposes CARLCS-CNN, a code search model based on the co-attention mechanism. Convolutional neural networks and long short-term memory (LSTM) networks produce the embedded representations of code and queries, and the co-attention mechanism captures the semantic correlation between them. This model alleviates the keyword mismatch problem to a certain extent.

(3) Building on the co-attention mechanism, it proposes TabCS, a code search model based on a two-stage attention mechanism. The two-stage attention mechanism forms the main body of this model and replaces the traditional deep learning embedding structure. The model uses attention to pick out feature words and semantic keywords, and mines and aligns the semantics of code and queries through the two-stage weight distribution.

(4) Extensive experiments were carried out with CARLCS-CNN and TabCS on three large-scale open source data sets: DeepCS, DeepCom, and CodeSearchNet. Analysis of the experimental results demonstrates the effectiveness of the co-attention mechanism and the two-stage attention mechanism proposed in this thesis.
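
The idea of letting code and query attend to each other before they are compared can be illustrated with a small sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption made for illustration: token features are hand-written toy vectors rather than CNN or LSTM outputs, the correlation matrix is a plain tanh of dot products with no learned weight matrix, and the function name `co_attention_similarity` is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def weighted_sum(weights, rows):
    dim = len(rows[0])
    return [sum(w * row[k] for w, row in zip(weights, rows)) for k in range(dim)]


def co_attention_similarity(code_feats, query_feats):
    """Toy co-attention: one feature vector per code token and per query token."""
    # corr[i][j] measures how strongly code token i correlates with query token j.
    # Assumption: no learned projection, just tanh of the dot product.
    corr = [[math.tanh(dot(c, q)) for q in query_feats] for c in code_feats]

    # Row-wise max: how relevant each code token is to its best-matching query token.
    code_weights = softmax([max(row) for row in corr])
    # Column-wise max: how relevant each query token is to its best-matching code token.
    query_weights = softmax([max(col) for col in zip(*corr)])

    # Attention-weighted summaries of each side, compared by cosine similarity.
    code_vec = weighted_sum(code_weights, code_feats)
    query_vec = weighted_sum(query_weights, query_feats)
    return cosine(code_vec, query_vec), code_weights, query_weights


# Hypothetical toy inputs: three code tokens and two query tokens in a 3-d space.
code_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_feats = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]]

score, code_w, query_w = co_attention_similarity(code_feats, query_feats)
print("similarity:", round(score, 3))
print("code attention:", [round(w, 3) for w in code_w])
print("query attention:", [round(w, 3) for w in query_w])
```

In this toy input the third code token has nothing in common with either query token, so it receives the smallest attention weight. A keyword-matching model has no comparable way to down-weight parts of the code that are irrelevant to the query.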

Keywords/Search Tags:

Code Search, Deep Learning, Attention Mechanism, Representation Learning

Related items

1	A Research On Knowledge Representation Learning Of Joint Text Based On Deep Learning
2	Few-shot Image Classification Method Based On Deep Learning
3	Research And Implementation Of Code Smells Detection Based On Deep Learning
4	Research Of Person Search Based On Deep Learning
5	Research On Hyperparameter Optimization Method Of Emotional Computing Model Based On Machine Learning
6	Research On Deep Learning Method Based On Word Vector Representation In Text Classification
7	Joint Representation Learning With Heterogeneous Data For Personalized Recommendation
8	Research On Detection Method Of Phishing Web Page Based On Deep Learning
9	Research On Malicious Code Detection Method Based On Deep Learning
10	Deep Representation Learning For Sarcasm Detection In Twitter Using Attention Mechanism