Technical Feature Oriented Code Retrieval And Analysis System

Posted on: 2023-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: X D Ning
GTID: 2568306614984439
Subject: Software engineering
Abstract/Summary:
With the development of open-source code platforms, people have become accustomed to sharing and exchanging deep learning models and code on those platforms. Meanwhile, deep learning has succeeded in natural language processing, computer vision, biocomputing, and other scientific research fields. Researchers need to find existing solutions to a given problem and compare the performance of reusable code. However, existing code retrieval and comparison systems pay little attention to technical features such as specialized vocabulary and function call structures. We therefore improve a general-text keyword extraction method so that it better incorporates and reflects the specialized vocabulary of technical features, propose a semantic encoding method based on function call structure, and implement a code retrieval and analysis system. The main work is as follows:

(1) We propose a keyword extraction model incorporating semantic and syntactic information. When people retrieve code on an open-source code platform, keyword-based retrieval should consider not only the relevance between the query keywords and the code description text but also the specificity of the technical feature keywords. However, existing keyword extraction methods are generally not suitable for specialized texts that contain many technical terms. To tackle this problem, we propose a keyword extraction model that incorporates semantic, syntactic, and lexical specificity information. The pre-trained model BERT is used as a text encoder to extract abstract semantic information from the text. To capture long-range semantic dependencies between words while taking into account the importance and specificity of keywords, the model fuses lexical TF-IDF values with sentence-level dependency syntax: it constructs a semantic association graph by combining word co-occurrence and dependency relations, and uses a random walk algorithm to calculate lexical weights. The experimental results show that the keywords extracted by this model better reflect the technical features of the code. In the keyword specificity analysis, fusing TF-IDF values and syntax improves keyword specificity. The ablation experiments validate the benefit of dependency syntax knowledge to model performance.

(2) We propose a code comparison analysis model based on function call structure. Modern algorithms often call many basic library functions, and function names, call structures, and built-in classes provide essential technical features of the code. To synthesize the semantic information of these technical features, we propose a semantic encoding method based on function call structure. It analyzes code similarity in terms of function call structure, function names, and built-in classes. First, it uses an autoencoder based on graph convolutional neural networks to encode graph semantics and compares code structure semantics using the resulting semantic vectors. Second, it compares function similarity and built-in class similarity using vectors of library function and built-in class call information. Finally, it concatenates the structure vectors, function vectors, and built-in class vectors into overall technical feature vectors and compares code similarity. The experimental results show that the model can better compare code based on technical features. The ablation experiments validate the benefit of function call structure to model performance. In the semantic visualization analysis of function call structure, the model efficiently extracts the relevant and distinctive information between pieces of code.

(3) We design and implement a technical-feature-oriented code retrieval and analysis system. Based on the code storage, code running environment, and hardware resources provided by the High-Performance Computing Cloud Platform of Shandong University, we implement the code retrieval and comparison analysis functions. The retrieval function searches for code based on code description text or technical feature keywords; the analysis function compares the retrieved reusable code with newly designed code to further analyze code performance. We have tested the code retrieval and comparative analysis functions to verify the feasibility and effectiveness of the system. A case study on syntactic analysis algorithms shows that the system can perform code retrieval and comparative analysis according to user requirements.
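
The graph-based keyword weighting described in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation: it omits the BERT encoder, assumes the random walk is a personalized PageRank whose restart distribution is proportional to TF-IDF, and takes dependency edges as hand-written (head, dependent) pairs rather than parser output. All function names, parameters, and sample sentences below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Return one {term: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Smoothed IDF, so every term keeps a strictly positive weight.
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def build_graph(tokens, dep_edges=(), window=2):
    """Undirected weighted graph: sliding-window co-occurrence plus dependency edges."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                graph[left][right] += 1.0
                graph[right][left] += 1.0
    for head, dependent in dep_edges:  # assumed to come from a dependency parser
        if head != dependent:
            graph[head][dependent] += 1.0
            graph[dependent][head] += 1.0
    return graph


def rank_keywords(tokens, prior, dep_edges=(), damping=0.85, iterations=50):
    """Random walk with restart; the restart distribution follows the prior (TF-IDF)."""
    graph = build_graph(tokens, dep_edges)
    nodes = sorted(set(tokens) | set(graph))
    if not nodes:
        return []
    prior_total = sum(prior.get(node, 0.0) for node in nodes)
    if prior_total > 0:
        restart = {node: prior.get(node, 0.0) / prior_total for node in nodes}
    else:
        restart = {node: 1.0 / len(nodes) for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    score = dict(restart)
    for _ in range(iterations):
        # Mass held by isolated nodes is redistributed through the restart vector.
        dangling = sum(score[node] for node in nodes if out_weight[node] == 0)
        score = {
            node: (1 - damping) * restart[node]
            + damping * (
                sum(score[nbr] * w / out_weight[nbr] for nbr, w in graph[node].items())
                + dangling * restart[node]
            )
            for node in nodes
        }
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus and hand-written dependency edges, for illustration only.
docs = [
    "graph convolutional autoencoder encodes function call graph".split(),
    "keyword extraction uses dependency syntax".split(),
]
dep_edges = [("encodes", "autoencoder"), ("encodes", "graph")]
ranked = rank_keywords(docs[0], tfidf(docs)[0], dep_edges)
print(ranked[:3])
```

Biasing the restart step toward high TF-IDF terms is one way to let lexical specificity influence the walk, while the dependency edges link words that are syntactically related but too far apart to co-occur within the window.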
Keywords/Search Tags: Technical Features, Keyword Extraction, Comparison Analysis, System