
Automatic Mining And Retrieval Of High Quality Code Base For Programmers' Q&A Forums

Posted on: 2020-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yin
Full Text: PDF
GTID: 2428330590974467
Subject: Software engineering
Abstract/Summary:
With the rapid development of software technology, software systems keep growing in scale, and the knowledge domains they involve keep expanding. How to improve the efficiency of software development has become an important issue that developers must face. Online Q&A forums for developers, such as Stack Overflow, host a large number of natural-language problem descriptions together with their code solutions. Understanding and reusing these code snippets can greatly improve development efficiency. The first key issue in reusing them is how to mine high-quality code snippets from a large number of answers and build a high-quality code repository. Here, a high-quality code snippet is one that can solve the corresponding problem on its own. The second key issue is combining natural language processing with question-answering technology to search the repository quickly and accurately, so that developers can find code solutions for related problems. Building a high-quality code repository and a code question answering system therefore involves two core problems: mining high-quality code question-answer pairs and retrieving similar question-answer pairs. However, manually labeling high-quality questions and answers is time-consuming and laborious, and traditional text-matching retrieval methods struggle to meet the semantic search requirements for similar questions. To address these problems, this thesis completes the following work.

Firstly, a deep-learning-based method for mining a high-quality code base is studied. Answers to Python questions are collected from the online Q&A forum Stack Overflow, and the "How to do it" type of questions and their accepted answers (Accepted Answer) are extracted as training data. A hierarchical neural network model based on a bidirectional GRU is trained on this data, and the model is used to identify high-quality code snippets in forum posts and to build a code repository of high-quality Q&A pairs.

Secondly, a retrieval method for the high-quality code repository is studied. Similarity calculation methods at the sentence-matching level and similarity calculation methods based on question representations at the semantic level are analyzed and compared. To retrieve question-answer pairs quickly, the information required for computing question similarity is stored in the code repository as auxiliary search information, which speeds up the similarity calculation. The retrieved code snippets are ranked by the similarity between the question entered by the user and the question information stored in the repository, and then returned to the user.

Finally, an automatic question answering system over high-quality code, targeting the online Q&A forum Stack Overflow, is designed and implemented, and the system is tested.
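
The retrieval step described in the abstract (store auxiliary search information with each Q&A pair, then rank stored questions by similarity to the user's question) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `CodeRepository` class, the `embed` and `cosine` helpers, and the sample Q&A pairs are hypothetical, and a simple bag-of-words vector stands in for the learned semantic question representation that the thesis studies.

```python
import math
import re
from collections import Counter


def embed(question):
    """Stand-in question representation: lowercase bag-of-words counts.

    Assumption: the thesis uses learned semantic representations; a
    word-count vector is used here only to keep the sketch self-contained.
    """
    return Counter(re.findall(r"[a-z0-9]+", question.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


class CodeRepository:
    """Hypothetical in-memory repository of (question, code) pairs."""

    def __init__(self):
        # Each entry keeps the precomputed vector as auxiliary search
        # information, so it is not recomputed on every query.
        self.entries = []

    def add(self, question, code):
        self.entries.append((question, code, embed(question)))

    def search(self, query, top_k=3):
        """Return up to top_k (score, question, code) tuples, best first."""
        query_vec = embed(query)
        scored = [
            (cosine(query_vec, vec), question, code)
            for question, code, vec in self.entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


repo = CodeRepository()
repo.add("How to reverse a list in Python", "items[::-1]")
repo.add("How to read a file line by line", "for line in open(path): ...")
repo.add("How to sort a dictionary by value",
         "sorted(d.items(), key=lambda kv: kv[1])")

results = repo.search("reverse a python list", top_k=2)
for score, question, code in results:
    print(f"{score:.3f}  {question}  ->  {code}")
```

Precomputing the vector at insertion time mirrors the idea of storing auxiliary search information in the repository: a query then costs one embedding plus one similarity computation per stored question.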
Keywords/Search Tags: Stack Overflow, bidirectional GRU, question similarity calculation, Siamese network, RNN