
Research On Code-Query Matching Method For Software Q&A Communities

Posted on: 2021-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Hu
GTID: 1488306461964929
Subject: Computer software and theory

Abstract/Summary:
In recent years, open-source software and open-source communities have grown rapidly, and many kinds of online question-and-answer (Q&A) platforms for software developers have emerged on the Internet. Developers use these platforms for several purposes:

- sharing open-source projects
- searching for code solutions
- getting answers to development questions
- sharing development experience
- learning development techniques

These activities improve the quality of their projects and their own development skills. Today, these platforms host millions of open-source projects and hundreds of millions of lines of source code, and they contribute a large number of code solutions. These solutions play a critical role in conceptual understanding, code reuse, and other software development activities.

This thesis uses two terms for this material. Online platforms that let developers query software information are called software Q&A communities. The queries (usually natural-language descriptions) and replies (often code examples accompanied by contextual natural-language descriptions) published on these sites are called software Q&A objects.

However, as software Q&A communities continue to evolve, so do the types of code and the number of software Q&A objects they cover. Because a software Q&A community is a social platform, its Q&A objects tend to be short, loosely specified, complex, noisy, and highly dynamic. These properties make it difficult to find effective code examples in these communities. To address these challenges, this thesis builds a unified framework for code-query matching in software Q&A communities, which provides a way to find the best-matching code examples. The work consists of the following four parts:

1) CFCQ. Mining high-quality software repositories in the software community can provide effective sources for query expansion. Existing approaches suffer from two problems: software Q&A corpora lack universality, and queries tend to be over-expanded. To address these problems, this thesis combines the unsupervised software-repository mining algorithm Code MF with crowd-knowledge query expansion. The result is CFCQ, a code-query matching model based on crowd-knowledge query expansion.

2) NACQ. Pre-trained structural word embeddings learned from external noisy data help introduce prior knowledge of structural semantics. Existing models do not bring in such externally pre-learned structural semantic knowledge. To address this, this thesis combines structural embeddings with a dual attention mechanism. The result is NACQ, a code-query matching model that uses structural embeddings and joint attention.

3) SNCQ. Attending to the linguistic hierarchy of software Q&A objects helps mine the interaction information between a query and code. Current semantic representations of software Q&A objects lack an integrated hierarchical structure. To address this, this thesis combines ON-LSTM with attentive pooling. The result is SNCQ, a code-query matching model that combines attentive pooling with ordered embeddings.

4) LGCQ. Hierarchical embedding can retain more structural semantic information and reduce the information loss caused by modeling directly with an RNN. Embedding software Q&A objects directly with an RNN causes semantic loss. To address this, this thesis combines a self-attention mechanism with attentive pooling. The result is LGCQ, a code-query matching model that uses local-to-global hierarchical embedding.

This thesis takes the mainstream software Q&A community StackOverflow as an example. It collects datasets of software Q&A objects in four languages, C#, SQL, Python, and Java, which include both declarative and interpretive languages. Detailed experiments and analysis of the results demonstrate the validity and accuracy of the proposed models.
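The abstract does not give the exact equations used in SNCQ or LGCQ. As a rough illustration of the attentive pooling operation both models build on, the sketch below follows the common two-way formulation from answer-selection work. It works in four steps:

1. Build a bilinear alignment matrix `G = tanh(Q U C^T)` between query tokens and code tokens.
2. Max-pool along each axis of `G`.
3. Apply softmax to turn the pooled scores into attention weights.
4. Score the pair by the cosine similarity of the attention-weighted representations.

Several details are illustrative assumptions and are not taken from the thesis: the toy token vectors, the identity matrix standing in for the learned parameter `U`, and the helper names. In the actual models, the token vectors would come from encoders such as ON-LSTM or self-attention, and `U` would be trained.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(xs):
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def attentive_pooling(query, code, U):
    """Score a query against a code snippet with two-way attentive pooling.

    query: M token vectors of dimension d (list of M lists of length d)
    code:  L token vectors of dimension d
    U:     d x d bilinear matrix (hypothetical stand-in for a learned parameter)
    Returns (score, query_attention, code_attention).
    """
    # G[i][j] = tanh(q_i^T U c_j): soft alignment of query token i with code token j
    G = [[math.tanh(x) for x in row]
         for row in matmul(matmul(query, U), transpose(code))]  # M x L
    # Row-wise max: how strongly each query token matches some code token
    q_att = softmax([max(row) for row in G])
    # Column-wise max: how strongly each code token matches some query token
    c_att = softmax([max(col) for col in transpose(G)])
    d = len(U)
    # Attention-weighted sums give fixed-size vectors for both sides
    r_q = [sum(q_att[i] * query[i][k] for i in range(len(query))) for k in range(d)]
    r_c = [sum(c_att[j] * code[j][k] for j in range(len(code))) for k in range(d)]
    return cosine(r_q, r_c), q_att, c_att


# Toy usage: 2 query tokens, 3 code tokens, 3-dimensional embeddings
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
code = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.8, 0.1]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
score, q_att, c_att = attentive_pooling(query, code, identity)
print(round(score, 3), [round(w, 3) for w in c_att])
```

Because the attention weights on each side depend on the other side through `G`, the query and code representations are computed jointly rather than independently. This joint computation is what lets attentive pooling capture query-code interaction. In a retrieval setting, candidate code snippets would be ranked by `score`.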
Keywords/Search Tags: software Q&A sites, StackOverflow, code-query matching, attention mechanism