Automated Query Reformulation Approach For Document Search In Software Engineering

Posted on: 2022-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: K B Cao
GTID: 2518306725983809
Subject: Master of Engineering

Abstract:
With the rapid development of the Industrial Internet, the software and Internet industries have seen new growth in recent years. This growth has accelerated the turnover of knowledge and skills in software engineering. As a result, developers have to master a large number of programming skills, and when they encounter programming-related questions, they tend to use search engines to look for solutions on programming Q&A sites (e.g., Stack Overflow). Unlike queries in general domains, queries in software engineering are highly specialized: both queries and documents contain software terms and symbols, which makes it difficult for developers to locate the information they want efficiently. Developers therefore have to reformulate their queries, for example by adding restrictions on programming languages or platforms, or by removing overly specific information. Query reformulation is difficult for novice developers and time-consuming for experienced ones. In this thesis, we propose an automated query reformulation approach that helps developers reformulate queries efficiently and accurately, thereby improving search efficiency and the quality of search results.

We first conduct an empirical study based on users' activity logs on Stack Overflow. The results show that query reformulation in software engineering exhibits many patterns unique to the domain, but in most cases users do not make significant modifications when reformulating. Based on these findings, we conjecture that such reformulations, which involve no significant modifications yet are time-consuming and error-prone to perform by hand, can be modeled by deep learning. We therefore propose a deep learning-based query reformulation model for software engineering which, unlike previous work, adopts an end-to-end structure and is trained on a large-scale corpus of real-world query reformulation logs. In addition, the model is optimized for the software engineering domain to better capture the semantic features of queries.

We evaluate our approach in terms of both its closeness to users' manual reformulations and the retrieval effectiveness of the reformulated queries. Compared with five state-of-the-art baselines, our approach produces the reformulations closest to the manual ones, improving Exact Match by 5.6% to 33.5%. It also achieves higher retrieval effectiveness, improving MRR by 129.33% over the original queries. In a user study, 85.7% of users agreed that the reformulations produced by our approach fully met or exceeded their expectations, and the remaining 14.3% agreed that the results basically met their expectations. Finally, to make the model easier to use, we design and implement a browser plug-in that provides several reformulation suggestions for users' software engineering queries. A user study with 35 developers confirms the effectiveness of our approach.
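The empirical finding that most reformulations are small edits can be illustrated by diffing an original query against its reformulated version at the token level. The following is a minimal sketch, not the thesis's method: it assumes lowercased whitespace tokenization, and the helper names `reformulation_edits` and `classify_edit`, as well as the four edit labels, are hypothetical and do not reproduce the thesis's pattern taxonomy. It relies only on the standard-library `difflib.SequenceMatcher`.

```python
import difflib


def reformulation_edits(original: str, reformulated: str) -> dict:
    """Token-level diff between two queries (hypothetical helper).

    Assumes lowercased whitespace tokenization; returns the tokens that
    were added and removed plus difflib's similarity ratio (2*M/T).
    """
    src = original.lower().split()
    tgt = reformulated.lower().split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(src[i1:i2])  # tokens dropped from the original
        if tag in ("insert", "replace"):
            added.extend(tgt[j1:j2])  # tokens introduced by the reformulation
    return {"added": added, "removed": removed, "similarity": matcher.ratio()}


def classify_edit(edits: dict) -> str:
    """Coarse, illustrative labels (not the thesis's taxonomy)."""
    if edits["added"] and edits["removed"]:
        return "substitution"
    if edits["added"]:
        return "expansion"  # e.g. adding a language or platform restriction
    if edits["removed"]:
        return "reduction"  # e.g. removing overly specific details
    return "unchanged"
```

For example, `reformulation_edits("how to sort a list", "how to sort a list python")` reports `["python"]` as the only added token, which `classify_edit` labels as an expansion. A high similarity ratio like this one is what a "small modification" looks like under this assumed tokenization.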
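The two evaluation metrics named in the abstract can be made concrete with a small sketch. It assumes that Exact Match compares each generated reformulation with the user's manual one after lowercasing and whitespace normalization, and that MRR averages, over queries, the reciprocal rank of the first relevant retrieved document (0 if none is retrieved). The thesis's actual normalization and relevance judgments are not given in this abstract, and all function names below are hypothetical.

```python
from typing import Sequence, Set


def _normalize(query: str) -> str:
    # Assumed normalization: case-fold and collapse whitespace.
    return " ".join(query.lower().split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of generated reformulations identical to the manual reformulation."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equally sized")
    hits = sum(_normalize(p) == _normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         relevant: Sequence[Set[str]]) -> float:
    """Mean over queries of 1/rank of the first relevant document (0 if absent)."""
    if len(rankings) != len(relevant) or not rankings:
        raise ValueError("rankings and relevant must be non-empty and equally sized")
    total = 0.0
    for ranked_ids, gold in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked_ids, start=1):  # ranks start at 1
            if doc_id in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement in percent, e.g. MRR of reformulated vs. original queries."""
    return (improved - baseline) / baseline * 100.0
```

If the reported 129.33% MRR boost is a relative gain in this sense, `relative_gain(original_mrr, reformulated_mrr)` would return about 129.33, meaning the reformulated queries' MRR is roughly 2.29 times that of the original queries.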
Keywords: Query Reformulation, Information Retrieval, Stack Overflow Mining, Query Log, Deep Learning