
Research On Mining API Documentation

Posted on: 2019-11-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Zhang
GTID: 1368330545469071
Subject: Software engineering

Abstract/Summary:
With the rapid development of software reuse techniques, developers increasingly rely on third-party libraries to implement functionality and services. By invoking the Application Programming Interfaces (APIs) in these libraries, developers can save development time and improve the efficiency of software development. However, APIs are difficult to learn and use. When facing an unfamiliar API, developers often consult various kinds of API documentation to learn its correct usage. The quality of API documentation is therefore critical to the efficiency of API learning and usage, and ultimately to the efficiency of software development. In recent years, resolving the problems of API documentation to make API learning and usage more efficient has become an active research topic. Although researchers have proposed algorithms to address these problems, these algorithms do not fully leverage the domain-specific knowledge of API documentation, and their results leave room for improvement.

This dissertation focuses on three types of API documentation that developers produce while learning and using APIs: API tutorials, API-related technical question-and-answer (Q&A) pairs, and API-related bug reports. Because lengthy API tutorials are hard for developers to understand, a more accurate supervised approach and an unsupervised approach are proposed to recommend the API tutorial fragments that explain a given API. Because API-related technical questions are difficult to answer, an approach based on API specifications and historical information is proposed to help developers locate the correct APIs. Because API-induced bugs are hard to fix, an approach that leverages the authorship characteristics of contributors is proposed to summarize bug reports, helping developers understand the fix process of API-related bugs more quickly.

Specifically, this dissertation makes the following contributions.

(1) Algorithm design for a supervised approach and an unsupervised approach to recommending API tutorial fragments. The length of API tutorials prolongs the time developers need to learn API usage. To address this problem, a supervised approach is proposed that recommends API tutorial fragments explaining APIs. It first segments API tutorials into fragments and then recommends the relevant fragments to developers. It explores the internal relations within API tutorials more deeply and introduces two important types of features: co-occurrence API features and extended API features. With these features, the approach recommends API tutorial fragments accurately. To make the algorithm more practical, an unsupervised approach to recommending API tutorial fragments is also proposed. Based on observations of a large-scale collection of API tutorial fragments, we find that not all fragments aim to explain APIs. The unsupervised approach therefore uses a set of heuristic rules to detect non-explanatory fragments, and it leverages the PageRank algorithm and a topic model to analyze and recommend fragments from the lexical and semantic perspectives, respectively. The unsupervised approach achieves the best results among the existing methods and can be applied in practical scenarios.

(2) Algorithm design for API recommendation based on API specifications and historical information. API libraries encapsulate thousands of APIs, which makes it difficult for developers to locate the correct ones. Typical technical Q&A websites, such as Stack Overflow, host millions of API-related questions, and these questions take longer to resolve than other questions. To reduce developers' waiting time, a novel approach that leverages API specifications and historical information is proposed to recommend APIs for API-related questions. The approach achieves better results than the other approaches and helps developers save time in locating and learning APIs.

(3) Algorithm design for API-related bug report summarization based on contributors' authorship characteristics. API-related bugs are difficult to fix. When fixing them, developers usually refer to the fix process recorded in resolved bug reports. Hence, an accurate bug report summary can save developers time in reading and understanding a bug report, and further accelerate the fixing of API-related bugs. Previous studies consider only the contents of bug reports, without accounting for contributor factors. We propose a novel approach to model the authorship characteristics of contributors and conduct an empirical study of the authorship characteristics of typical contributors in open source communities. The proposed approach uses these authorship characteristics to reduce the training set, which not only shortens training time but also improves the accuracy of bug report summarization.

This dissertation focuses on analyzing and mining typical API documentation produced during API learning and usage. Building on existing approaches and addressing their shortcomings, we fully leverage domain-specific knowledge to design our solutions. The research methods in this dissertation can be further extended to other research areas in software engineering.
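The abstract says the unsupervised approach filters out non-explanatory fragments with heuristic rules and ranks the remaining fragments lexically with PageRank, but it does not state the rules or how the fragment graph is built. The following is a minimal Python sketch under explicit assumptions, not the dissertation's method. The filter `is_explanatory` (fragment must mention the API and have at least five distinct tokens) and the Jaccard-weighted fragment similarity graph are hypothetical stand-ins. The topic-model (semantic) component is omitted.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-cased identifier-like tokens; a simple stand-in for real preprocessing."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def is_explanatory(fragment: str, api: str, min_tokens: int = 5) -> bool:
    """Hypothetical heuristic rule: keep a fragment only if it mentions the API
    and has enough distinct tokens to plausibly explain something."""
    tokens = tokenize(fragment)
    return api.lower() in tokens and len(tokens) >= min_tokens


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Lexical similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights: List[List[float]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-9) -> List[float]:
    """Weighted PageRank by power iteration; weights[j][i] is the edge j -> i."""
    n = len(weights)
    if n == 0:
        return []
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass held by nodes without outgoing edges is spread uniformly.
        dangling = sum(s for s, o in zip(scores, out_sums) if o == 0)
        new = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def rank_fragments(fragments: List[str], api: str) -> List[Tuple[int, float]]:
    """Filter non-explanatory fragments, then rank the rest by PageRank over a
    Jaccard-weighted similarity graph. Returns (fragment index, score) pairs."""
    candidates = [i for i, f in enumerate(fragments) if is_explanatory(f, api)]
    token_sets = [tokenize(fragments[i]) for i in candidates]
    n = len(candidates)
    weights = [[jaccard(token_sets[a], token_sets[b]) if a != b else 0.0
                for b in range(n)] for a in range(n)]
    scores = pagerank(weights)
    return sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    fragments = [
        "Use ArrayList add to append an element to the end of the list.",
        "ArrayList add may resize the backing array when the list is full.",
        "See the next section.",
        "HashMap put stores a key value pair in the map.",
    ]
    for index, score in rank_fragments(fragments, "add"):
        print(index, round(score, 3))
```

On the toy fragments above, the filter drops the two fragments that do not mention `add`, and PageRank orders the rest. Because the similarity graph is undirected, fragments that share vocabulary with many other candidates accumulate higher scores.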
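The abstract does not describe how API specifications and historical Q&A data are combined when recommending APIs for a question. As a hypothetical illustration only, the sketch below scores each candidate API as a weighted sum of two bag-of-words cosine similarities. The specification signal compares the question with the API's documentation text. The history signal takes the best match between the question and past questions whose answers used that API. The function `recommend_apis`, the weight `alpha`, and the toy data are all assumptions. The API descriptions are short paraphrases, not quotations from official documentation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; no stemming or stop-word removal (a simplification)."""
    return Counter(re.findall(r"[a-z][a-z0-9]*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend_apis(question: str,
                   api_docs: Dict[str, str],
                   history: List[Tuple[str, List[str]]],
                   alpha: float = 0.5,
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical scorer: alpha * specification signal + (1 - alpha) * history signal."""
    q = bag_of_words(question)
    # Specification signal: question vs. each API's documentation text.
    spec = {api: cosine(q, bag_of_words(desc)) for api, desc in api_docs.items()}
    # History signal: best similarity to a past question whose answer used the API.
    hist = {api: 0.0 for api in api_docs}
    for past_question, used_apis in history:
        sim = cosine(q, bag_of_words(past_question))
        for api in used_apis:
            if api in hist:
                hist[api] = max(hist[api], sim)
    scores = {api: alpha * spec[api] + (1 - alpha) * hist[api] for api in api_docs}
    return sorted(scores.items(), key=lambda p: p[1], reverse=True)[:top_k]


if __name__ == "__main__":
    api_docs = {
        "String.split": "Splits this string around matches of the given regular expression.",
        "String.trim": "Returns a string whose value is this string, with leading and trailing whitespace removed.",
        "Integer.parseInt": "Parses the string argument as a signed decimal integer.",
    }
    history = [
        ("How do I split a comma separated string into an array?", ["String.split"]),
        ("How to convert a string to an int?", ["Integer.parseInt"]),
    ]
    print(recommend_apis("How can I split a string by commas?", api_docs, history))
```

On this toy data, setting `alpha=1.0` (specification only) ranks `String.trim` first, because its description happens to share more words with the question. Adding the history signal lifts `String.split` to the top. A practical system would need better text processing: without stemming, for example, "split" does not match "splits".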
Keywords: Application Programming Interface, Data Mining, API Tutorial, Technical Q&A, Bug Report