
Research And Application Of Structured Document Processing Technology

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: W X Liu
GTID: 2558306914978789
Subject: Information and Communication Engineering
Abstract/Summary:
REST (Representational State Transfer) is a software architecture style that has attracted wide attention for its simplicity, low coupling, and high scalability. A growing number of vendors now provide services to users by exposing REST APIs (Application Programming Interfaces), and users learn how to use these APIs by reading API documentation on the Internet. With the rapid development of REST APIs, user demand for REST API documentation has become more prominent. However, most Web service providers publish REST API documentation only as Web pages. REST API documentation on Web pages has two main characteristics:

1. Web pages contain a large amount of redundant information, and the REST API information is scattered across them.
2. The REST API documentation of different sites does not share a unified description structure, and documentation formats vary widely.

This makes it difficult for users to make effective use of REST API documentation. There is therefore an urgent need to automatically extract REST API information from Web pages and describe it in a unified specification. OpenAPI is currently the most widely used REST API specification. It describes a REST API in a machine-readable JSON (JavaScript Object Notation) or YAML format, which makes it convenient to further test REST API functions and generate corresponding SDK (Software Development Kit) code.

The key to REST API documentation format conversion is extracting API information that conforms to the OpenAPI specification from Web pages. Traditional approaches to REST API documentation format conversion rely mainly on manually written rules to extract REST API information, which depends heavily on specialized domain knowledge and has poor portability. Drawing on techniques from the field of natural language processing, this thesis proposes a deep learning-based REST API documentation processing method and addresses the problem of REST API documentation format conversion in two separate ways: as a text classification task and as a machine reading comprehension task. The main work includes:

1. To address the lack of a REST API text corpus, this thesis crawls REST API documentation from multiple sites and builds two datasets of REST API documentation, one for the text classification system and one for the machine reading comprehension system.
2. This thesis proposes a REST API documentation conversion system based on text classification. Using the constructed REST API text classification dataset, the validity of documentation conversion systems based on HRNN and BERT is verified, and the performance of the system is analyzed through comparative experiments.
3. This thesis takes a novel approach to REST API documentation format conversion by applying machine reading comprehension technology. Using the constructed REST API machine reading comprehension dataset, the validity of the REST API documentation conversion system based on machine reading comprehension is verified. The experimental results show that this system can accurately identify REST API information conforming to the OpenAPI specification and achieves good results both on a single site and across multiple sites.
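
To make the conversion target concrete, the following is a minimal sketch of how already-extracted REST API information might be assembled into an OpenAPI 3.0 document in JSON. It is an illustration only, not the thesis's implementation: the function name `to_openapi` and the record fields (`path`, `method`, `description`, `parameters`, `location`) are hypothetical, and the extraction step itself (text classification or machine reading comprehension) is assumed to have run beforehand.

```python
import json


def to_openapi(title, version, endpoints):
    """Assemble a minimal OpenAPI 3.0 document from extracted endpoint records.

    The record layout used here is a hypothetical intermediate format,
    not one defined by the thesis or by the OpenAPI specification.
    """
    paths = {}
    for ep in endpoints:
        operation = {
            "summary": ep["description"],
            "parameters": [
                {
                    "name": p["name"],
                    "in": p["location"],        # e.g. "path", "query", "header"
                    "required": p["required"],
                    "schema": {"type": p["type"]},
                }
                for p in ep.get("parameters", [])
            ],
            # OpenAPI requires at least one response per operation;
            # a generic placeholder is used because none was extracted.
            "responses": {"200": {"description": "Successful response"}},
        }
        # Several HTTP methods may share one path, so group by path first.
        paths.setdefault(ep["path"], {})[ep["method"].lower()] = operation

    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


# Hypothetical extraction result for a single endpoint.
extracted = [
    {
        "path": "/users/{id}",
        "method": "GET",
        "description": "Retrieve a user by ID.",
        "parameters": [
            {"name": "id", "location": "path", "required": True, "type": "string"}
        ],
    }
]

doc = to_openapi("Example API", "1.0.0", extracted)
print(json.dumps(doc, indent=2))
```

Grouping operations under their path mirrors the layout of the OpenAPI `paths` object, which is what allows downstream tools to test the API or generate SDK code from the result.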
Keywords/Search Tags: API documentation, OpenAPI specification, documentation format conversion, text classification, machine reading comprehension