
Research On Web System Of Extracting Key Content Of Patent Web Page

Posted on: 2021-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Fan
GTID: 2518306308463114
Subject: Electronics and Communications Engineering
Abstract/Summary:
As the Internet develops toward higher speed, greater intelligence and globalization, the number of Internet users in China keeps growing. According to data released by CNNIC (China Internet Network Information Center), between 1997 and 2018 the number of Internet users in China rose from 620,000 to 829 million, and the number of websites grew from an initial 1,500 to 5.23 million. More and more people use web pages for social communication, creative sharing and knowledge acquisition, and Internet applications have become closely tied to work, daily life and other social activities. Because Internet applications reach a wide audience and are highly interactive, they can deliver information to its recipients in real time and are of great help to professionals in many fields. Consequently, many users and companies with an industry background have put forward requirements for web page retrieval that would improve efficiency: reducing the workload of manual retrieval, shortening the time it takes, and extracting and displaying the key information of a target page.

Building on a study of Web frameworks, this thesis designs and builds two subsystems: a patent subsystem that extracts patent description information, patent keywords and abstracts, and a generic subsystem that extracts key information, such as keywords and summaries, from general web pages. The work is divided into three parts. The first part studies the theory of web page extraction, covering methods for extracting the body text of a web page and methods for extracting keywords and summaries from text. The second part implements the key technologies of the system: extraction of web page body text based on Readability, and extraction of keywords and summaries based on an improved TextRank algorithm. The third part designs and implements the Web system that extracts key information from patent web pages, including the acquisition of relevant patent information and the extraction of key patent information.

The platform uses Flask to build the back-end service module, implements front-end interaction with the Vue framework and jQuery, and stores patent data and user information in a MySQL relational database. It provides initial patent selection and extraction of key information from web pages. The project is currently deployed on a lab server, where it helps patent researchers view and select patents and also provides real-time online extraction of web page body text and summarization of documents and text. Practical use indicates that the Web system designed and implemented in this thesis for extracting key information from patent web pages has good application prospects.
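The abstract says body text is extracted "based on Readability" without describing how. The following is a heavily simplified, standard-library-only sketch in the spirit of Readability's paragraph-scoring heuristic, not the thesis's code or the Readability library's API: every `<p>` with at least 25 characters earns 1 point, plus 1 per comma, plus 1 per 100 characters (at most 3); its parent receives the full score and its grandparent half; the best container, discounted by its link density, is returned as the main text. The names `extract_main_text`, `TreeBuilder`, `text_of` and `link_text_len`, and the tag sets, are hypothetical.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch: void elements never hold children,
# and script/style content is discarded outright.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr", "source"}
SKIP_TAGS = {"script", "style"}


class Node:
    """Minimal DOM node: children are text strings or other Nodes."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []


class TreeBuilder(HTMLParser):
    """Builds a rough Node tree, tolerating unclosed or stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root
        self.skipping = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skipping = True
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never become the current node.
        if tag not in SKIP_TAGS:
            self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skipping = False
            return
        # Walk up to the nearest open element with this tag; ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if not self.skipping and data.strip():
            self.current.children.append(data)


def text_of(node):
    """Whitespace-normalised text of a node and all its descendants."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(" ".join(parts).split())


def link_text_len(node):
    """Number of characters of text that sit inside <a> elements."""
    return sum(
        len(text_of(c)) if c.tag == "a" else link_text_len(c)
        for c in node.children
        if isinstance(c, Node)
    )


def extract_main_text(html, min_len=25):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scores = {}
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(c for c in node.children if isinstance(c, Node))
        if node.tag != "p":
            continue
        text = text_of(node)
        if len(text) < min_len:
            continue
        # Paragraph score: 1 base point, 1 per comma, 1 per 100 characters (max 3).
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        # Credit the parent fully and the grandparent with half.
        parent = node.parent
        scores[parent] = scores.get(parent, 0) + score
        if parent.parent is not None:
            scores[parent.parent] = scores.get(parent.parent, 0) + score / 2
    if not scores:
        return ""

    def final_score(node):
        # Discount containers whose text is mostly links (menus, footers).
        total = len(text_of(node)) or 1
        return scores[node] * (1 - link_text_len(node) / total)

    return text_of(max(scores, key=final_score))
```

Calling `extract_main_text(page_html)` on a page with a link-only navigation bar and an article `<div>` of several paragraphs should return the article text. Real pages, particularly Chinese patent pages (where "，" rather than "," separates clauses), would need more of Readability's rules, such as class/id weighting.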
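Keywords are extracted with an "improved TextRank algorithm" whose modifications the abstract does not describe. As a baseline, the original TextRank builds an undirected co-occurrence graph over candidate words and ranks them with PageRank, $S(v) = (1 - d) + d \sum_{u \in N(v)} S(u)/|N(u)|$ with $d = 0.85$. The sketch below implements only that unmodified baseline. The stopword list, the English regex tokenizer (standing in for the part-of-speech filtering and Chinese word segmentation a patent system would actually need), the window size and the fixed iteration count are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical English stopword list; a real system would use POS filtering
# (and word segmentation for Chinese text) instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "which", "with",
}


def tokenize(text):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_k=5):
    words = tokenize(text)
    # Undirected co-occurrence graph: connect distinct words at most window-1 tokens apart.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank: S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u).
    # A fixed iteration count stands in for a convergence check.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        score = {
            v: (1 - damping)
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]
```

Words that co-occur with many different words accumulate the most score, which is why a term repeated across a patent page's claims and abstract tends to rank first.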
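The same TextRank idea yields the extractive summaries mentioned above when sentences, not words, are the graph nodes. In the original formulation the edge weight between two sentences is their word overlap divided by $\log|S_i| + \log|S_j|$, and scores follow the weighted update $WS(V_i) = (1 - d) + d \sum_j \frac{w_{ji}}{\sum_k w_{jk}} WS(V_j)$. The sketch below is again the unmodified algorithm rather than the thesis's improved version; the naive punctuation-based sentence splitter and the function names are assumptions, and the splitter would not handle Chinese sentence punctuation.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """TextRank overlap: shared words / (log|a| + log|b|), with |.| in tokens."""
    ta, tb = re.findall(r"\w+", a.lower()), re.findall(r"\w+", b.lower())
    denom = math.log(len(ta)) + math.log(len(tb)) if ta and tb else 0.0
    if denom <= 0:  # both sentences are single words, or one is empty
        return 0.0
    return len(set(ta) & set(tb)) / denom


def textrank_summary(text, top_k=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    # Symmetric weight matrix without self-loops.
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    score = [1.0] * n
    for _ in range(iterations):
        # WS(i) = (1 - d) + d * sum_j w[j][i] / out_weight[j] * WS(j)
        score = [
            (1 - damping)
            + damping * sum(w[j][i] / out_weight[j] * score[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: -score[i])[:top_k]
    # Return the chosen sentences in document order so the summary reads naturally.
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with the rest of the text keeps only the base score $1 - d$, so off-topic sentences drop out of the summary first.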
Keywords/Search Tags: patent view, algorithm extraction, Readability, TextRank, Flask