Design And Implementation Of Text Resource Sharing System Based On Keyphrase Extraction

Posted on: 2023-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
Full Text: PDF
GTID: 2558306914480284
Subject: Computer technology
Abstract/Summary:
With the advent of the Internet digital age, the number of text resources on the Internet is increasing sharply every day. At the same time, it is very difficult for users to quickly retrieve and utilize unstructured texts. How to help users quickly understand the important information in documents, while also helping them manage and share text resources, has become an important issue. Extracting keyphrases from the content of a document resource can quickly help users identify whether the resource is of interest. Therefore, this thesis designs and implements a text resource sharing system based on keyphrase extraction. Users can upload and download the resources they need through the system, which achieves text resource sharing among users. Users do not need to enter complicated information about their documents: the system automatically analyzes the text, summarizes the content of the document through keyphrase extraction, and finally lets users find the resources they are interested in by searching on keyphrases.

This thesis mainly completes the following three aspects:

1. According to the application scenarios, an unsupervised keyphrase extraction algorithm, BWRank, based on a pre-trained language model is proposed. BWRank uses BERT to obtain a vector representation of the text. A whitening operation then transforms the text vectors into a new vector space constructed from an external corpus, so that extraction accuracy is improved and the vector dimension is greatly reduced. Experiments were conducted on two public datasets, Inspec and SemEval2017, and the accuracy of BWRank in Top-5 keyword extraction reached 44.3% and 47.18%, respectively. The experimental results show that BWRank has a lower vector dimension and better keyphrase extraction accuracy than the other algorithms compared, which demonstrates the effectiveness of the proposed algorithm.

2. This thesis analyzes the requirements of the text resource sharing system, including functional and non-functional requirements. Based on the requirement analysis, the system architecture is designed and the functional modules are divided in detail. The roles and functions of the system's users are analyzed, and an entity-relationship diagram is drawn. The structure of the database tables is then designed according to the entity-relationship diagram. This lays a foundation for the subsequent development and implementation of the system.

3. The text resource sharing system is designed and developed in detail. This thesis expounds the application scenarios and functional flow of each module. The implementation details of the keyphrase extraction module, document management module, user management module, document retrieval module and data log visualization module are explained with sequence diagrams and flow charts. Finally, all modules in the system were tested, and functional tests and unit tests were carried out for the main functions. The overall operation is stable and meets the functional needs of users.

The system can help users manage text resources better, and it has both application value and research value.
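
The whitening step described for BWRank can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a standard SVD-based whitening fitted on external-corpus embeddings, with dimension reduction done by keeping the first k directions. The cosine-similarity ranking of candidate phrases against the document vector is likewise an assumption for illustration, not the thesis's confirmed scoring method. Random vectors stand in for BERT embeddings, and the function names (fit_whitening, whiten, rank_candidates) are hypothetical.

```python
import numpy as np


def fit_whitening(corpus_vecs, k):
    """Estimate whitening parameters (mu, w) from external-corpus embeddings.

    corpus_vecs: array of shape (n_samples, dim); k: target dimension.
    """
    mu = corpus_vecs.mean(axis=0, keepdims=True)
    cov = np.cov(corpus_vecs - mu, rowvar=False)
    # The covariance is symmetric PSD, so its SVD is cov = U diag(s) U^T.
    u, s, _ = np.linalg.svd(cov)
    # Keep the k leading directions and rescale each one to unit variance.
    w = u[:, :k] / np.sqrt(s[:k] + 1e-12)
    return mu, w


def whiten(vecs, mu, w):
    """Project vectors into the whitened, reduced space."""
    return (vecs - mu) @ w


def rank_candidates(doc_vec, cand_vecs, phrases, top_n=5):
    """Rank candidate phrases by cosine similarity to the document vector.

    Assumed scoring rule, used here only to show how whitened vectors
    could feed a Top-N keyphrase ranking.
    """
    doc = doc_vec / np.linalg.norm(doc_vec)
    cands = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    scores = cands @ doc
    order = np.argsort(-scores)[:top_n]
    return [(phrases[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for BERT embeddings (768 dimensions is an assumption).
    corpus = rng.normal(size=(2000, 768))
    mu, w = fit_whitening(corpus, k=64)

    doc_vec = whiten(rng.normal(size=(1, 768)), mu, w)[0]
    cand_vecs = whiten(rng.normal(size=(10, 768)), mu, w)
    phrases = [f"candidate_{i}" for i in range(10)]
    for phrase, score in rank_candidates(doc_vec, cand_vecs, phrases):
        print(phrase, round(score, 3))
```

Fitting the transform on an external corpus rather than on the document itself means the same (mu, w) pair can be reused for every uploaded document, and the reduced dimension k lowers storage and comparison cost for the stored vectors.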
Keywords/Search Tags: keyphrase extraction, pre-trained language model, text resource, word vector