Research and Implementation of Keyword Extraction for Work Reports

Posted on: 2023-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
Full Text: PDF
GTID: 2558306914473344
Subject: Software engineering
Abstract/Summary:
With the rapid development of technology and smart devices, the amount of information generated is exploding. How to obtain meaningful content from a large amount of text quickly and accurately has become a research hotspot in recent years. Keyword extraction, as a precursor to many natural language processing tasks, has a direct impact on the results of search, recommendation, classification, and other tasks. This thesis takes a company’s work report system as its background and studies keyword extraction techniques for a specific domain, with the aim of extracting keywords from work reports, improving the efficiency of corporate review, and supporting the storage, retrieval, and classification of data. The main contributions are as follows.

First, taking Chinese text as the research object, this thesis proposes BSKT, a multi-feature fusion keyword extraction algorithm that combines BERT semantics with a K-Truss graph. BSKT builds on the TextRank algorithm and adds BERT semantic features, K-Truss features, and other features. The algorithm first obtains word vectors from a pretrained BERT model and uses them to calculate semantic differences, which are used to optimize the iterative process on the TextRank word graph. It then decomposes the TextRank word graph to obtain its K-Truss graph and, from that, the truss level feature of each word. Finally, it scores each word by combining the word’s IDF with its truss level feature and extracts the keywords. Experimental results show that BSKT outperforms the latest keyword extraction algorithm, SCTR, when extracting 1 to 10 keywords; when extracting three keywords, the F1 value improved by 11.2%.

Second, a professional dictionary for the enterprise work report project was constructed by manually cleaning and screening a large amount of real data. The dictionary contains more than 10,000 entries in more than 20 categories, and each entry has four attributes: word, part of speech, word frequency, and category. The dictionary greatly improved word segmentation accuracy on the enterprise’s work report text, from 87.9% to 95.5%.

Finally, based on the improved algorithm, this thesis designs and develops a keyword extraction system for work reports that combines other Chinese information processing technologies with the constructed domain dictionary. The system implements data annotation, dictionary construction, and keyword extraction. It is easy to operate, and its functions and stability have been verified by several tests, which gives it practical value.
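The pipeline described in the abstract (word graph, TextRank iteration, K-Truss decomposition, fusion with IDF) can be illustrated with a minimal Python sketch. This is not the thesis's BSKT implementation. The function names, the co-occurrence window, the smoothed IDF formula, and the multiplicative fusion of the three features are all assumptions made for illustration. The BERT-based semantic difference is not computed here; the hypothetical `weight` argument only marks where such an edge weight could be plugged in.

```python
import math
from collections import defaultdict


def build_word_graph(tokens, window=3):
    """Undirected co-occurrence graph over a sliding window of `window` tokens."""
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    return dict(graph)


def textrank(graph, weight=lambda u, v: 1.0, damping=0.85, iters=50):
    """Weighted TextRank. `weight` is a hypothetical hook (must return positive
    values) where a semantic edge weight, e.g. from BERT vectors, could go."""
    scores = {w: 1.0 for w in graph}
    out_sum = {w: sum(weight(w, v) for v in graph[w]) for w in graph}
    for _ in range(iters):
        scores = {
            w: (1 - damping) + damping * sum(
                weight(v, w) / out_sum[v] * scores[v] for v in graph[w])
            for w in graph
        }
    return scores


def truss_levels(graph):
    """Truss level of each word: the largest k such that the word still has an
    edge in the k-truss (every remaining edge lies in at least k-2 triangles)."""
    adj = {u: set(vs) for u, vs in graph.items()}
    level = {u: 2 for u in adj}  # any edge belongs to the 2-truss
    k = 3
    while any(adj.values()):
        changed = True
        while changed:  # peel edges with too few supporting triangles
            changed = False
            for u in list(adj):
                for v in list(adj[u]):
                    if len(adj[u] & adj[v]) < k - 2:
                        adj[u].discard(v)
                        adj[v].discard(u)
                        changed = True
        for u, vs in adj.items():
            if vs:
                level[u] = k
        k += 1
    return level


def extract_keywords(tokens, corpus, top_k=3):
    """Score words by TextRank * truss level * IDF (assumed fusion rule)."""
    graph = build_word_graph(tokens)
    rank = textrank(graph)
    truss = truss_levels(graph)
    n = len(corpus)

    def idf(w):
        df = sum(1 for doc in corpus if w in doc)
        return math.log((1 + n) / (1 + df)) + 1  # smoothed, always positive

    score = {w: rank[w] * truss[w] * idf(w) for w in graph}
    return sorted(score, key=score.get, reverse=True)[:top_k]
```

A call might look like `extract_keywords(tokens, corpus, top_k=3)`, where `tokens` is the segmented report and `corpus` is a list of token collections used for document frequency. For Chinese text, the tokens would come from a word segmenter, which is where a domain dictionary like the one described above would take effect.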
Keywords/Search Tags: Keyword extraction, BERT word vector, K-Truss graph, TextRank, Multi-feature fusion