Research on Keyword Extraction Methods Based on Semantic Features

Posted on: 2022-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: N Su
GTID: 2518306575967099
Subject: Computer technology
Abstract/Summary:
Today's Internet environment has made daily life far more convenient, largely thanks to the rapid development of the Internet and, in recent years, of the mobile Internet in particular. As technology has advanced, the speed and scale at which information is generated and disseminated have reached an unprecedented level. Of the massive volume of data now produced, text accounts for a large share, so it is important to be able to grasp the main content of that text quickly. Keywords are one of the main ways to do so: they are easy to understand and cover the topic of a text well. From a few short keywords, a reader can quickly understand what a text is about. Extracting keywords manually, however, is time-consuming and laborious, and it cannot scale to massive data. Automatic keyword extraction addresses this problem and is an important way of handling such data. Traditional keyword extraction methods rely only on statistical information; they ignore the semantic information in the text and cannot cover the topic information of a document. To address this, this thesis studies keyword extraction algorithms and designs and implements an algorithm that combines statistical information, semantic information obtained through deep learning, and topic information. The main work of this thesis is as follows:

1. To address the lack of semantics in traditional keyword extraction methods, the thesis uses a deep-learning-based pre-trained language model to obtain vector representations of text as semantic information. Combined with statistical information, an automatic keyword extraction algorithm incorporating semantic features is proposed. Extensive experiments show that the algorithm achieves good results.
2. Because statistical information alone is not enough to fully cover the topic content of the target document, topic model knowledge is introduced on top of the combination of semantic and statistical information, and a keyword extraction method combining semantic features with a topic model is proposed.
3. To verify the practical value and effectiveness of the proposed algorithms, a prototype keyword extraction system is designed and implemented from a software engineering perspective, with the proposed algorithm as its core.
Keywords/Search Tags: Keyphrase extraction, Pre-trained language model, Word vector, Topic model
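
The abstract does not say how the statistical and semantic signals are combined, so the following is only a minimal sketch of one plausible reading, not the thesis's algorithm. It scores each candidate word as a weighted sum of its normalized term frequency (the statistical part) and its cosine similarity to an averaged document vector (the semantic part). The function names, the weight `alpha`, and the two-dimensional toy vectors are all hypothetical. In practice the vectors would come from a pre-trained language model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_keywords(tokens, embeddings, alpha=0.5, top_k=3):
    """Rank candidate words by a mix of statistical and semantic scores.

    alpha is a hypothetical mixing weight: 1.0 uses term frequency only,
    0.0 uses embedding similarity only.
    """
    counts = Counter(tokens)
    max_count = max(counts.values())
    # Document vector: mean of the token vectors (an assumed stand-in for
    # a document representation from a pre-trained language model).
    doc_vec = mean_vector([embeddings[t] for t in tokens])
    scores = {}
    for word, count in counts.items():
        statistical = count / max_count
        semantic = cosine(embeddings[word], doc_vec)
        scores[word] = alpha * statistical + (1 - alpha) * semantic
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy data: hand-made 2-d vectors, purely illustrative.
tokens = ["keyword", "extraction", "keyword", "semantic", "weather"]
embeddings = {
    "keyword": [1.0, 0.0],
    "extraction": [0.9, 0.1],
    "semantic": [0.8, 0.2],
    "weather": [0.0, 1.0],
}
print(rank_keywords(tokens, embeddings))
```

A topic-model term, as described in the second contribution, could be added as a third weighted component in the same way, though the abstract gives no detail on how the thesis does this.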