
Research On Automatic Keyphrase Generation Algorithm Based On Enhancing Semantic Consistency

Posted on: 2023-03-09   Degree: Master   Type: Thesis
Country: China   Candidate: H S Fang   Full Text: PDF
GTID: 2568307061953819   Subject: Computer Science and Technology
Abstract/Summary:
Keyphrases summarize the topics of a text at a high level of abstraction and usually cover its most essential information, so reading them helps a reader quickly grasp the core content of the text. Automatic keyphrase generation aims to produce, directly from a text, keyphrases that express its core semantics; as a widely used text-mining tool, it provides a technical basis for understanding text semantics. However, current automatic keyphrase generation algorithms still have two problems. First, they often ignore the correlation between cognate words (words that share a stem), which makes it difficult to generate accurate cognate keyphrases when word forms change. Second, current corpus-based algorithms cannot fully exploit the co-occurrence and synonym relationships between words, which makes it difficult to generate accurate related keyphrases when synonyms are substituted or when the text is summarized. To address these problems, this thesis models the correlations between words to enhance the semantic consistency between generated keyphrases and ground-truth keyphrases, and studies in depth automatic keyphrase generation algorithms that enhance semantic consistency.

First, an automatic cognate keyphrase generation algorithm based on stem information fusion (ACKGSIF) is proposed. By modeling the correlation between cognate words, it enhances the semantic consistency between generated and ground-truth cognate keyphrases under word-form transformation and improves the accuracy of generated cognate keyphrases. Then, building on ACKGSIF, an automatic keyphrase generation algorithm based on heterogeneous association subgraphs (AKGHAS) is proposed. By modeling the co-occurrence and synonym relationships between words, it improves the semantic consistency between generated and ground-truth keyphrases under synonym substitution and text summarization, and thereby improves the accuracy of the generated keyphrases. On this basis, a prototype system for automatic keyphrase generation with enhanced semantic consistency was designed and implemented. The main work of this thesis is as follows:

(1) To address the difficulty current algorithms have in generating accurate cognate keyphrases under word-form transformation, this thesis proposes ACKGSIF, an automatic cognate keyphrase generation algorithm based on stem information fusion. The algorithm fuses stem features into the word embedding layer to explicitly capture the correlation between cognate words. In the decoder module, it integrates stem-based source-text features and reference-text features, enlarging the solution space for generating cognate words from shared stems, so that the semantics of generated cognate keyphrases become more consistent with those of the ground-truth cognate keyphrases. Finally, the model is trained jointly with an auxiliary stem sequence generation task in a multi-task framework, which drives optimization toward generating correct cognate keyphrases.

(2) Having improved the accuracy of generated cognate keyphrases, this thesis further addresses the difficulty current corpus-based algorithms have in generating accurate related keyphrases under synonym substitution and text summarization. Building on ACKGSIF, it proposes AKGHAS, an automatic keyphrase generation algorithm based on heterogeneous association subgraphs. The algorithm constructs heterogeneous association subgraphs that organize the co-occurrence and synonym relationships between words, making these word-to-word associations explicit. It then extracts the correlation information between words through node-level aggregation and update operations on the graph, which enlarges the model's solution space for related words. Finally, it fuses the graph features with the multi-source text features at the decoder level to enhance the semantic consistency between the generated and ground-truth keyphrases.

(3) Based on the two algorithms above, this thesis designs and implements a prototype system for automatic keyphrase generation with enhanced semantic consistency, which supports keyphrase generation for the online question-answering and computer-science thesis domains. The system is tested and analyzed in terms of timeliness and the semantic consistency of the generated keyphrases; the results show that it can generate high-quality keyphrases that meet user needs.
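The stem-fusion and multi-task ideas in (1) can be illustrated with a minimal sketch. It is not ACKGSIF itself: it assumes fusion is an element-wise sum of a word embedding and a stem embedding, uses a hypothetical toy suffix stripper `crude_stem` in place of a real stemmer, and uses random vectors in place of trained parameters. The names `StemFusedEmbedding`, `joint_loss`, and the task weight `lam` are illustrative only.

```python
import math
import random


def crude_stem(word):
    """Hypothetical toy stemmer: strips one common English suffix.
    A real system would use a proper stemmer (e.g. Porter); illustration only."""
    for suffix in ("ations", "ation", "ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class StemFusedEmbedding:
    """Assumed fusion rule: input vector = word embedding + stem embedding,
    so all cognates of one stem share the same stem component."""

    def __init__(self, dim=4, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.word_table = {}  # word -> vector
        self.stem_table = {}  # stem -> vector, shared by cognates

    def _lookup(self, table, key):
        # Lazily create a random vector (stands in for a trained parameter).
        if key not in table:
            table[key] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return table[key]

    def embed(self, word):
        word_vec = self._lookup(self.word_table, word)
        stem_vec = self._lookup(self.stem_table, crude_stem(word))
        return [w + s for w, s in zip(word_vec, stem_vec)]


def nll(target_probs):
    """Mean negative log-likelihood of the probabilities given to gold tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def joint_loss(keyphrase_probs, stem_probs, lam=0.5):
    """Multi-task objective: keyphrase loss + lam * stem-sequence loss (lam assumed)."""
    return nll(keyphrase_probs) + lam * nll(stem_probs)


emb = StemFusedEmbedding()
for w in ("clustering", "clusters", "clustered"):
    emb.embed(w)
print(sorted(emb.stem_table))  # ['cluster']
print(len(emb.word_table))     # 3
```

Three surface forms get three word vectors but a single shared stem vector, which is the explicit cognate link the fusion step is meant to provide; the auxiliary stem loss then rewards the model for predicting the right stems as well as the right keyphrase tokens.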
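The heterogeneous association subgraph and node-level update in (2) can be sketched in the same spirit. This is an assumption-laden illustration, not AKGHAS: co-occurrence edges link words within a fixed window, synonym edges come from a hypothetical synonym-pair list, and the update adds a weighted mean of each relation type's neighbour features to the node's own vector (an unlearned simplification of relation-typed message passing). The function names and the edge weights are hypothetical.

```python
from collections import defaultdict


def build_subgraph(tokens, synonym_pairs, window=2):
    """Build a word graph with two edge types (hypothetical construction):
    'cooc' links words within `window` positions; 'syn' links listed synonyms."""
    graph = {"cooc": defaultdict(set), "syn": defaultdict(set)}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph["cooc"][w].add(v)
                graph["cooc"][v].add(w)
    vocab = set(tokens)
    for a, b in synonym_pairs:
        # Keep a synonym edge if either end occurs in the text; the other end
        # may be a new word, which widens the set of candidate words.
        if a in vocab or b in vocab:
            graph["syn"][a].add(b)
            graph["syn"][b].add(a)
    return graph


def aggregate_and_update(graph, features, weights=None):
    """One node-level step: h_v' = h_v + sum_r weights[r] * mean_{u in N_r(v)} h_u.
    `features` must contain a vector for every node that appears in the graph."""
    weights = weights or {"cooc": 0.5, "syn": 0.5}
    updated = {}
    for node, h in features.items():
        new_h = list(h)
        for etype, adj in graph.items():
            nbrs = adj.get(node, set())  # .get avoids inserting empty entries
            if not nbrs:
                continue
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs) for k in range(len(h))]
            new_h = [x + weights[etype] * m for x, m in zip(new_h, mean)]
        updated[node] = new_h
    return updated


g = build_subgraph(["deep", "learning", "model"], [("model", "framework")], window=1)
feats = {"deep": [1.0, 0.0], "learning": [0.0, 1.0],
         "model": [1.0, 1.0], "framework": [2.0, 0.0]}
out = aggregate_and_update(g, feats)
print(out["model"])  # [2.0, 1.5]
```

Here "framework" never occurs in the text but enters the graph through a synonym edge, which is one way to read the abstract's claim that the subgraph enlarges the solution space of related words.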
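Both algorithms fuse several feature sources at the decoder (stem-based source and reference text features in (1); graph features and multi-source text features in (2)). The abstract does not say which fusion mechanism is used, so the sketch below assumes a common one: attention over per-source context vectors with dot-product scores and softmax weights. The source names `"source"`, `"reference"`, `"graph"` and the function `fuse_contexts` are hypothetical, and every context vector is assumed to have the same length as the decoder state.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_contexts(decoder_state, contexts):
    """Attention-style fusion (assumed): score each named context vector by its dot
    product with the decoder state, softmax the scores, return the weighted sum."""
    names = list(contexts)
    scores = [sum(d * c for d, c in zip(decoder_state, contexts[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * contexts[n][k] for w, n in zip(weights, names))
        for k in range(len(decoder_state))
    ]
    return fused, dict(zip(names, weights))


ctx = {"source": [3.0, 0.0], "reference": [0.0, 3.0], "graph": [0.0, 0.0]}
fused, weights = fuse_contexts([0.0, 0.0], ctx)      # neutral state: equal weights
fused2, weights2 = fuse_contexts([10.0, 0.0], ctx)   # state aligned with "source"
```

With a neutral decoder state every source gets the same weight; as the state aligns with one source, that source dominates the fused vector. A learned gate or concatenation followed by a projection would be equally plausible alternatives.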
Keywords/Search Tags: keyphrase generation, fusion of stem information, semantic consistency, heterogeneous association subgraph, relationship between words