
Complex Text Keyword Mining Method Based On Graph Embedding Model

Posted on: 2022-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: C W Wu
Full Text: PDF
GTID: 2518306341968929
Subject: Electrical engineering
Abstract/Summary:
A large amount of text data has accumulated during the informatization of China's electric power industry. As an important part of electric power big data, this text data holds considerable value that the industry increasingly recognizes, and mining it has become an important research area in power-industry big data. This thesis applies keyword extraction technology to extract keywords that reflect the subject information of two kinds of data: news report datasets covering multiple fields, and a dataset of academic papers related to the electric power industry. It proposes a novel method for recognizing technical words in the field of electrical engineering in order to improve Chinese word segmentation on the academic paper dataset. It also proposes three graph-model keyword extraction algorithms, analyzes their keyword extraction performance on different datasets, and applies them to text classification tasks. The main research work is summarized as follows:

(1) Recognition algorithm for technical words in the electrical engineering field. The Jieba tool cannot effectively identify technical words in the electrical engineering field, which leads to inaccurate word segmentation results. To address this problem, this thesis combines three characteristics of the relationship between words and proposes a technical word recognition algorithm. After the recognized technical words were imported into the Jieba word segmentation dictionary, experimental results showed that the algorithm improves word segmentation and lays a good foundation for subsequent keyword extraction.

(2) TextRank keyword extraction algorithm based on multi-feature fusion. Taking advantage of the strong extensibility of the TextRank algorithm, this thesis combines three features of a word (its TF-IDF value, its location, and the structure of the word network graph) and proposes a TextRank algorithm based on multi-feature fusion, named MFFTR. Experimental results showed that this algorithm extracts keywords from text more effectively and has certain advantages over six existing keyword extraction algorithms in terms of precision, recall, and F1 value. Compared with the classic TextRank algorithm, MFFTR increases the F1 value by 2.7%, 0.8%, and 3.2% on the three experimental datasets, respectively.

(3) Keyword extraction method based on the polygon structure of the word graph. This thesis fuses TF-IDF and location features into the initial weights of word nodes, which enriches the word graph information, and uses the triangle and quadrilateral structures of the word network graph to rank word nodes. On this basis, the KTSG and KQSG algorithms are proposed, respectively. Compared with six existing keyword extraction algorithms, KTSG and KQSG improve on multiple keyword extraction evaluation indicators. On the hit rate indicator in particular, KTSG improves by 0.1% to 8.1% and KQSG improves by 0.5% to 8.2%.

(4) Ensemble learning with graph-embedded keyword mining models. To verify the effectiveness of the MFFTR, KTSG, and KQSG algorithms, they were applied to text classification tasks. In constructing the text vector space, they were combined with nine existing vocabulary feature extraction methods, and a text-feature ensemble learning method based on multi-feature fusion was proposed. Compared with traditional feature extraction methods on three classic classifiers, the experimental results showed that the proposed method improves text classification performance to a certain extent.
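The abstract does not give the formulas behind MFFTR, KTSG, or KQSG, so the following is only a minimal sketch of the general ideas it describes: a TextRank-style ranking over a word co-occurrence graph in which a per-word prior biases the random walk, plus a count of the triangles each word node takes part in. Everything specific here is an assumption made for illustration and is not the author's method: the function names, the window size, the damping factor, the position-only prior (a real prior would also fold in TF-IDF), and the way triangles are counted.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=3):
    """Undirected word graph; edge weight = co-occurrence count within the window."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def position_prior(tokens):
    """Hypothetical prior: words whose first occurrence is earlier get more weight."""
    prior = {}
    for i, word in enumerate(tokens):
        if word not in prior:
            prior[word] = 1.0 / (1 + i)
    total = sum(prior.values())
    return {word: value / total for word, value in prior.items()}


def biased_textrank(graph, prior, damping=0.85, iterations=50):
    """Weighted TextRank where the teleport term follows the prior, not a uniform value."""
    scores = {node: prior[node] for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            incoming = 0.0
            for neighbor, weight in graph[node].items():
                out_weight = sum(graph[neighbor].values())
                incoming += weight / out_weight * scores[neighbor]
            new_scores[node] = (1 - damping) * prior[node] + damping * incoming
        scores = new_scores
    return scores


def triangle_counts(graph):
    """Number of triangles each node belongs to (pairs of neighbors that are linked)."""
    counts = {}
    for node, neighbors in graph.items():
        linked = list(neighbors)
        count = 0
        for a in range(len(linked)):
            for b in range(a + 1, len(linked)):
                if linked[b] in graph[linked[a]]:
                    count += 1
        counts[node] = count
    return counts


# Toy, already-segmented input (assumed; real input would come from a segmenter such as Jieba).
tokens = ["power", "grid", "fault", "detection", "power", "grid", "load", "forecast"]
graph = build_cooccurrence_graph(tokens)
scores = biased_textrank(graph, position_prior(tokens))
triangles = triangle_counts(graph)
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 4), triangles[word])
```

Using the prior as the teleport distribution is one common way to inject node features into a PageRank-style ranking; how the thesis actually fuses its features, and how it turns triangle or quadrilateral structure into a ranking, cannot be determined from the abstract.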
Keywords/Search Tags: Graph Model, TextRank, Feature Fusion, Keyword Extraction