Research on Keyword Extraction of Chinese-Vietnamese (Hanyue) News Based on Hypergraph

Posted on: 2018-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Fan
Full Text: PDF
GTID: 2358330518460494
Subject: Control engineering
Abstract/Summary:
With the development of the Belt and Road, China's attention to Vietnam has increased. News websites are among the most important carriers of information and a major way for people to obtain it. Vietnamese is a minority language, so quickly locating important Vietnamese news has become a problem. Bilingual news keyword extraction can save considerable time and improve information usage, and therefore has important research value. However, current studies focus only on the features of individual words and do not consider the multiple relationships within a news document. Finding a suitable model to express these complex relationships is an urgent problem. Because a hypergraph can express complex relationships among multiple entities, this thesis adopts the hypergraph model to study keyword extraction in several settings. The main work is as follows:

1. We propose a hypergraph-based keyword extraction method for a single document. The method first analyzes the structural characteristics of a news document and, taking the word as its core unit, extracts each word's frequency, part of speech, word span, and position factor in the document as feature information. It then builds a news hypergraph model with the words of the news document as vertices and the sentences as hyperedges.

2. We propose a hypergraph-based keyword extraction method for multiple documents. After analyzing the influence of these features on keyword extraction, the method extracts the time element and the number of comments of a news page as feature information. It then builds a multi-document news hypergraph model with the words of the news documents as vertices and each news document as a hyperedge.

3. We propose a hypergraph-based method for extracting bilingual news keywords. The method first analyzes the characteristics of bilingual news documents, uses bilingual word frequency as the core feature of a word, and combines it with part of speech, word span, and position features to set word weights. It then builds a bilingual news hypergraph model with words as vertices and with sentences and corresponding-word sets as hyperedges. A random walk algorithm on the hypergraph is used to rank the vertices, which finally yields the keywords of the news documents.
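
The abstract does not state the transition probabilities or the exact feature weights used in the thesis, so the following is only a minimal sketch of how a random walk could rank the vertices of a word/sentence hypergraph. The function name `hypergraph_rank`, the `damping` teleport term, and the uniform default weights are assumptions made for illustration. A walker at a word picks one of its hyperedges in proportion to the hyperedge weight, then moves to a word in that hyperedge in proportion to the word weight (which could be derived from frequency, part of speech, span, and position).

```python
def hypergraph_rank(hyperedges, vertex_weight=None, edge_weight=None,
                    damping=0.85, max_iter=200, tol=1e-12):
    """Score vertices by a damped random walk on a hypergraph.

    hyperedges:    list of word collections (one per sentence or document).
    vertex_weight: optional dict word -> positive weight (assumed feature score).
    edge_weight:   optional list of positive weights, aligned with hyperedges.
    """
    weights = list(edge_weight) if edge_weight is not None else [1.0] * len(hyperedges)
    # Keep only non-empty hyperedges, together with their weights.
    pairs = [(set(e), w) for e, w in zip(hyperedges, weights) if len(e) > 0]
    if not pairs:
        return {}
    edges = [e for e, _ in pairs]
    ew = [w for _, w in pairs]
    vertices = sorted(set().union(*edges))

    # Word weights default to 1.0 when no feature score is supplied.
    vw = {v: 1.0 for v in vertices}
    if vertex_weight:
        for v in vertices:
            vw[v] = vertex_weight.get(v, 1.0)

    # Vertex degree: total weight of the hyperedges containing the vertex.
    degree = {v: 0.0 for v in vertices}
    for e, w in zip(edges, ew):
        for v in e:
            degree[v] += w
    # Total word weight inside each hyperedge, used to normalize the second step.
    edge_mass = [sum(vw[v] for v in e) for e in edges]

    n = len(vertices)
    score = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Teleport term (an assumption here) keeps the walk well behaved.
        new = {v: (1.0 - damping) / n for v in vertices}
        for e, w, mass in zip(edges, ew, edge_mass):
            # Probability mass flowing from the member words into this hyperedge.
            flow = sum(score[u] * w / degree[u] for u in e)
            for v in e:
                new[v] += damping * flow * vw[v] / mass
        delta = sum(abs(new[v] - score[v]) for v in vertices)
        score = new
        if delta < tol:
            break
    return score


# Toy single-document example: each inner list is one tokenized sentence.
sentences = [
    ["hypergraph", "keyword", "news"],
    ["keyword", "extraction", "news"],
    ["keyword", "vietnam"],
]
scores = hypergraph_rank(sentences)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:2])
```

Passing one word list per news document instead of one per sentence gives the multi-document variant of the same sketch; the real methods would replace the uniform weights with the feature-based weights described in the abstract.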
Keywords/Search Tags:News document hypergraph model, hypergraph ranking, random walk, keyword extraction, Chinese-Vietnamese bilingua