
Research On The Construction Technology Of News Text Vocabulary Chain Based On Wikipedia Corpus

Posted on: 2018-11-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Chen
GTID: 2358330518960475 | Subject: Computer technology
Abstract/Summary:
An efficient method for processing news text can quickly determine a text's category, its keywords, and its deeper semantic content and semantic relations, and constructing lexical chains is an important way to process news text quickly. Compared with traditional frequency-based keyword extraction and with machine-learning methods, a lexical chain approach based on a web corpus incorporates human cognition; because web corpus resources are updated frequently and are well structured, further study has shown that the lexical chain approach classifies news text better than other methods. However, existing methods for constructing Chinese lexical chains cannot solve the word sense disambiguation problem, so the resulting chains often fail to express the semantic relations within the text correctly, which in turn degrades the quality of the extracted keywords. To help readers grasp the meaning of news texts and determine the structure of news discourse, this thesis studies the following aspects:
(1) Using two features of Wikipedia, its category structure and its document link graph, word-to-word relatedness is computed with a depth-weighted path length (DPL) algorithm, which relates candidate words through the path between them and the depth of the nodes on that path, and with an explicit semantic analysis (ESA) algorithm, which interprets text as a vector of Wikipedia documents. These scores are used to build preliminary lexical chains, which are then optimized with a keyword-weighting extraction algorithm that combines five features of news text. The construction algorithm was tested on more than 1,500 news texts crawled from a web portal, and the keywords it yields were compared with those of other keyword extraction methods; the results show that keywords extracted via lexical chains perform better.
(2) Combining the structural characteristics of the Wikipedia resource base (category affiliation and link features) with the classical MGKM2003 method, the MGKM-WIKI disambiguation algorithm is constructed to further disambiguate the preliminary lexical chains. As a word sense disambiguation system, MGKM-WIKI was evaluated on the Senseval-3 data set and compared with supervised and unsupervised WSD algorithms, with good results.
(3) Building on this lexical chain construction, alignment techniques are used to construct lexical chains for Vietnamese news texts, with an attempt to scale this to a large number of Vietnamese news texts.
(4) A prototype system is designed based on the above research; it can construct lexical chains for both Chinese and Vietnamese news texts.
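The relatedness-then-chaining step of aspect (1) can be illustrated with a small sketch. It is not the thesis's implementation and rests on several assumptions: the category tree, concept vectors, blend weight `alpha` and `threshold` are all hypothetical. Because the abstract does not give the exact DPL formula, a Wu-Palmer-style depth ratio, 2*depth(LCS) / (depth(a) + depth(b)), stands in for it. ESA relatedness is taken as the cosine similarity of Wikipedia concept vectors, as in standard ESA. Chains are grown greedily: each word joins the chain holding its most related member if that score reaches the threshold, and otherwise starts a new chain.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy category hierarchy: child category -> parent (root -> None).
PARENT: Dict[str, Optional[str]] = {
    "Root": None,
    "Sports": "Root",
    "Football": "Sports",
    "Tennis": "Sports",
    "Economy": "Root",
    "Finance": "Economy",
}

# Hypothetical candidate words: (main category, ESA-style concept vector).
WORDS: Dict[str, Tuple[str, Dict[str, float]]] = {
    "striker": ("Football", {"Football": 0.9, "Sports": 0.4}),
    "goal": ("Football", {"Football": 0.8, "Sports": 0.5}),
    "racket": ("Tennis", {"Tennis": 0.9, "Sports": 0.3}),
    "stock": ("Finance", {"Finance": 0.9, "Economy": 0.5}),
}


def ancestors(cat: str) -> List[str]:
    """Categories from `cat` up to the root, inclusive (root has depth 1)."""
    path: List[str] = []
    node: Optional[str] = cat
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def path_relatedness(a: str, b: str) -> float:
    """Stand-in for DPL: 2*depth(LCS) / (depth(a) + depth(b))."""
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)  # lowest common subsumer
    return 2 * len(ancestors(lcs)) / (len(ancestors(a)) + len(ancestors(b)))


def esa_relatedness(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(w1: str, w2: str, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the path-based and ESA scores."""
    (c1, v1), (c2, v2) = WORDS[w1], WORDS[w2]
    return alpha * path_relatedness(c1, c2) + (1 - alpha) * esa_relatedness(v1, v2)


def build_chains(words: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Greedy chaining: join the chain with the most related member, else start one."""
    chains: List[List[str]] = []
    for w in words:
        best: Optional[List[str]] = None
        best_score = threshold
        for chain in chains:
            score = max(relatedness(w, m) for m in chain)
            if score >= best_score:
                best, best_score = chain, score
        if best is None:
            chains.append([w])
        else:
            best.append(w)
    return chains


if __name__ == "__main__":
    print(build_chains(list(WORDS)))
```

With the toy data, `build_chains(list(WORDS))` returns `[['striker', 'goal'], ['racket'], ['stock']]`. The two football terms chain together, while the tennis and finance terms stay below the threshold. Scoring a word against a chain's best-matching member, rather than the chain average, is one simple design choice. An average would favour tighter, more homogeneous chains.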
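The disambiguation step of aspect (2) can be pictured as graph-style sense selection, in the spirit of Galley and McKeown (2003), which "MGKM2003" presumably refers to. Every candidate sense of a word is scored by the sum of its relatedness to all candidate senses of the other words, and the highest-scoring sense is kept. This is a simplified sketch and not MGKM-WIKI itself. The sense inventory (`SENSES`) and the relatedness table (`REL`) are hypothetical stand-ins for Wikipedia-derived scores. The relation-type and distance weights of the original method, and the Wikipedia-specific extensions described above, are not modelled.

```python
from typing import Dict, FrozenSet, List

# Hypothetical candidate senses (e.g. Wikipedia article titles) per word.
SENSES: Dict[str, List[str]] = {
    "bank": ["Bank (finance)", "Bank (geography)"],
    "loan": ["Loan"],
    "interest": ["Interest (finance)", "Interest (emotion)"],
}

# Hypothetical symmetric sense relatedness; missing pairs score 0.
REL: Dict[FrozenSet[str], float] = {
    frozenset({"Bank (finance)", "Loan"}): 0.9,
    frozenset({"Bank (finance)", "Interest (finance)"}): 0.8,
    frozenset({"Loan", "Interest (finance)"}): 0.9,
}


def sense_relatedness(s: str, t: str) -> float:
    """Look up the symmetric relatedness of two senses (0.0 if unknown)."""
    return REL.get(frozenset({s, t}), 0.0)


def disambiguate(senses: Dict[str, List[str]]) -> Dict[str, str]:
    """For each word, keep the sense with the largest summed relatedness
    to every candidate sense of every other word."""
    chosen: Dict[str, str] = {}
    for word, candidates in senses.items():
        # All candidate senses belonging to the other words in the text.
        others = [s for w, ss in senses.items() if w != word for s in ss]
        chosen[word] = max(
            candidates,
            key=lambda sense: sum(sense_relatedness(sense, o) for o in others),
        )
    return chosen


if __name__ == "__main__":
    print(disambiguate(SENSES))
```

Here "bank" and "interest" both resolve to their finance senses, because those senses are mutually supported by "Loan". Only the chosen senses would then be passed to chain construction.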
Keywords/Search Tags: Natural Language Processing, Wikipedia, lexical chain construction, word sense disambiguation