
Research And Implementation Of Chinese News Element Extraction Technology

Posted on: 2022-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zou
GTID: 2518306764476884
Subject: Journalism and Media
Abstract/Summary:
With the spread of the Internet and the development of computer technology, rapidly acquiring key information from massive amounts of data has become an urgent problem in academia. Extracting key information from Chinese news therefore has significant research value and is the focus of this thesis, which studies element extraction from Chinese news texts in three aspects: entity extraction, event extraction, and abstract (summary) extraction. To address problems in Chinese news element extraction, the thesis makes the following contributions:

(1) An entity extraction method based on vocabulary enhancement, which overcomes the limitation of traditional Chinese named entity recognition models that represent the input only with character vectors. A vocabulary-enhanced word lattice structure jointly embeds characters and words, giving the input layer multi-feature embeddings that combine word-level features with the semantic information of Chinese characters. The encoding layer extracts the semantic features of the characters in each sentence with a bidirectional LSTM, and an attention mechanism adjusts the weights of the hidden states so that features related to entity vocabulary receive more weight. A CRF decoding layer then produces the entity extraction results. Comparative experiments on the Resume and Boson datasets verify the validity and feasibility of the model.

(2) An event extraction method based on MRC (Machine Reading Comprehension), which addresses the difficulty traditional event extraction methods have in capturing semantic information. Because Chinese and English differ, question templates designed for English text cannot be used directly to extract events from Chinese text. The thesis therefore designs question templates suited to Chinese event extraction, uses the BERT pre-trained model, and applies MRC to the Chinese task. Comparative experiments on the Chinese DuEE dataset verify the effectiveness and feasibility of the model.

(3) An abstract extraction method based on an improved TextRank algorithm, which takes both the semantic and the statistical characteristics of the text into account to improve extraction quality. The algorithm vectorizes text with Word2vec and TF-IDF, replaces the original TextRank similarity computation with a weighted cosine similarity, and introduces the MMR (Maximal Marginal Relevance) algorithm to control redundancy in the extracted abstract. Comparative experiments verify the validity and feasibility of the model.

In addition, building on these improved methods, the thesis designs and develops a system that performs element extraction on Chinese news.
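The vocabulary-enhanced lattice in contribution (1) starts from finding every lexicon word that occurs in a character sequence; each match becomes an extra word node spanning those characters. The sketch below shows only that matching step, under assumptions: the function name `match_lexicon_words`, the `max_len` cap, and the toy lexicon are hypothetical, and the thesis's actual lexicon and lattice construction are not described in the abstract.

```python
from typing import List, Set, Tuple


def match_lexicon_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every lexicon word in the sentence; `end` is inclusive.

    Only multi-character words are returned, because single characters are already
    the base nodes of a character-word lattice.
    """
    matches = []
    n = len(sentence)
    for start in range(n):
        # Candidate words have length 2..max_len and must fit inside the sentence.
        for end in range(start + 1, min(start + max_len, n)):
            word = sentence[start:end + 1]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


if __name__ == "__main__":
    toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}  # hypothetical lexicon
    print(match_lexicon_words("南京市长江大桥", toy_lexicon))
    # [(0, 1, '南京'), (0, 2, '南京市'), (2, 3, '市长'), (3, 4, '长江'), (3, 6, '长江大桥'), (5, 6, '大桥')]
```

Overlapping matches such as 市长 and 长江 are kept on purpose: a lattice lets the downstream encoder weigh competing segmentations instead of committing to one.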
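Contribution (1) adds an attention mechanism on top of the bidirectional LSTM outputs but the abstract does not say which form. As an assumption, the sketch below uses plain scaled dot-product self-attention with no learned projections, so each position's output is a softmax-weighted mix of all hidden states. A trained model would normally add learned query/key/value matrices; `self_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden: Matrix) -> Matrix:
    """Scaled dot-product self-attention over a sequence of hidden states (one row per character)."""
    d = len(hidden[0])
    out = []
    for q in hidden:
        # Similarity of this position to every position, scaled by sqrt(d).
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in hidden])
        # Weighted sum of all hidden states, column by column.
        out.append([sum(w * v[c] for w, v in zip(weights, hidden)) for c in range(d)])
    return out


if __name__ == "__main__":
    toy_states = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # stand-in for BiLSTM outputs
    for row in self_attention(toy_states):
        print([round(x, 3) for x in row])
```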
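The CRF decoding layer in contribution (1) outputs the tag sequence with the highest total score, which is found with Viterbi dynamic programming over per-position emission scores and tag-to-tag transition scores. This is a minimal sketch with hand-written toy scores and a hypothetical BIO tag set; it omits the start/end transition scores that many CRF implementations also include.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at position t (from the encoder in a real model).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF in a real model).
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            candidates = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_prev = max(range(num_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_prev] + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    best_last = max(range(num_tags), key=score.__getitem__)
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    # Hypothetical tags: 0 = O, 1 = B, 2 = I; the O -> I transition is heavily penalized.
    transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emissions = [[0.1, 2.0, 0.5], [0.3, 0.2, 1.5], [1.0, 0.1, 1.2]]
    path, best = viterbi_decode(emissions, transitions)
    print(path)  # [1, 2, 2], i.e. B I I
```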
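In the MRC formulation of contribution (2), each argument role becomes a natural-language question, the encoder reads the question together with the news text, and the answer is the text span with the best start and end scores. The abstract does not give the thesis's Chinese question templates, so `QUESTION_TEMPLATE`, the event type, the roles, and the scores below are all hypothetical; the BERT encoder itself is left out and replaced by toy per-character scores.

```python
from typing import Dict, List, Tuple

# Hypothetical template: "What is the {role} in the {event_type} event?"
QUESTION_TEMPLATE = "{event_type}事件中的{role}是什么？"


def build_mrc_inputs(context: str, event_type: str, roles: List[str]) -> List[Dict[str, str]]:
    """One (question, context) pair per argument role; a BERT-style encoder would read
    each pair as [CLS] question [SEP] context [SEP]."""
    return [
        {"role": role, "question": QUESTION_TEMPLATE.format(event_type=event_type, role=role), "context": context}
        for role in roles
    ]


def best_span(start_scores: List[float], end_scores: List[float], max_answer_len: int = 10) -> Tuple[int, int]:
    """Pick (start, end), end inclusive, maximizing start + end score with start <= end < start + max_answer_len."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            if s_score + end_scores[e] > best_score:
                best, best_score = (s, e), s_score + end_scores[e]
    return best


if __name__ == "__main__":
    text = "中国队3比0战胜日本队"
    for pair in build_mrc_inputs(text, "胜负", ["胜者", "败者"]):
        print(pair["question"])
    # Toy scores indexed by character, standing in for the encoder's start/end logits.
    start, end = best_span([5.0] + [0.0] * 10, [0.0, 0.0, 4.0] + [0.0] * 8)
    print(text[start:end + 1])  # 中国队
```

Real MRC extractors usually also need a "no answer" option (for example scoring the [CLS] position) for roles that do not appear in the text; this sketch always returns some span.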
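Contribution (3) builds sentence vectors from Word2vec and TF-IDF and feeds a weighted cosine similarity into TextRank. The abstract does not give the exact weighting, so this sketch assumes each sentence vector is the TF-IDF-weighted average of its word vectors, that similarities become edge weights, and that scores follow the standard weighted TextRank update. The tiny 2-dimensional vectors stand in for trained Word2vec embeddings; all function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sentence_vector(words: List[str], word_vecs: Dict[str, Vector], tfidf: Dict[str, float]) -> Vector:
    """TF-IDF-weighted average of the word vectors of a tokenized sentence (assumed weighting scheme)."""
    dim = len(next(iter(word_vecs.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for w in words:
        if w in word_vecs:  # out-of-vocabulary words are skipped
            weight = tfidf.get(w, 0.0)
            total = [t + weight * x for t, x in zip(total, word_vecs[w])]
            weight_sum += weight
    return [t / weight_sum for t in total] if weight_sum > 0 else total


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0


def textrank(sim: List[List[float]], d: float = 0.85, iters: int = 200, tol: float = 1e-8) -> List[float]:
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j [sim[j][i] / sum_k sim[j][k]] * WS(j)."""
    n = len(sim)
    out_sums = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = [
            (1 - d) + d * sum(sim[j][i] / out_sums[j] * scores[j]
                              for j in range(n) if j != i and out_sums[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def rank_sentences(sentences: List[List[str]], word_vecs: Dict[str, Vector], tfidf: Dict[str, float]) -> List[float]:
    vecs = [sentence_vector(s, word_vecs, tfidf) for s in sentences]
    n = len(vecs)
    # Negative cosine values are clipped to 0 so that all edge weights are non-negative.
    sim = [[max(0.0, cosine(vecs[i], vecs[j])) if i != j else 0.0 for j in range(n)] for i in range(n)]
    return textrank(sim)


if __name__ == "__main__":
    toy_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # stand-in for Word2vec
    toy_tfidf = {"a": 1.0, "b": 1.0, "c": 1.0}                      # stand-in for TF-IDF weights
    print(rank_sentences([["c"], ["a"], ["b"]], toy_vecs, toy_tfidf))
```

In the toy run, the first sentence is similar to both others while those two are unrelated, so it receives the highest score.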
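The MMR step in contribution (3) selects abstract sentences greedily, trading a sentence's relevance against its similarity to sentences already chosen: MMR(i) = λ · relevance(i) − (1 − λ) · max over selected j of sim(i, j). How the thesis defines relevance and chooses λ is not stated; this sketch assumes relevance is a precomputed score such as a normalized TextRank score, and `mmr_select` and `lam = 0.7` are hypothetical choices.

```python
from typing import List


def mmr_select(relevance: List[float], sim: List[List[float]], k: int, lam: float = 0.7) -> List[int]:
    """Greedy Maximal Marginal Relevance selection of up to k sentence indices."""
    selected: List[int] = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to any already-selected sentence.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.6]  # toy scores; sentences 0 and 1 are near-duplicates
    sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
    print(mmr_select(relevance, sim, k=2))  # [0, 2]
```

Plain top-k by relevance would pick sentences 0 and 1 here; MMR skips the near-duplicate. The selected indices can then be sorted back into document order before the abstract is output.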
Keywords/Search Tags: Entity Extraction, Event Extraction, Abstract Extraction, Natural Language Processing, Deep Learning