
Research On The Methods For Document-Level Event Extraction From Chinese Unstructured Texts

Posted on: 2020-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
Full Text: PDF
GTID: 2428330575488970
Subject: Control engineering
Abstract/Summary:
With the development and popularization of the Internet, websites have become an essential part of people's lives. The large amount of unstructured text on the Internet gives users convenient access to information, but it also brings the problem of information redundancy. Faced with ever-growing Web data, there is an urgent need to help users obtain knowledge quickly and so reduce the time cost. As a key task in natural language processing, information extraction aims to automatically extract knowledge from unstructured text and return it to users as structured data; it can be divided into entity extraction, relation extraction, and event extraction. This thesis focuses on event extraction from unstructured text, which is crucial for obtaining event information, constructing large-scale event knowledge graphs, and supporting other natural language understanding tasks.

Existing frameworks for event extraction fall under one of two paradigms: sentence-level event extraction and document-level event extraction. Sentence-level event extraction focuses on identifying individual event mentions in the sentences of a document, along with any entities that fill argument roles in these events. Document-level event extraction focuses on the core event described in the document and returns structured event information to users. Document-level event extraction is the more common need in practice, because it lets users quickly obtain structured event information from whole documents. This thesis focuses on the challenges of document-level event extraction, and its main contributions are as follows:

1. We propose a document-level event extraction approach based on joint labeling and global reasoning. Because event descriptions in text are complex and diverse, a complete event is often mentioned across multiple sentences, so the events in a document must be fused to obtain complete structured event information. To solve this problem, we first propose a self-attention-based sequence labeling model for the joint extraction of entities and events from sentences. We then use a multi-layer perceptron to label the entities in the events and identify their roles. Finally, based on the labeling and identification results, we use integer linear programming to perform global reasoning within a document. Experimental results on the data set demonstrate the effectiveness of the proposed method.

2. We present an event extraction framework that detects event mentions and extracts events from document-level financial news. To date, methods based on the supervised learning paradigm achieve the highest performance on public datasets, but they depend heavily on manually labeled training data. In specialized domains such as finance, medicine, and law, there is not enough labeled data because the labeling process is costly. We propose a Document-level Chinese Financial Event Extraction (DCFEE) system, which can automatically generate large-scale labeled data and extract events from documents. Experimental results demonstrate the effectiveness of this document-level event extraction system.
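The global reasoning step in the first contribution can be illustrated with a toy example. The abstract does not give the actual integer linear programming formulation, so the sketch below is only an assumption-laden illustration: the function name `merge_document_event`, the role names, the scores, and the two constraints (each role takes exactly one entity, each entity fills at most one role) are all hypothetical. Instead of calling an ILP solver, it solves the same small 0/1 assignment problem exactly by exhaustive search, which is practical only for a handful of entities and roles.

```python
from itertools import permutations


def merge_document_event(scores, roles):
    """Pick one entity per role so that the total score is maximal.

    scores: dict mapping (entity, role) -> confidence, for example the best
            probability any sentence-level classifier gave that pair.
    roles:  list of role names in a (hypothetical) event schema.

    Assumed constraints, not taken from the thesis: every role is filled by
    exactly one entity, and every entity fills at most one role.
    """
    entities = sorted({entity for entity, _ in scores})
    if len(entities) < len(roles):
        raise ValueError("need at least as many entities as roles")

    best_event, best_total = None, float("-inf")
    # Each permutation is one feasible assignment of distinct entities to roles.
    for chosen in permutations(entities, len(roles)):
        total = sum(scores.get((e, r), 0.0) for e, r in zip(chosen, roles))
        if total > best_total:
            best_event, best_total = dict(zip(roles, chosen)), total
    return best_event, best_total


# Hypothetical sentence-level scores for an equity-pledge style event.
# "Company A" has the top score for both roles, so choosing each role
# independently would assign the same entity twice.
scores = {
    ("Company A", "pledger"): 0.90,
    ("Company A", "pledgee"): 0.85,
    ("Bank B", "pledger"): 0.30,
    ("Bank B", "pledgee"): 0.80,
}
event, total = merge_document_event(scores, ["pledger", "pledgee"])
print(event)
```

In this toy case the joint search assigns "Company A" as pledger and "Bank B" as pledgee, whereas a per-role argmax would violate the one-role-per-entity constraint. A real system would express the same objective and constraints as an integer linear program so that it scales beyond a few entities.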
Keywords/Search Tags:document-level event extraction, sequence labeling, automatically labeled data generation, global reasoning