
Research On Key Technologies Of Schema Induction Based Open-domain Event Extraction

Posted on: 2022-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: D K Hao
GTID: 2518306572960109
Subject: Software engineering
Abstract/Summary:
Event extraction is the task of extracting structured event information from natural language text. Depending on whether the extraction target is limited to events in a specific field, the task can be divided into domain-specific event extraction and open-domain event extraction. Domain-specific event extraction first specifies the extraction domain, then manually predefines the event schema of that domain, and finally performs event extraction in that domain based on the predefined schema. Open-domain event extraction detects events in text and extracts their argument information without limiting the type or schema of the event. Domain-specific methods are often difficult to migrate between domains. In contrast, the open-domain setting is better suited to extracting a wide range of event information of unrestricted types, which makes it a valuable research problem.

Existing open-domain event extraction methods have several shortcomings. Existing supervised methods are usually based on manually predefined event schemas that cover a limited set of event types, and they rely heavily on manually labeled data. Most existing unsupervised methods are based on probabilistic graphical models; they perform poorly on event argument role representation and argument extraction, and they are difficult to apply to open-domain event extraction on large-scale, everyday news corpora. This thesis therefore studies key technologies of schema induction based open-domain event extraction: it automatically induces the event schemas contained in an unlabeled news text corpus and then performs open-domain event extraction on the corpus based on the induced schemas. The work is divided into three parts: open-domain event type induction, event argument role induction, and open-domain event extraction based on event schema induction. We propose an event type induction method that combines a topic model with pre-trained language models, an event argument role induction method based on graph representation learning, and an open-domain event extraction method based on event schema induction.

The proposed methods are evaluated in an open-domain event extraction experiment on a dataset of English news text. The experimental results show that the proposed event type induction method achieves better coherence and type diversity of the induced event trigger word sets than the baseline method. The proposed argument role induction method induces argument roles that are representative and distinctive. With this better argument role representation, the proposed open-domain event extraction method also improves significantly on the extraction performance of the baseline method. The open-domain event extraction method based on automatic schema induction avoids the dependence of traditional event extraction methods on manually annotated data. Compared with manually constructed event schemas, event schemas induced automatically from a large amount of text data reflect real-world events more objectively and offer good interpretability and practicability. The work therefore has theoretical significance and broad application prospects.
Keywords/Search Tags:Open-domain Event Extraction, Event Schema Induction, Neural Topic Model, Pre-trained Language Model, Graph Representation Learning