The Research Of Open Information Extraction System

Posted on: 2021-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhan
Full Text: PDF
GTID: 2518306503980369
Subject: Electronics and Communications Engineering
Abstract/Summary:
Information extraction, the task of extracting useful knowledge from large amounts of text, is an increasingly popular research field. The work in this thesis belongs to two sub-fields of information extraction: open information extraction (Open IE) and named entity recognition (NER). For Open IE, the thesis contributes to both the model and the use of data. On the model side, it proposes a span selection model that outperforms other Open IE models. On the data side, it uses the confidence score of each training example as a coefficient in the loss function, which allows more patterns to be learned from noisy training data. The thesis also points out annotation problems in an existing Open IE benchmark and relabels it to produce a new benchmark for future research. The second contribution is a fully unsupervised NER system that relies only on word embeddings. The system consists of two modules: an NE detection module based on a Gaussian hidden Markov model and an NE type classification module based on a Deep Auto-Encoder Gaussian Mixture Model. The novelty of this system lies in combining unsupervised machine learning algorithms with pre-trained word embeddings. Because the system does not depend on an annotated corpus, it is more robust than supervised or semi-supervised NER systems.
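The idea of using a confidence score as a coefficient of the loss can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `confidence_weighted_nll`, the choice of negative log-likelihood as the base loss, and the toy numbers are all assumptions made for illustration. It only shows how scaling each example's loss by its confidence reduces the influence of noisy labels.

```python
import math

def confidence_weighted_nll(probs, gold, confidence):
    """Mean negative log-likelihood with each example scaled by its confidence.

    probs:      per-example lists of predicted class probabilities
    gold:       per-example index of the (possibly noisy) training label
    confidence: per-example weight in [0, 1]; hypothetical stand-in for the
                confidence score attached to automatically produced labels
    """
    total = 0.0
    for p, g, c in zip(probs, gold, confidence):
        # Clamp to avoid log(0) when the model assigns zero probability.
        total += c * -math.log(max(p[g], 1e-12))
    return total / len(gold)

# Toy batch: the second label disagrees with the model and has low confidence.
probs = [[0.9, 0.1], [0.2, 0.8]]
gold = [0, 0]
unweighted = confidence_weighted_nll(probs, gold, [1.0, 1.0])
weighted = confidence_weighted_nll(probs, gold, [1.0, 0.1])
```

In this toy batch the low-confidence example contributes only a tenth of its unweighted loss, so `weighted` is smaller than `unweighted` and a likely mislabeled example pulls the model less strongly.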
Keywords/Search Tags: Open Information Extraction, Unsupervised Named Entity Recognition, Noisy Data Processing, Pre-trained Word Embeddings