
Using Distant Supervision And Representation Learning For Entity Relation Extraction

Posted on: 2017-12-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Liu | Full Text: PDF
GTID: 2348330518995541 | Subject: Information and Communication Engineering
Abstract/Summary:
We live in the information era, yet finding knowledge remains difficult. According to a recent study, humanity has stored more than 295 EB of data. Most textual data is stored as plain text, such as news articles, BBS posts, e-mails, SNS messages, and about a billion webpages. Making good use of this online text for data mining and text understanding is therefore of great importance. The results can be used to improve search engine performance, automatic QA systems, and knowledge base population. This thesis studies relation extraction (RE), which finds knowledge in large volumes of plain text. We improve distant supervision and introduce representation learning into the RE system. The main contributions and results of this work are listed below:
1. We propose multi-instance multi-label learning based on fuzzy classification to combat the noisy data that distant supervision introduces. For a given entity pair, the algorithm uses all of its mentions and labels to learn the extraction model. In addition, we add an aggregation layer that combines sentence-level features. The results show that this approach outperforms a state-of-the-art distantly supervised baseline.
2. We introduce sentence vectors and recurrent neural networks for RE. Traditionally, the representation of relation mentions is designed by hand, and such a representation is task-specific, which means it may work well in only a few situations. Neural networks have proven effective at automatic feature extraction, and they make text pre-processing unnecessary. Experiments show that each method improves system performance.
3. We design and implement an entity relation extraction platform built on our two major contributions. The platform consists of text retrieval, POS tagging, NER, and a sentence parser. It can be used for end-to-end relation extraction: it takes a knowledge base and training data as input and outputs extraction results. Results on LDC and New York Times data show that our method is superior to the baseline.
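As a rough illustration of the distant supervision setup the abstract describes, the sketch below groups all sentences that mention a knowledge-base entity pair into one bag carrying all of that pair's labels, which is the multi-instance multi-label view of the data. Everything here is an assumption made for illustration: the toy knowledge base, the sentences, the substring-based entity matching, and the function names `build_bags` and `aggregate_max` are hypothetical, and the element-wise max is only a common stand-in for an aggregation step, not the fuzzy-classification method or the aggregation layer proposed in the thesis.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head, tail) -> set of relation labels.
KB = {
    ("Barack Obama", "Hawaii"): {"born_in", "lived_in"},
    ("Steve Jobs", "Apple"): {"founder_of"},
}

# Hypothetical corpus of plain-text sentences.
SENTENCES = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii last week.",
    "Steve Jobs introduced the first Apple computer.",
    "Hawaii is a popular tourist destination.",
]


def build_bags(kb, sentences):
    """Distant supervision: every sentence containing both entities of a
    KB pair joins that pair's bag and inherits all of the pair's labels.
    Substring matching is a simplification; a real system would use NER."""
    mentions = defaultdict(list)
    for sentence in sentences:
        for head, tail in kb:
            if head in sentence and tail in sentence:
                mentions[(head, tail)].append(sentence)
    return {
        pair: {"mentions": found, "labels": set(kb[pair])}
        for pair, found in mentions.items()
    }


def aggregate_max(mention_scores):
    """Combine per-sentence relation scores into one bag-level score per
    relation by taking the maximum over sentences (an assumed stand-in
    for an aggregation layer)."""
    relations = set()
    for scores in mention_scores:
        relations.update(scores)
    return {
        rel: max(scores.get(rel, 0.0) for scores in mention_scores)
        for rel in sorted(relations)
    }


bags = build_bags(KB, SENTENCES)
for pair, bag in bags.items():
    print(pair, len(bag["mentions"]), sorted(bag["labels"]))
```

Note that the second Obama sentence expresses neither `born_in` nor `lived_in` yet still lands in the bag with both labels. This is the kind of noise that distant supervision introduces and that bag-level learning is meant to tolerate.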
Keywords/Search Tags: relation extraction, distant supervision, representation learning, word embedding, recurrent neural networks