Text Information Extraction Based On Domain Rules And Deep Learning

Posted on: 2018-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Rao
Full Text: PDF
GTID: 2348330542950296
Subject: Engineering
Abstract/Summary:
In the current Internet era, the amount of information and data grows day by day, and text data, as one of its important parts, grows with it. How to acquire knowledge quickly from huge volumes of text has become a popular research topic. Biomedical text information extraction is an important application of text mining in biomedicine. Building on biomedical entity recognition, biomedical relation extraction, biomedical event extraction and biomedical entity coreference resolution, we aim to build biological networks that help biomedical workers conduct their studies and research. This thesis makes three contributions to biomedical event extraction and biomedical entity coreference resolution:

(1) A hybrid event extraction method based on SVM and biomedical text rules is proposed. Complex biomedical events of different types have different syntactic and semantic properties and are difficult to handle with a single method. On top of a multi-class SVM, the proposed method applies different syntactic and semantic rules to different event types during post-processing. The experiments confirm the usefulness of the features and rules. The hybrid system performs well on event extraction and obtains the best result on the BioNLP See Dev task.

(2) A protein coreference resolution method based on syntactic parse trees and biomedical domain properties is proposed. Since different types of coreference have different characteristics, the proposed method handles the coreferences of relative pronouns, personal pronouns and definite noun phrases with three different resolution approaches. We use syntactic parsing rules for relative pronouns and personal pronouns, and biomedical domain property rules for definite noun phrases. We evaluate the method on the BioNLP protein coreference resolution task and obtain a better result than the state of the art.

(3) A protein coreference resolution method based on LSTM is proposed. The proposed method generates representation features of anaphors and antecedents, called mention-vectors, during word2vec training. For a word sequence containing an anaphor and a candidate antecedent, we use the mention-vector, the word vectors and a few other features to learn a representation of the sequence with an LSTM. The model outputs probability-like results that rank the candidate antecedents of an anaphor, so the best antecedent can be chosen for every anaphor. With few input features, the proposed method automatically learns global discriminative feature representations from the dataset for all types of coreference. Compared with the rule-based approach, the method avoids the tedious manual mining of rules.
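
The ranking step in contribution (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Mention` type, the function names and the example mentions are hypothetical, and `toy_score` is a distance-based placeholder standing in for the LSTM output, assumed here to be a probability-like value between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    """A text span with its token position (hypothetical structure)."""
    text: str
    position: int


# A scorer maps (anaphor, candidate) to a probability-like value in [0, 1].
Scorer = Callable[[Mention, Mention], float]


def toy_score(anaphor: Mention, candidate: Mention) -> float:
    """Placeholder for the LSTM output: closer candidates score higher.

    This is an assumption made for illustration only; the thesis learns
    the score from mention-vectors, word vectors and a few other features.
    """
    distance = abs(anaphor.position - candidate.position)
    return 1.0 / (1.0 + distance)


def rank_antecedents(
    anaphor: Mention, candidates: List[Mention], score_fn: Scorer
) -> List[Tuple[Mention, float]]:
    """Score every candidate and sort from most to least likely."""
    scored = [(c, score_fn(anaphor, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_antecedent(
    anaphor: Mention, candidates: List[Mention], score_fn: Scorer
) -> Optional[Mention]:
    """Return the top-ranked candidate, or None if there are no candidates."""
    ranked = rank_antecedents(anaphor, candidates, score_fn)
    return ranked[0][0] if ranked else None


# Hypothetical example: one anaphor and three earlier candidate mentions.
anaphor = Mention("it", 12)
candidates = [
    Mention("p53", 3),
    Mention("the IL-2 gene", 9),
    Mention("NF-kappa B", 6),
]
chosen = best_antecedent(anaphor, candidates, toy_score)
print(chosen.text if chosen else "no antecedent")
```

Because the scorer is passed in as a function, a trained model could replace `toy_score` without changing the ranking logic.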
Keywords/Search Tags: Text Information Extraction, Biomedical Event Extraction, Coreference Resolution, LSTM