
Research On The Key Techniques Of Information Extraction In Biomedical Domain

Posted on: 2015-01-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
GTID: 1228330467986920
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the volume of published biomedical literature has grown explosively. At the same time, research in systems biology increasingly requires uncovering relationships among biological molecules at many levels. Together these trends have driven information extraction in the biomedical domain forward, from named entity recognition to interaction extraction and on to more complex tasks such as event extraction. Against this background, this dissertation examines the key techniques of two core tasks in biomedical information extraction, Protein-Protein Interaction Extraction (PPIE) and Biomedical Event Extraction (BEE), with biomedical literature as the primary data source. The aim is to extract and organize structured information from the mass of biomedical literature effectively, and to discover potential knowledge for biomedical research and applications.

The dissertation first analyzes the main tasks and research status of PPIE and BEE. For PPIE, because different features differ in importance, multi-level structural and semantic features are first explored from contexts and syntactic structures. A PPI extraction system based on these rich features is then built with a support vector machine (SVM) classifier, and it achieves improved PPI extraction performance. Next, heterogeneous kernel functions are selected, such as a feature-based kernel, a walk-edge-weighted subsequence kernel and an all-path dependency graph kernel, and the study focuses on the data structure and construction of each kernel. A weighted multiple-kernel fusion algorithm is proposed so that the kernels complement one another: important syntactic and semantic features are captured from different perspectives, and the risk of missing important features is reduced.
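The idea of weighted multiple-kernel fusion can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the abstract does not say how the weights are chosen or whether the kernels are normalized, so the cosine normalization, the hand-picked weights, the toy Gram matrices and the function names `normalize_kernel` and `fuse_kernels` are all assumptions made for illustration.

```python
import math


def normalize_kernel(gram):
    """Cosine-normalize a Gram matrix so every diagonal entry becomes 1.

    This keeps kernels with very different value ranges comparable
    before they are mixed (an assumption, not stated in the abstract).
    """
    n = len(gram)
    return [
        [gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
        for i in range(n)
    ]


def fuse_kernels(grams, weights):
    """Return the weighted sum of normalized Gram matrices.

    Weights must be non-negative and are rescaled to sum to 1, so the
    fused matrix is still a valid (positive semidefinite) kernel.
    """
    if not grams or len(grams) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    scaled = [w / total for w in weights]
    normalized = [normalize_kernel(g) for g in grams]
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(scaled, normalized)) for j in range(n)]
        for i in range(n)
    ]


# Toy Gram matrices over two candidate protein pairs (hypothetical values).
feature_kernel = [[4.0, 2.0], [2.0, 9.0]]
graph_kernel = [[1.0, 0.5], [0.5, 1.0]]

fused = fuse_kernels([feature_kernel, graph_kernel], weights=[3.0, 1.0])
```

A fused matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel. In practice the weights would be tuned on held-out data rather than fixed by hand.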
In biomedical event extraction, event trigger recognition is treated as the core problem because of trigger ambiguity. Syntactic and semantic features at different levels are selected and extracted with dependency and deep syntactic parsers, and rich, tailored feature sets are built from them. Using the multi-class classifier LIBSVM, a feature fusion model for trigger detection is then constructed with a divide-and-conquer strategy, which avoids losing features and lets each kind of feature exert its influence. This improves trigger detection performance. To address data sparseness, a semi-supervised learning method is presented that combines labeled and unlabeled data for biomedical event extraction. Features are coupled and generalized to generate new, highly discriminative features for recognizing event arguments, and an argument classification model is constructed. The ultimate goal is to improve event extraction performance and establish a fine-grained biomedical event extraction system.

Experimental results on multiple PPI corpora, including AIMed, show that the weighted multiple-kernel fusion PPI extraction system has good extraction and generalization ability, and that its performance is advanced among current machine-learning PPI extraction methods. Experimental results on the general BioNLP corpora indicate that trigger detection performance is effectively improved, driven by biological event triggers.
The semi-supervised machine learning model captures information that the original model misses, and combining it with the original model produces a complementary effect. Compared with similar existing work, its performance reaches the state-of-the-art level across the nine kinds of shared tasks in biological event extraction. It shows a particular advantage on complex events such as regulation events.
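One common way to combine labeled and unlabeled data is self-training, sketched below with a nearest-centroid classifier. The abstract does not describe the dissertation's semi-supervised procedure, so this is a generic stand-in rather than the author's method; the labels `"none"` and `"trigger"`, the two-dimensional toy points, the margin threshold and all function names are hypothetical.

```python
import math


def centroids(points, labels):
    """Compute the mean vector of each class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, value in enumerate(point):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(cents, point):
    """Return (label, margin): the nearest centroid's label and the
    distance gap between the nearest and second-nearest centroid."""
    dists = sorted((math.dist(c, point), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=5):
    """Repeatedly add confidently predicted unlabeled points to the
    training set; stop when no remaining point clears the margin."""
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, ys)
        confident, rest = [], []
        for point in pool:
            label, margin = predict(cents, point)
            (confident if margin >= min_margin else rest).append((point, label))
        if not confident:
            break
        for point, label in confident:
            points.append(point)
            ys.append(label)
        pool = [point for point, _ in rest]
    return centroids(points, ys), pool


# Toy data: two labeled examples and three unlabeled ones (hypothetical).
labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = ["none", "trigger"]
unlabeled = [(0.5, 0.0), (3.5, 4.0), (2.0, 2.0)]

model, remaining = self_train(labeled, labels, unlabeled)
```

Here the two unlabeled points near a class are absorbed in the first round, while the ambiguous midpoint never clears the margin and stays unlabeled. The margin threshold is the main design choice: set too low, early mistakes are reinforced in later rounds.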
Keywords/Search Tags: Natural Language Processing, Information Extraction, Feature, Protein-Protein Interaction Extraction, Biomedical Event Extraction