
Information Extraction In Biomedical Literature And Protein Complex Identification

Posted on: 2015-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Zhang
Full Text: PDF
GTID: 1228330467486954
Subject: Computer application technology
Abstract/Summary:
Biomedical literature is the major medium for presenting academic achievements and for academic exchange. The large body of biomedical literature has become a huge knowledge repository and is the most important resource in the biomedical domain. Biomedical information extraction applies information extraction technology to biomedical literature in order to meet the needs of the biomedical domain, extracting specific biomedical knowledge effectively and accurately. Biomedical information extraction and its related applications can help biomedical researchers in various fields. Biomedical entity relation extraction and event extraction are currently important parts of biomedical information extraction research. In this dissertation, we study biomedical entity relation extraction and event extraction, building on previous work.

The structure and function of protein complexes are key to exploring various biological processes. Understanding the vital biological processes in which protein complexes take part can bring important breakthroughs in related domains. Protein complex identification is the first step of protein complex research and is also the basis of proteomics. In this dissertation, we apply biomedical information extraction technology to protein complex identification and improve its performance by mining and integrating knowledge from multiple biomedical domains.

The main work of this dissertation can be summarized as follows:

We propose the hash subgraph pairwise (HSP) kernel for biomedical entity relation extraction by integrating hash operation theory into the graph kernel framework. Based on hash operations, the HSP kernel approach can efficiently transform the complex syntactic features of a syntactic dependency graph into hierarchical hash labels. Furthermore, the HSP kernel function can map syntactic information into a high-dimensional subgraph pairwise feature space. The HSP kernel can therefore mine and exploit more complex syntactic features of the dependency graph. In particular, the HSP kernel can control its computational complexity through the parallelism of the hash operations.

We propose a rich feature based approach for biomedical event trigger detection. The approach uses hash operations to transform the syntactic information of the dependency graph into hash features, and combines the hash features and basic features into rich features. Hash features can efficiently mine and exploit the syntactic information of the dependency graph, while basic features represent important lexical information of the sentence. Hash features and basic features are complementary, and both are important in the detection of biomedical event triggers. The experimental results show that the rich feature based approach can effectively improve the performance of biomedical event trigger detection.

We propose the CSO_Weighted approach for protein complex identification. Firstly, we use the HSP kernel method to extract protein-protein interactions (PPIs) from biomedical literature effectively. Secondly, we integrate high-throughput PPI data, biomedical literature PPI data and gene ontology data by constructing protein attributed networks based on attributed graph theory. Thirdly, we propose the CSO_Weighted algorithm to identify protein complexes in protein attributed networks. The experimental results show that integrating gene ontology data and biomedical literature PPI data not only reduces the effect of noise in high-throughput PPI data, but also combines the structural topology of protein complexes with their functional features in protein complex identification.
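
The hash-label idea described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes a Weisfeiler-Lehman-style relabeling, in which each node's label is repeatedly replaced by a hash of its own label and its neighbours' sorted labels, and it compares two graphs by a dot product of label counts rather than by the HSP kernel's subgraph pairwise feature space, whose details the abstract does not give. The names `hash_labels` and `label_kernel`, the choice of SHA-1, and the toy dependency graphs are all hypothetical.

```python
import hashlib
from collections import Counter


def _h(text):
    # Short deterministic hash of a string (SHA-1 is an arbitrary choice).
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def hash_labels(nodes, edges, iterations=2):
    """Count hash labels over all refinement levels (assumed WL-style scheme).

    nodes: dict mapping node id -> initial label (e.g. a token)
    edges: list of (head, dependent, relation) triples, treated as undirected
    """
    neighbours = {n: [] for n in nodes}
    for head, dep, rel in edges:
        neighbours[head].append((rel, dep))
        neighbours[dep].append((rel, head))

    labels = {n: _h(lab) for n, lab in nodes.items()}
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for n in nodes:
            # Sort the neighbourhood so the result ignores edge order.
            context = sorted(rel + ":" + labels[m] for rel, m in neighbours[n])
            new_labels[n] = _h(labels[n] + "|" + ",".join(context))
        labels = new_labels
        features.update(labels.values())
    return features


def label_kernel(f1, f2):
    # Dot product of the two label-count vectors.
    return sum(count * f2[label] for label, count in f1.items())


# Toy dependency graphs for "PROT1 binds PROT2" and "PROT1 inhibits PROT2".
g1_nodes = {0: "PROT1", 1: "binds", 2: "PROT2"}
g2_nodes = {0: "PROT1", 1: "inhibits", 2: "PROT2"}
dep_edges = [(1, 0, "nsubj"), (1, 2, "dobj")]

f1 = hash_labels(g1_nodes, dep_edges)
f2 = hash_labels(g2_nodes, dep_edges)
similarity = label_kernel(f1, f2)
```

Each node's new label depends only on the previous round's labels, so the per-node hashing within a round is independent work. That is one plausible reading of the parallelism the abstract mentions, under the assumptions stated above.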
Keywords/Search Tags: Biomedical information extraction, Hash graph kernel, Protein complex identification, Protein attributed networks