Research On The Relationship Extraction Of Protein Phosphorylation And Disease

Posted on: 2018-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Long
GTID: 2348330512986744
Subject: Computer application technology
Abstract/Summary:
Protein phosphorylation is one of the most important post-translational modifications of proteins in living organisms. Many human diseases have been confirmed to be caused by abnormal phosphorylation, and some disease-related phosphorylation modifications can be developed into molecular markers or therapeutic targets. With the explosive growth of biomedical literature, automatically extracting the relationship between protein phosphorylation and disease from that literature has become an active research topic in related fields. The extraction task mainly comprises disease named entity recognition and determination of the relationship between protein phosphorylation and disease. Machine learning is currently the main approach to disease named entity recognition, but it has difficulty recognizing the medical terms within disease names effectively. In addition, no publicly available system currently exists for extracting the relationship between protein phosphorylation and disease. This thesis studies the extraction of that relationship, and its work and contributions are as follows.

First, this thesis presents a method that combines a conditional random field (CRF) with a semantic dictionary for disease named entity recognition. A dictionary of medical terms carrying semantic information is constructed from online resources to overcome the difficulty of recognizing medical terminology in disease named entities. The semantic information of medical terms is first obtained from the dictionary. This information is then combined with lexical, part-of-speech (POS), spelling, and domain features and used by the CRF to recognize disease names. Finally, an adjustment step based on abbreviation identification is applied to further improve recognition. Experimental results show that the method improves the F-measure by about 2.5% over the DNorm method on the NCBI Disease Corpus. Results on simulated data sets show that the proposed method has an advantage in recognizing long disease names.

Second, the relationship between protein phosphorylation and disease is divided into four types: Absence, Presence, Down-regulation, and Up-regulation. This thesis establishes PDRMine, a system for extracting the relationship between protein phosphorylation and disease. The system works in three steps. First, protein phosphorylation information is extracted by RLIMS-P, a rule-based phosphorylation information extraction system. Next, disease named entities are recognized in the sentences that contain phosphorylation information, using the disease named entity recognition method described above. Finally, a rule-based method identifies the relationship type between protein phosphorylation and disease. The main difficulty of this last step is the identification of trigger words, so a synonym expansion method is used to obtain more trigger words and improve extraction performance. The relationship extraction method proposed in this thesis achieved an accuracy of 72.6% and a recall of 66.4% on an open data set.
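
As an illustration of how dictionary-derived semantic information might be combined with lexical, POS, and spelling features for a CRF, the following is a minimal sketch in Python. It is not the thesis implementation: the dictionary entries, the semantic category names, the feature names, and the functions `token_features` and `sentence_features` are all hypothetical. The output is a list of per-token feature dictionaries, a format that common CRF toolkits accept.

```python
# Hypothetical semantic dictionary: medical term -> semantic category.
# The entries and category names are invented for illustration only.
SEMANTIC_DICT = {
    "carcinoma": "neoplasm",
    "syndrome": "disorder",
    "hepatic": "body_part",
}


def token_features(tokens, pos_tags, i, semantic_dict=SEMANTIC_DICT):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    features = {
        "word.lower": lower,                       # lexical feature
        "pos": pos_tags[i],                        # part-of-speech feature
        "is_capitalized": word[:1].isupper(),      # spelling features
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": lower[-3:],
        # Semantic feature looked up in the (assumed) dictionary.
        "semantic": semantic_dict.get(lower, "NONE"),
    }
    # Context features: neighbouring words, or sentence-boundary flags.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens, pos_tags, semantic_dict=SEMANTIC_DICT):
    """Feature dictionaries for every token in one sentence."""
    return [token_features(tokens, pos_tags, i, semantic_dict)
            for i in range(len(tokens))]


feats = sentence_features(["Hepatic", "carcinoma", "patients"],
                          ["JJ", "NN", "NNS"])
```

Here a token such as "carcinoma" receives a semantic category even if it never appeared in the training data, which is the kind of signal a purely lexical model would lack.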
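
The rule-based relationship typing and the synonym expansion of trigger words can be sketched as follows. This is only an assumed illustration, not the rules used in PDRMine: the trigger words, the synonym table, the rule order, and the functions `expand_triggers` and `classify_relation` are hypothetical. The sketch also assumes that the input sentence is already known to mention both a phosphorylation event and a disease, so that a sentence without any trigger word falls back to Presence.

```python
import re

# Hypothetical seed trigger words for three of the four relation types.
TRIGGERS = {
    "Up-regulation": {"increased", "elevated"},
    "Down-regulation": {"decreased", "reduced"},
    "Absence": {"absent", "undetectable"},
}

# Hypothetical synonym table used to expand the seed triggers.
SYNONYMS = {
    "increased": {"enhanced", "upregulated"},
    "decreased": {"diminished"},
}


def expand_triggers(triggers, synonyms):
    """Return a new trigger table with the synonyms of each seed word added."""
    expanded = {}
    for label, words in triggers.items():
        extra = set()
        for word in words:
            extra |= synonyms.get(word, set())
        expanded[label] = set(words) | extra
    return expanded


def classify_relation(sentence, triggers):
    """Assign a relation type from the trigger words found in the sentence.

    Assumes the sentence already contains a phosphorylation mention and a
    disease mention; with no trigger word the relation defaults to Presence.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # The rule order below is an assumption made for this sketch.
    for label in ("Absence", "Up-regulation", "Down-regulation"):
        if tokens & triggers[label]:
            return label
    return "Presence"


expanded = expand_triggers(TRIGGERS, SYNONYMS)
sentence = "Phosphorylation of tau was enhanced in Alzheimer disease."
before = classify_relation(sentence, TRIGGERS)   # "Presence"
after = classify_relation(sentence, expanded)    # "Up-regulation"
```

With the seed triggers alone the example sentence is typed as Presence, because "enhanced" is not a seed word. After expansion it is typed as Up-regulation, which shows why a larger trigger vocabulary can change the extracted relation type.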
Keywords/Search Tags:bioinformatics, disease named entity recognition, medical terminology, semantic dictionary, conditional random field, protein phosphorylation, relationship extraction