
Research On Extraction Of Protein-protein Interactions By Searching Large Scale Text

Posted on: 2013-08-26 | Degree: Master | Type: Thesis
Country: China | Candidate: E Y Feng
GTID: 2298330422479938 | Subject: Computer Science and Technology
Abstract/Summary:
As the embodiment of life activities, proteins do not act in isolation: they carry out most cellular processes through mutual interactions. Constructing the protein-protein interaction (PPI) network has become a core issue in research on biological processes. Domain experts have built many PPI databases. However, with the rapid growth of the biomedical literature, manually collecting complete PPI information is not realistic, and a large amount of PPI information is still scattered throughout that literature. Automatically mining PPI information from text is therefore very important for constructing the PPI network.

Current PPI identification systems use single sentences as evidence and often suffer from a heavy burden of manual annotation. To meet the needs of PPI network construction and address these problems, this thesis proposes two methods that identify PPIs automatically by searching large-scale text. PPIs are identified from clues extracted from large-scale text rather than from single sentences. The training data are taken from existing PPI databases, so no extra annotation work is needed.

The first method is a machine learning approach that uses unigrams as features. Four term-weighting and feature-selection techniques are compared in the experiments. Results show that this method achieves high precision and reasonable recall, with an F-score of 75.89%.

The second method identifies PPIs based on relational similarity. Three types of features are extracted: lexical features, phrases, and syntactic dependencies. Finally, the similarity between protein pairs is calculated to classify the relationship between the two proteins. Experimental results show that this method also achieves a high F-score (75.02%).
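The first method represents each protein pair as a unigram vector in the vector space model. The abstract does not say which learner or which four weighting and selection schemes were compared, so the sketch below is an assumption throughout: it treats all text retrieved for one pair as a single document, selects features by document frequency, weights them with log-scaled TF times smoothed IDF, and substitutes a nearest-centroid (Rocchio-style) classifier for whatever learner the thesis used. The names `CentroidPPIClassifier`, `select_features`, and `min_df` are hypothetical, and the toy documents are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercased unigrams with surrounding punctuation stripped."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]

def select_features(docs, min_df=2):
    """Document-frequency feature selection: keep unigrams seen in >= min_df docs."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: n for term, n in df.items() if n >= min_df}

def normalize(vec):
    """Scale a sparse vector to unit L2 length (empty vector stays empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf_vector(doc, df, n_docs):
    """Log-scaled TF times smoothed IDF over the selected vocabulary, L2-normalised."""
    tf = Counter(t for t in tokenize(doc) if t in df)
    vec = {t: (1 + math.log(c)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
           for t, c in tf.items()}
    return normalize(vec)

def cosine(u, v):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

class CentroidPPIClassifier:
    """Hypothetical nearest-centroid classifier in the vector space model."""

    def fit(self, docs, labels, min_df=2):
        self.n_docs = len(docs)
        self.df = select_features(docs, min_df)
        sums = {}
        for doc, label in zip(docs, labels):
            # Accumulate each class's document vectors, then normalise the sum.
            sums.setdefault(label, Counter()).update(
                tfidf_vector(doc, self.df, self.n_docs))
        self.centroids = {lab: normalize(dict(acc)) for lab, acc in sums.items()}
        return self

    def predict(self, doc):
        vec = tfidf_vector(doc, self.df, self.n_docs)
        return max(self.centroids, key=lambda lab: cosine(vec, self.centroids[lab]))

# Invented toy data: one "document" of retrieved text per protein pair,
# labelled 1 if the pair is listed in a PPI database, else 0.
train_docs = [
    "A binds B and they form a complex",
    "A interacts with B in vitro",
    "C binds D to form a stable complex",
    "A and E were both measured in liver tissue",
    "C and F expression was measured in tissue samples",
    "E levels were measured in liver",
]
labels = [1, 1, 1, 0, 0, 0]
clf = CentroidPPIClassifier().fit(train_docs, labels)
print(clf.predict("X binds Y forming a complex"))     # → 1
print(clf.predict("G was measured in liver tissue"))  # → 0
```

Swapping `select_features` (for example, chi-square instead of document frequency) or the weighting inside `tfidf_vector` is where a comparison of the kind the abstract describes would plug in.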
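The second method classifies a pair by how similar its relation is to the relations of known pairs. As a hedged sketch, the code below pools every sentence that mentions both proteins, extracts two of the three feature types named above (lexical features taken as the words between the two mentions, phrase features approximated as bigrams of that span), compares pairs by cosine similarity, and labels a new pair by a k-nearest-neighbour vote. Syntactic dependency features are omitted because they need a dependency parser. The between-mentions window, the bigram phrases, the k-NN rule with `k=3`, and all function names are assumptions rather than the thesis's design, and the sentences are invented examples.

```python
import math
from collections import Counter

def between_context(sentence, p1, p2):
    """Lowercased tokens strictly between the two protein mentions, or None."""
    tokens = [t.strip(".,;:()") for t in sentence.split()]
    if p1 not in tokens or p2 not in tokens:
        return None
    i, j = sorted((tokens.index(p1), tokens.index(p2)))
    return [t.lower() for t in tokens[i + 1:j] if t]

def pair_features(sentences, p1, p2):
    """Aggregate relational features for one pair over all co-mention sentences."""
    feats = Counter()
    for s in sentences:
        ctx = between_context(s, p1, p2)
        if ctx is None:
            continue
        feats.update("w:" + w for w in ctx)                             # lexical
        feats.update("p:" + a + "_" + b for a, b in zip(ctx, ctx[1:]))  # phrase (bigram)
    return feats

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v.get(f, 0) for f, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_pair(query_feats, labelled, k=3):
    """k-NN vote over labelled pairs ranked by relational similarity."""
    ranked = sorted(labelled, key=lambda item: cosine(query_feats, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented sentences standing in for large-scale search results.
corpus = [
    "RAD51 binds BRCA2 in the nucleus.",
    "BRCA2 directly binds RAD51 during repair.",
    "MDM2 binds TP53 and promotes its degradation.",
    "TP53 interacts with MDM2 in vivo.",
    "ALB and INS were measured in serum.",
    "INS and ALB levels were measured separately.",
    "GAPDH and ACTB were used as loading controls.",
    "CDK2 binds CCNE1 to form an active complex.",
    "ESR1 and PGR were measured in tumours.",
]
# Labels would come from an existing PPI database (1 = interacting).
labelled = [
    (pair_features(corpus, "RAD51", "BRCA2"), 1),
    (pair_features(corpus, "MDM2", "TP53"), 1),
    (pair_features(corpus, "ALB", "INS"), 0),
    (pair_features(corpus, "GAPDH", "ACTB"), 0),
]
print(classify_pair(pair_features(corpus, "CDK2", "CCNE1"), labelled))  # → 1
print(classify_pair(pair_features(corpus, "ESR1", "PGR"), labelled))    # → 0
```

Because features are pooled over every co-mention sentence, a pair is judged on all of its textual evidence rather than on one sentence, which mirrors the motivation given in the abstract.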
Keywords/Search Tags: protein-protein interactions, large-scale text, feature extraction, vector space model, relational similarity