
Research on Similarity Calculation Methods for Proteomics Data

Posted on: 2018-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Hao
Full Text: PDF
GTID: 2310330569986404
Subject: Computer Science and Technology
Abstract/Summary:
As biological research enters the post-genomic era, scientific instrumentation continues to improve. In the 21st century, life sciences research has moved into the "omics" disciplines, which study high-throughput biological data; proteomics, genomics, and metabonomics are typical representatives. Proteomics studies protein expression in biological samples. It can help identify disease-specific proteins so that appropriate interventions can be taken before a patient's symptoms appear. In today's era of big data, proteomics data is undoubtedly an important part of understanding life as a whole. The volume of data produced in proteomics laboratories has increased by several orders of magnitude, and this massive body of proteome experimental data has given great impetus to researchers in the biological and medical sciences. However, proteome databases still contain much information and knowledge that has gone unnoticed or whose associations have not yet been discovered, and researchers must mine for it. How to find related proteome data accurately and quickly is therefore a hot research topic.

This thesis focuses on the structure and content of current proteome metadata. Building on the TF-IDF algorithm commonly used in Internet and text knowledge mining, it studies the similarity problem for proteomics experimental data as follows:

1. The thesis first sets out the basic concepts of proteomics and its current state of development. In today's big data environment, proteomics experimental data is also constantly expanding. For proteomics researchers, research efficiency could increase significantly if, as recommender systems do in other fields, similar experimental data could be suggested to them to help them discover new knowledge. To meet this need, the thesis further analyzes the content information of proteomics metadata.

2. The thesis presents a text similarity calculation method based on biomedical synonyms and TF-IDF. Using BioPortal functions such as biomedical ontology queries and biomedical synonym lookup, a local biomedical thesaurus is constructed and combined with the TF-IDF method to calculate text similarity. This method makes use of domain-specific synonym information and calculates the similarity between experimental datasets from the perspective of their text descriptions.

3. The thesis proposes a similarity algorithm based on molecular evidence. In line with the characteristics and biological significance of proteomics data, the similarity of two proteomics experimental datasets is calculated by treating the proteins each dataset contains as features and mapping those features to feature vectors, so that comparing proteomes becomes a matter of vector operations in a vector space.

Finally, the similarity of proteomics experimental data is calculated from both perspectives: the text description and the biological significance of the experimental data. The experimental results show that, of the two, the similarity algorithm based on protein molecular evidence reflects the similarity of proteomics experimental data more faithfully.
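The two similarity measures described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The `SYNONYMS` table is a hypothetical stand-in for the local biomedical thesaurus, the smoothed IDF formula is one common variant chosen here as an assumption, and treating each dataset as a binary protein-presence vector compared by cosine similarity is only one plausible reading of the vector space approach. All function names and sample identifiers are illustrative.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a local biomedical thesaurus;
# each variant term maps to one canonical term.
SYNONYMS = {
    "tumour": "tumor",
    "neoplasm": "tumor",
    "hepatic": "liver",
}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, canonicalize synonyms."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:()")
        if word:
            tokens.append(SYNONYMS.get(word, word))
    return tokens


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each term once per document
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def protein_set_similarity(proteins_a, proteins_b):
    """Cosine similarity of binary protein-presence vectors.

    For 0/1 vectors this reduces to |A & B| / sqrt(|A| * |B|).
    """
    a, b = set(proteins_a), set(proteins_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

In this sketch, two descriptions that differ only by synonyms (for example "liver tumor proteome" and "hepatic neoplasm proteome") map to the same tokens and so receive the same TF-IDF vector, while `protein_set_similarity` depends only on which proteins two experiments share, not on how the experiments are described.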
Keywords/Search Tags:Proteomics, Biomedical, TF-IDF Method, Vector Space Model, Similarity