Research On The Similarity Calculation Method For Proteomics Data

Posted on:2018-09-16

Degree:Master

Type:Thesis

Country:China

Candidate:M Hao

Full Text:PDF

GTID:2310330569986404

Subject:Computer Science and Technology

Abstract/Summary:

As biological research enters the post-genomic era, scientific instrumentation continues to improve. In the 21st century, the life sciences have begun to study high-throughput biological data through the "omics" disciplines, of which proteomics, genomics, and metabolomics are typical representatives. Proteomics studies protein expression in biological samples; it can help identify disease-specific proteins so that appropriate intervention can begin before a patient shows symptoms. In today's era of big data, proteomics data is an important part of understanding life as a whole. The volume of data produced in proteomics laboratories has increased by several orders of magnitude, and this mass of proteome experimental data has given great impetus to researchers in the biological and medical sciences. However, proteome databases still contain much information and knowledge that has gone unnoticed, or whose associations have not yet been discovered, and that researchers still have to mine. How to find related proteome data accurately and quickly is therefore an active research topic. This thesis focuses on the structure and content of current proteome metadata. Building on the TF-IDF algorithm commonly used in Internet and text mining, it studies the similarity problem for proteomic experimental data as follows:

1. The thesis first describes the basic concepts of proteomics and its current development. In today's big data environment, proteomics experimental data is constantly growing. If proteomics researchers could be offered similar experimental data, as recommender systems do in other fields, their research efficiency would improve significantly and they would be helped to find new knowledge. To meet this demand, the thesis further analyzes and studies the content information of proteomics metadata.

2. The thesis presents a text similarity calculation method based on biomedical synonyms and TF-IDF. Using BioPortal functions such as biomedical ontology query and biomedical synonym lookup, a local biomedical thesaurus is constructed and combined with the TF-IDF method to calculate text similarity. This method makes use of synonym information from the professional domain and calculates the similarity between experimental data from the perspective of their text descriptions.

3. The thesis proposes a similarity algorithm based on molecular evidence. In view of the characteristics and significance of proteomic data, the similarity of two proteomic experimental datasets is calculated by treating the proteins each dataset contains as features and mapping those features to feature vectors, so that comparing proteomes becomes a set of vector operations in a vector space.

Finally, the similarity of proteomics experimental data is calculated both from the perspective of the text description and from the perspective of the biological significance of the data. The experimental results show that, in comparison, the similarity algorithm based on protein molecular evidence reflects the similarity of proteomics experimental data more faithfully.
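The two similarity measures described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the TF-IDF weighting, the thesaurus format, or the exact protein features, so the smoothed IDF formula, the small `SYNONYMS` table (a stand-in for a thesaurus built from BioPortal), the whitespace tokenizer, and the binary protein features are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical synonym table mapping variant terms to one canonical term.
# It stands in for a local biomedical thesaurus; the entries are made up.
SYNONYMS = {"tumour": "tumor", "neoplasm": "tumor", "hepatic": "liver"}


def tokenize(text):
    """Lowercase, split on whitespace, and replace each token by its canonical synonym."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed weighting: relative term frequency times a smoothed IDF.
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def protein_vector(identified_proteins):
    """Assumed binary features: each identified protein is one dimension with value 1."""
    return {protein: 1.0 for protein in set(identified_proteins)}


# Text-description similarity (item 2): synonyms collapse to the same term.
descriptions = [
    "liver tumour proteome",
    "hepatic neoplasm proteome",
    "yeast membrane proteins",
]
vecs = tfidf_vectors(descriptions)
text_sim_related = cosine(vecs[0], vecs[1])
text_sim_unrelated = cosine(vecs[0], vecs[2])

# Protein-content similarity (item 3): hypothetical protein identifiers.
exp_a = protein_vector(["P1", "P2", "P3"])
exp_b = protein_vector(["P2", "P3", "P4"])
protein_sim = cosine(exp_a, exp_b)
```

In this toy input, the first two descriptions share no surface words apart from "proteome", yet they map to the same canonical terms, which is the effect the synonym thesaurus is meant to provide. The protein-based measure ignores the text entirely and depends only on which proteins the two experiments have in common.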

Keywords/Search Tags:

Proteomics, Biomedical, TF-IDF Method, Vector Space Model, Similarity

Related items

1	Identification Of Protein-protein Interaction Based On Relational Similarity Of The Text
2	Historical Research Of Vector Space Theory
3	Deep Learning-Based Methods For Biomedical Text Filtering And Information Extraction
4	Multi-view Similarity Network Fusion For Biomedical Entities Association Prediction
5	Research On The Similarity Model Of Scientific And Technological Documents With Mathematical Formulas And Contexts
6	Proteomics Analysis Of B.S 168 Overproducing PGA
7	Hamiltonian Systems On Symplectic Vector Spaces
8	Activity Analysis And Construction Recombinant Procaryotic And Adenovirus Vector Of IL-24,Proteomics Study Of AdIL-24
9	Research On Undersampling Algorithm Based On Word2vec And Vector Space Model
10	Research On Spatial Similarity Calculating Method Between GML Documents