Intelligent Agent-based Biological Information Retrieval System Design And Realization

Posted on:2010-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:X P Liu

GTID:2208360275983329

Subject:Software engineering

Abstract/Summary:

Bioinformatics is a new interdisciplinary field involving biology, computer science and applied mathematics. The number of documents in biological information databases is increasing exponentially, and how to organize, retrieve and process this large volume of information has become a major challenge. In biological information processing, peptide and protein identification via tandem mass spectrometry and database retrieval is an important biological sequence retrieval problem. Machine learning is a technology that enables a computer to simulate or reproduce human learning behavior, so that it can automatically acquire new knowledge and skills and improve itself, for example by reorganizing its existing knowledge structure. Measuring the relevance between the data stored in the database and the user's query is one of the most important components of any information retrieval system. In this thesis, the author studies the design of retrieval functions using machine learning techniques in two settings: protein sequence identification and protein homology prediction. Because practical retrieval problems are complex, there is usually more than one basic index of the relevance between the stored data and the user's query, which results in multi-dimensional feature vectors. How to combine these multiple relevance indexes into a single one by machine learning is the retrieval function construction problem studied in this thesis. In short, the author studies how to define a real-valued function over relevance feature vectors so that the retrieval results can be ranked. The block structure of the data is a unique feature of retrieval function learning problems. Taking protein homology prediction as the application, this thesis describes a series of approaches that exploit the block structure to learn retrieval functions more accurately.
These approaches range from intra-block data normalization and block feature expansion methods for handling the non-i.i.d. (not independent and identically distributed) problem, through block selection and support vector under-sampling methods for reducing redundant data, to the K-nearest-block ensemble method for designing query-adaptive retrieval functions.
By combining agent and data mining technology, the author also designed a personalized bioinformation retrieval system based on intelligent agents. The system learns users' interest characteristics by analyzing web pages, and users can then obtain the information they want from the system. In this way a personalized information service is implemented: the system can not only filter information according to a user's preferences, but also perform collaborative filtering through information exchange among agents.
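
The idea of combining several relevance indexes into a single ranking score, with normalization done inside each block, can be illustrated with a small sketch. It is not the thesis's implementation: the z-score normalization, the linear scoring function, the weights, the feature values and the names normalize_block and rank_block are all assumptions chosen for illustration. A block here means the set of candidates retrieved for one query.

```python
from statistics import mean, pstdev


def normalize_block(block):
    """Z-score each feature column using only this block's own statistics."""
    columns = list(zip(*block))
    mus = [mean(c) for c in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to all zeros.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in block]


def rank_block(block, weights):
    """Return candidate indices of one block, best score first.

    The score is a weighted sum of the normalized features (an assumed,
    hypothetical retrieval function, not the one learned in the thesis).
    """
    normalized = normalize_block(block)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in normalized]
    return sorted(range(len(block)), key=lambda i: scores[i], reverse=True)


# Hypothetical data: two queries, each with its own block of candidates.
# Each candidate is described by two relevance indexes on different scales.
blocks = {
    "query_a": [[0.9, 120.0], [0.2, 80.0], [0.5, 100.0]],
    "query_b": [[0.1, 10.0], [0.3, 30.0]],
}
weights = [0.7, 0.3]  # illustrative weights; in practice these would be learned

rankings = {query: rank_block(block, weights) for query, block in blocks.items()}
for query, order in rankings.items():
    print(query, order)
```

Because each block is normalized with its own mean and spread, feature values that sit on very different scales for different queries become comparable before the single score is computed.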

Keywords/Search Tags:

information retrieval, bioinformatics, tandem mass spectrometry, machine learning, Agent

Related items

1	Research And Implementation Of PepNovo Parallelization Based On Tandem Mass Spectrometry
2	The Research On The Quality Control Methods Of Database Search Results Of Tandem Mass Spectrometry Data In Proteomics
3	Modeling Of Peptide Fragment Ion Intensities Based On Deep Learning
4	Research On Preprocessing Methods Of Tandem Mass Spectral Data
5	Studies On Several Key Issues Of Mass Spectrometry Data Processing In Proteomics
6	Pepfind: Protein Identification Algorithm-based Target-decoy Database Feature Information Matching For Tandem Mass Spectrometry
7	Kernel Based Learning Algorithm And Application
8	SELDI-TOF Protein Mass Spectrometry Data Analysis With Semi-supervised Learning
9	Laser Mass Spectrometry Is The Molecule Detection Of Atmospheric Pollutants
10	Research And Implementation Of Feature Extraction Methods For Mass Spectrometry And Nirs In Analysis Of Traditional Chinese Medicine