Design and Realization of an Intelligent Agent-Based Biological Information Retrieval System

Posted on: 2010-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: X P Liu
GTID: 2208360275983329
Subject: Software engineering
Abstract/Summary:
Bioinformatics is a new interdisciplinary field that draws on biology, computer science, and applied mathematics. The number of documents in biological information databases is increasing exponentially, and organizing, retrieving, and processing this volume of information has become a major challenge. In biological information processing, peptide and protein identification via tandem mass spectrometry and database retrieval is an important biological sequence retrieval problem. Machine learning is a technology that enables a computer to simulate or reproduce human learning behavior, so that it can automatically acquire new knowledge and skills and improve its own performance, for example by reorganizing its existing knowledge structure. Measuring the relevance between the data stored in the database and the user's query is one of the most important components of any information retrieval system. In this thesis, the author studies protein sequence identification and protein homology prediction from the perspective of retrieval function design using machine learning techniques.

Because practical retrieval problems are complex, there is usually more than one basic measure of the relevance between the stored data and the user's query, which yields multi-dimensional feature vectors. How to combine these multiple relevance measures into a single one by machine learning is the retrieval function construction problem studied in this thesis. In short, the author studies how to define a real-valued function over the relevance feature vectors so that the retrieval results can be ranked. The block structure of the data is a unique feature of retrieval function learning problems. Using protein homology prediction as the application, this thesis describes a series of approaches based on the block structure for learning retrieval functions more accurately.
These approaches include intra-block data normalization and block feature expansion for addressing the non-i.i.d. problem (i.i.d.: independent and identically distributed), block selection and support vector under-sampling for reducing redundant data, and a K-nearest-block ensemble method for designing query-adaptive retrieval functions.

By combining agent and data mining technology, the author designed a personalized biological information retrieval system based on intelligent agents. The system obtains the user's interest characteristics by analyzing web pages, and the user obtains the desired information from the system; in this way a personalized information service is implemented. That is, the system can not only filter information according to the user's preferences, but also perform collaborative filtering through information exchange among agents.
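
The idea of a real-valued retrieval function combined with intra-block normalization can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that a "block" is the set of candidates returned for one query, that normalization means z-scoring each feature within its block, and that the retrieval function is a simple weighted sum. The names `normalize_within_blocks`, `rank_block`, and the example weights are hypothetical; in practice the weights would be learned.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_blocks(rows):
    """Z-score every feature separately inside each block.

    rows: list of (block_id, feature_vector) pairs (assumed layout).
    Returns a list of normalized feature vectors in the same order.
    """
    by_block = defaultdict(list)
    for idx, (block_id, _) in enumerate(rows):
        by_block[block_id].append(idx)

    normalized = [None] * len(rows)
    for idxs in by_block.values():
        n_feat = len(rows[idxs[0]][1])
        mus = [mean([rows[i][1][j] for i in idxs]) for j in range(n_feat)]
        sds = [pstdev([rows[i][1][j] for i in idxs]) for j in range(n_feat)]
        for i in idxs:
            # A feature that is constant within the block carries no ranking signal.
            normalized[i] = [
                (rows[i][1][j] - mus[j]) / sds[j] if sds[j] > 0 else 0.0
                for j in range(n_feat)
            ]
    return normalized


def rank_block(rows, weights, block_id):
    """Rank one block's candidates by a weighted sum of normalized features.

    Returns row indices, best score first. The weights are placeholders.
    """
    normalized = normalize_within_blocks(rows)
    scored = [
        (sum(w * x for w, x in zip(weights, normalized[i])), i)
        for i, (b, _) in enumerate(rows)
        if b == block_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored]


rows = [
    ("q1", [1.0, 10.0]),
    ("q1", [3.0, 30.0]),
    ("q2", [100.0, 5.0]),
    ("q2", [200.0, 5.0]),
]
ranking_q1 = rank_block(rows, [0.5, 0.5], "q1")
```

Normalizing inside each block puts features from different queries on a comparable scale, which is one plausible reading of how intra-block normalization addresses the non-i.i.d. problem named above.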
Keywords/Search Tags: information retrieval, bioinformatics, tandem mass spectrometry, machine learning, Agent