The Research And Platform Development Of Protein Allergen Prediction Based On Ensemble Classify Algorithm

Posted on: 2017-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330485958357
Subject: Engineering
Abstract/Summary:
In recent years, protein allergen prediction has come to be regarded as an important problem in immunology. In traditional biomedicine, allergens are identified mainly through biological experiments or the analysis of clinical cases. However, these biological methods are time-consuming and costly, and they cannot keep up with the large number of proteins that need to be screened. With the rapid development of sequencing technology, the sequence of a protein can now be readily obtained. Against this background, we use computational methods to predict allergens: potential allergens are first predicted by computer and then verified by biological experiments. Combining computational methods with biological techniques speeds up allergen prediction considerably.

A large number of computational methods have been applied to protein allergen prediction. The International Food Biotechnology Council proposed a decision tree for estimating the allergenicity of genetically modified food. Experts from the World Health Organization and the Food and Agriculture Organization improved this decision tree and then proposed the sequence-based method, which rests on the idea that the more similar a protein's sequence is to known allergens, the more likely the protein is to be an allergen. Building on this idea, the motif-based method was proposed and achieved better prediction results.

From the viewpoint of computer science, protein allergen prediction is a typical binary classification problem: the biochemical features of a protein are represented as numerical values, and the protein is then classified by a machine learning method. Several features have been used with good results, such as amino acid composition, dipeptide composition and the E-descriptor, but few studies have explored new features from the amino acid index database. We therefore use principal component analysis to extract useful information from the amino acid indices and combine it with amino acid composition to form a new feature descriptor. As for classification, methods such as the support vector machine and the artificial neural network have already been used in allergen prediction. However, just as a group of people usually judges better than a single person, we expect an ensemble classification method to give better prediction results.

To validate this idea, we carried out a comparative experiment. Three feature sets (amino acid composition, amino acid index, and the two combined) were used as descriptors of the protein sequences. Three classification methods (support vector machine, AdaBoostM1 and LogitBoost) were used to separate allergens from non-allergens. Each classification method was applied to each feature set with ten-fold cross-validation. The results showed that the combined amino acid composition and amino acid index features carry more useful information than either feature set alone, and that LogitBoost predicts better than the support vector machine and AdaBoostM1. The allergen prediction method based on ensemble classification is therefore feasible and superior to previous methods.

In addition, we built a website that lets biological and medical researchers and staff search for and predict allergens easily.
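
As an illustration of two ingredients named in the abstract, the amino acid composition feature and the ten-fold split, here is a minimal Python sketch. It is not the thesis's implementation: the function names `amino_acid_composition` and `k_fold_indices` are hypothetical, the residue ordering is an arbitrary alphabetical choice, and the split is a plain unshuffled, unstratified one chosen only for brevity. The amino acid index features, the principal component analysis step and the classifiers themselves are not shown.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order is an
# assumption; any fixed order works as long as it is used consistently).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue fractions for a sequence.

    Letters outside the 20 standard amino acids (e.g. X, B, Z) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10):
    """Return k (train, test) index pairs covering range(n_samples).

    Sample i goes to fold i % k; no shuffling or class stratification is done.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    pairs = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        pairs.append((train, test))
    return pairs
```

Each protein would contribute one composition vector, and each of the ten `(train, test)` pairs would be used once to fit a classifier on the training indices and score it on the held-out indices. In practice the samples would normally be shuffled, and ideally stratified by class, before splitting.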
Keywords/Search Tags: Allergen Prediction, Amino Acid Composition, Amino Acid Index, Ensemble Classification Method, Website