Research on Non-experimental Protein Data Mining with Active Learning

Posted on: 2014-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: B Du
GTID: 2248330398450290
Subject: Control theory and control engineering
Abstract/Summary:
A protein's function is closely related to its subcellular localization. To overcome the sparsity of experimental data, this thesis uses active learning to propose a method for actively selecting samples from non-experimental proteins. The approach is based on an evaluation function that estimates the "value" of each non-experimental sample, so that the most valuable samples can be picked out.

Proteins from the Swiss-Prot protein database are filtered according to their entry information. PISCES is then used to process the protein sequences, and PseAA feature extraction is performed. After these steps, Gram-positive bacteria, Gram-negative bacteria, and plant datasets are constructed.

Using active learning, a non-experimental sample selection algorithm is constructed based on the loss function and the label probability. Three classification experiments are carried out on the three datasets. The selected samples are incrementally added to the original experimental training set, the current classifiers are retrained on the enlarged set, and the retrained classifiers are then tested.

On the one hand, the experimental results show that the best result is better than both the result obtained without non-experimental samples and the result obtained by adding all non-experimental samples. The proposed method can therefore choose an appropriate number of non-experimental samples to improve prediction performance. On the other hand, the results show that the more severe the sparsity of the original training data, the larger the gain in prediction performance, which indicates the importance of non-experimental samples for improving classifier performance.

In conclusion, the proposed method can effectively select the most valuable samples and alleviate the experimental data sparsity problem in protein subcellular localization prediction.
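The abstract does not give the evaluation function itself, so the following is only a minimal sketch of how a selection step based on a loss function and a label probability might look. Everything in it is an assumption made for illustration: the names `sample_value` and `select_samples`, the use of log loss as the "value", and the `min_label_prob` cutoff that discards samples whose non-experimental label the current classifier finds implausible. It is not the thesis's actual algorithm.

```python
import math


def sample_value(probs, putative_label, min_label_prob=0.2):
    """Hypothetical value score for one non-experimental sample.

    probs: class probabilities predicted by the current classifier.
    putative_label: index of the class given by the non-experimental annotation.
    min_label_prob: assumed cutoff below which the annotation is treated as noise.
    """
    p = probs[putative_label]
    if p < min_label_prob:
        return 0.0  # annotation looks implausible to the classifier; skip it
    return -math.log(p)  # log loss: a higher loss is assumed to be more informative


def select_samples(prob_rows, putative_labels, k, min_label_prob=0.2):
    """Return indices of up to k samples with the highest positive value score."""
    scores = [
        sample_value(probs, label, min_label_prob)
        for probs, label in zip(prob_rows, putative_labels)
    ]
    # sorted() is stable, so samples with equal scores keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0.0]
```

In an incremental setup like the one the abstract describes, the samples at the returned indices would be appended to the experimental training set, the classifier retrained, and the probabilities recomputed before the next round of selection.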
Keywords/Search Tags: Active Learning, Subcellular Localization Prediction, Classifier, Non-experimental Data, Data Mining