The Method Choice Of Protein Purification Based On A Modified Naive Bayes

Posted on:2015-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Li

Full Text:PDF

GTID:2298330467486766

Subject:Control theory and control engineering

Abstract/Summary:

Protein purification is an important research topic in protein engineering. The choice of purification method depends mainly on the properties of the protein, but the two are not in one-to-one correspondence. At present, experimenters choose methods from protein properties on the basis of past experience, and the whole process is complex and costly. Given the large number of existing successful cases, classification techniques from data mining can replace this traditional approach by exploiting the correlation between methods and protein properties, making rapid method selection possible.

At the same classification accuracy, naive Bayes (NB) is faster, more efficient, and simpler than other classification algorithms such as neural networks and K-means. To reduce computational complexity, it makes two assumptions: all properties are independent, and continuous numerical properties follow a normal distribution. These assumptions simplify classification and give good performance, but the accuracy of NB depends mainly on the completeness and properties of the samples. In protein purification, experience is recorded in different ways, so classification accuracy may fall because data go missing while samples are collected and because properties do not satisfy the two assumptions above. NB therefore cannot be applied directly to choose a purification plan rapidly.

To address this, this thesis establishes a new Bayesian classifier, EM-KDNB. First, Bayesian networks are built over the properties, and the EM algorithm is introduced for parameter learning with incomplete data: a temporary sample set is created by initializing the parameters and weighting each potential initial value, and the new samples are used to iterate until convergence to a local optimum, which fills in the missing parameters. Then, based on kernel density estimation, the distribution density function and local data are used to evaluate the maximum posterior probability and perform classification. Experiments show that the new algorithm is effective to a certain degree at filling in missing data and adapts better to continuous attributes that do not follow a normal distribution; its accuracy is higher than that of the traditional algorithm.

This thesis applies the proposed algorithm to method selection for protein purification, designs the complete workflow, and develops an application system. The system works effectively in practice and produces purification plans in line with those proposed by experts...
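
The kernel-density classification step described in the abstract can be illustrated with a short sketch. This is not the thesis's EM-KDNB implementation: it leaves out the Bayesian network and the EM filling of missing values, and only shows how a naive Bayes classifier might replace the normal-distribution assumption with a per-class, per-attribute kernel density estimate. The class name `KernelNaiveBayes`, the helper functions, the Gaussian kernel, the normal-reference bandwidth rule, and the toy data with labels `method_A` and `method_B` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, bandwidth):
    """Kernel density estimate of p(x) from one-dimensional samples."""
    n = len(samples)
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    return total / (n * bandwidth)


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule h = 1.06 * std * n^(-1/5), floored to avoid h = 0."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / max(n - 1, 1)
    return max(1.06 * math.sqrt(var) * n ** (-0.2), 1e-3)


class KernelNaiveBayes:
    """Hypothetical naive Bayes variant using a KDE per class and attribute."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        # Class priors P(c) estimated from label frequencies.
        self.priors = {c: len(rows) / len(y) for c, rows in rows_by_class.items()}
        # Store the training values of every attribute, separately per class.
        self.columns = {
            c: [list(col) for col in zip(*rows)] for c, rows in rows_by_class.items()
        }
        self.bandwidths = {
            c: [rule_of_thumb_bandwidth(col) for col in cols]
            for c, cols in self.columns.items()
        }
        return self

    def log_posteriors(self, x):
        """Unnormalized log posterior: log P(c) + sum over attributes of log p(x_j | c)."""
        scores = {}
        for c, prior in self.priors.items():
            score = math.log(prior)
            for value, col, h in zip(x, self.columns[c], self.bandwidths[c]):
                # Small constant guards against log(0) if the density underflows.
                score += math.log(kde(value, col, h) + 1e-300)
            scores[c] = score
        return scores

    def predict(self, x):
        """Return the class with the maximum posterior score."""
        scores = self.log_posteriors(x)
        return max(scores, key=scores.get)


# Toy data (invented): method_A has a two-peaked attribute, which a single
# normal distribution would describe poorly; method_B is concentrated near 5.
X = [[0.1], [0.3], [-0.2], [9.8], [10.1], [10.3],
     [4.8], [5.0], [5.2], [5.1], [4.9], [5.3]]
y = ["method_A"] * 6 + ["method_B"] * 6

model = KernelNaiveBayes().fit(X, y)
for query in ([0.0], [5.0], [10.0]):
    print(query, model.predict(query))
```

The independence assumption of naive Bayes is kept: the class score is still a product of per-attribute densities, and only the form of each density changes. The bandwidth rule used here tends to oversmooth multi-peaked data, so a real system would likely tune the bandwidth rather than rely on this default.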

Keywords/Search Tags:

Naive Bayes, EM, Missing data, Continuous attributes, Kernel density estimation, Protein purification

Related items

1	Research Of Protein Purification System Based On Data Mining
2	Research On Bayesian Classification Based On Continuous Attributes And Its Application
3	Kernel Density Estimation On Correlated Naive Bayes Network Traffic Classification
4	The Research On Classifier Ensemble Learning For Data Mining
5	Protein-Protein Interaction Extraction Based On Ensemble Kernel
6	On The Particle Filter Algorithm And Its Circuit Implementation
7	A Study On SVM Algorithm For Missing Data Processing
8	Research On Hybrid Classification Based On Navie Bayes And Decision Tree
9	Research On Strategy Of Imputing Missing Data Based On Random Forest
10	An Algorithm For Discretization Of Continuous Attributes Based On NBC Clustering In Rough Set Theory