The Method Choice Of Protein Purification Based On A Modified Naive Bayes

Posted on: 2015-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Li
GTID: 2298330467486766
Subject: Control theory and control engineering
Abstract/Summary:
Protein purification is an important research topic in protein engineering. The choice of purification method depends mainly on the properties of the protein, but the relationship is not a one-to-one correspondence. At present, experimenters choose methods from protein properties on the basis of past experience, and the whole process is complex and costly. Given the large number of existing successful cases, data-mining classification can replace this traditional approach: it selects a method from the correlation between methods and protein properties, which makes rapid selection possible.

At the same classification accuracy, naive Bayes (NB) is faster, more efficient and simpler than other algorithms such as neural networks and K-means. To reduce computational complexity, NB makes two assumptions: all attributes are independent, and continuous numerical attributes follow a normal distribution. These assumptions simplify classification and still give good performance. However, the accuracy of NB depends mainly on the completeness and the properties of the samples. In protein purification, experience is recorded inconsistently, so classification accuracy may fall for two reasons: data go missing while samples are collected, and the attributes may not satisfy the two assumptions above. Applying NB directly therefore cannot deliver rapid selection of a purification plan. To address this, a new Bayesian classifier, EM-KDNB, is established in this paper.

First, a Bayesian network is built over the attributes, and the EM algorithm is introduced to learn parameters from incomplete data: a temporary sample set is created by initializing the parameters and weighting each candidate value of a missing entry, and the new samples are used to iterate until the estimate converges to a local optimum, which is then used to fill in the missing values. Then, based on kernel density estimation, the distribution density function and local data are used to evaluate the maximum posterior probability and perform the classification. Experiments show that the new algorithm is effective to a certain degree at filling in missing data, adapts better to continuous attributes that do not follow a normal distribution, and is more accurate than the traditional algorithm.

This paper applies the proposed algorithm to method selection for protein purification, designs the complete workflow, and develops an application system. The system is applied effectively and proposes plans in line with those of experts. ...
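The kernel density step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's EM-KDNB implementation: the EM filling of missing values is omitted, and the class name KernelDensityNB, the Gaussian kernel, the rule-of-thumb bandwidth and the toy data are all assumptions made for this example. It only shows how a naive Bayes classifier can replace the normal-distribution likelihood of each continuous attribute with a kernel density estimate and then pick the class with the maximum posterior.

```python
import numpy as np


def kernel_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from 1-D training samples."""
    z = (x - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean()


class KernelDensityNB:
    """Naive Bayes with per-class, per-attribute kernel density likelihoods.

    Hypothetical illustration only; assumes complete data (no missing values).
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_, self.samples_, self.bandwidths_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            n = len(Xc)
            self.priors_[c] = n / len(X)
            self.samples_[c] = Xc
            # Assumed bandwidth: rule of thumb h = 1.06 * std * n^(-1/5),
            # floored so that a constant attribute does not give h = 0.
            std = Xc.std(axis=0, ddof=1) if n > 1 else np.ones(X.shape[1])
            self.bandwidths_[c] = 1.06 * np.maximum(std, 1e-6) * n ** (-0.2)
        return self

    def log_posterior(self, x, c):
        """Unnormalised log posterior: log prior + sum of log densities."""
        total = np.log(self.priors_[c])
        for j, xj in enumerate(x):
            dens = kernel_density(xj, self.samples_[c][:, j],
                                  self.bandwidths_[c][j])
            total += np.log(dens + 1e-300)  # tiny floor avoids log(0)
        return total

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.array([[self.log_posterior(x, c) for c in self.classes_]
                           for x in X])
        return self.classes_[scores.argmax(axis=1)]


# Toy data (invented): class 0 is bimodal, so a single normal
# distribution would describe it poorly; class 1 is concentrated near 0.
X = np.array([[-3.2], [-3.0], [-2.8], [2.8], [3.0], [3.2],
              [-0.2], [0.0], [0.2], [-0.1], [0.1], [0.3]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

model = KernelDensityNB().fit(X, y)
predictions = model.predict([[0.0], [3.0], [-3.0]])
print(predictions)
```

The independence assumption of naive Bayes is kept here (the log densities of the attributes are simply summed); only the normality assumption is relaxed. A fixed rule-of-thumb bandwidth oversmooths multimodal data, so a real system would likely tune the bandwidth, for example by cross-validation.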
Keywords/Search Tags: Naive Bayes, EM, Data missing, Continuous attributes, Kernel density estimation, Protein purification