Research on EST Clustering Based on the Hidden Markov Model

Posted on: 2009-07-28 | Degree: Master | Type: Thesis
Country: China | Candidate: L X Duan | Full Text: PDF
GTID: 2178360272490695 | Subject: Systems Engineering
Abstract/Summary:
Raw EST sequences contain many characteristic features, for example enzyme1 (EcoRI), enzyme2 (XhoI), adaptor1 (P), adaptor1 (D), polyA, and polyT. When EST sequences are classified according to these features, the categories of some sequences are usually already known; these known sequences are treated as labeled samples, and the remaining sequences in the set as unlabeled samples. This kind of classification generally rests on a premise: that the labeled samples are representative of the whole sample set. In practice this is often not the case. Classification based on an incomplete set of original EST sequences often leaves classes missing and produces incorrect results. It is also time-consuming and laborious.

To address this problem, this thesis presents a clustering method that combines the K-means algorithm with the Hidden Markov Model (HMM). The aim is to group EST sequence data into clusters of similar sequences. The method overcomes the limitations of each algorithm used alone and makes full use of the strengths of both.

First, the EST sequence data are preprocessed. Second, the processed data are clustered with the K-means algorithm. Then the Baum-Welch algorithm is applied to the sequences belonging to each cluster, which yields the HMM parameters for that cluster. Finally, the resulting probability models can be used to test data or to cluster new data, for the purpose of model evaluation or self-clustering.

Biological sequence analysis is an important research topic, and this work is grounded in experiment. The experiments show that the proposed method is valid and meaningful to a certain degree, and it lays a foundation for further sequence analysis and research.
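The pipeline described in the abstract (preprocess, cluster with K-means, train one HMM per cluster with Baum-Welch, then score new sequences) could be sketched roughly as follows. This is a minimal pure-Python illustration and not the thesis's implementation. Several details are assumptions that the abstract does not state: sequences are taken to contain only A, C, G, and T after preprocessing, 2-mer frequency vectors serve as the K-means features, each HMM has two hidden states with discrete emissions, and small pseudocounts keep probabilities positive. All function names are hypothetical.

```python
import math
import random
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences are cleaned to these four symbols
KMERS = ["".join(p) for p in product(ALPHABET, repeat=2)]

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def kmer_profile(seq):
    """2-mer frequency vector (an assumed feature choice for K-means)."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 1):
        counts[KMERS.index(seq[i:i + 2])] += 1.0
    return [c / (sum(counts) or 1.0) for c in counts]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                  for a, b in zip(p, centers[c]))) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def forward(obs, pi, A, B):
    """Scaled forward pass; returns normalized alphas and per-step scales."""
    n, alphas, scales, prev = len(pi), [], [], None
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            a = [B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                 for j in range(n)]
        s = sum(a)
        prev = [x / s for x in a]
        alphas.append(prev)
        scales.append(s)
    return alphas, scales

def log_likelihood(seq, model):
    """log P(seq | model), as the sum of the log scaling factors."""
    obs = [ALPHABET.index(ch) for ch in seq]
    return sum(math.log(s) for s in forward(obs, *model)[1])

def baum_welch(seqs, n_states=2, iters=10, seed=0):
    """Baum-Welch (EM) for a discrete HMM over several sequences."""
    rng, m = random.Random(seed), len(ALPHABET)
    pi = normalize([rng.random() + 1 for _ in range(n_states)])
    A = [normalize([rng.random() + 1 for _ in range(n_states)])
         for _ in range(n_states)]
    B = [normalize([rng.random() + 1 for _ in range(m)])
         for _ in range(n_states)]
    data = [[ALPHABET.index(ch) for ch in s] for s in seqs]
    for _ in range(iters):
        # Expected counts, started at a small pseudocount (an assumption).
        pi_n = [1e-3] * n_states
        A_n = [[1e-3] * n_states for _ in range(n_states)]
        B_n = [[1e-3] * m for _ in range(n_states)]
        for obs in data:
            alphas, scales = forward(obs, pi, A, B)
            beta = [1.0] * n_states  # scaled backward variable at time t
            for t in range(len(obs) - 1, -1, -1):
                for i in range(n_states):
                    gamma = alphas[t][i] * beta[i]  # P(state i at t | obs)
                    B_n[i][obs[t]] += gamma
                    if t == 0:
                        pi_n[i] += gamma
                if t > 0:
                    w = [B[j][obs[t]] * beta[j] / scales[t]
                         for j in range(n_states)]
                    for i in range(n_states):
                        for j in range(n_states):
                            A_n[i][j] += alphas[t - 1][i] * A[i][j] * w[j]
                    beta = [sum(A[i][j] * w[j] for j in range(n_states))
                            for i in range(n_states)]
        pi = normalize(pi_n)
        A = [normalize(r) for r in A_n]
        B = [normalize(r) for r in B_n]
    return pi, A, B

def fit_cluster_models(seqs, k):
    """K-means on 2-mer profiles, then one HMM per non-empty cluster."""
    labels = kmeans([kmer_profile(s) for s in seqs], k)
    models = {c: baum_welch([s for s, l in zip(seqs, labels) if l == c])
              for c in set(labels)}
    return labels, models

def assign(seq, models):
    """Assign a new sequence to the cluster whose HMM scores it highest."""
    return max(models, key=lambda c: log_likelihood(seq, models[c]))
```

In use, `labels, models = fit_cluster_models(seqs, k)` would give the initial K-means labels and one trained HMM per cluster, and `assign(new_seq, models)` would place a new sequence in the best-scoring cluster. Log-likelihoods shrink as sequences get longer, so the scores are only comparable across models for the same sequence, not across sequences of different lengths.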
Keywords/Search Tags: EST Sequence, Clustering, Hidden Markov Model