
ncRNA Clustering Analysis with an Unknown Number of Clusters

Posted on: 2015-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: G Peng
Full Text: PDF
GTID: 2298330425986926
Subject: Applied Mathematics
Abstract/Summary:
In recent years, a growing number of researchers have found that ncRNA plays a crucial role in living organisms. So far, the Rfam database contains only 2028 ncRNA families. The number of known ncRNAs is steadily increasing, yet many ncRNAs are still not covered by the Rfam database. Assigning newly found ncRNA sequences to new categories has therefore become a hot research topic. The quality of ncRNA clustering depends closely on how accurately information is extracted from the sequences.

In this thesis, we propose two methods for characterizing ncRNA sequences: the λ matrix method and the composition ratio method. The distance between two sequences is taken to be the Euclidean distance between their characteristic vectors, which simplifies the problem of measuring the distance between sequences.

We use two clustering algorithms to cluster ncRNA sequences when the number of clusters is not known in advance. The first, which applies a nearest-neighbor heuristic rule, is based on the λ matrix method and divides the ncRNA sequences into 10 classes. The second, an ant colony clustering algorithm, is based on the composition ratio method and divides the ncRNA sequences into 23 classes. To avoid losing information about the ncRNA sequences, we take the intersection of the classes produced by the two methods. The result is two classes, Class D and Class F. Class F consists only of the sequences to be discarded.

Finally, we predict the secondary structure of the Class D sequences. We find that the length distributions of the stem regions in these secondary structures are similar, which indicates that the secondary structures of the Class D sequences are similar to a certain extent.
Keywords/Search Tags: ncRNA, Characteristic vector, Euclidean distance, Clustering, Secondary structure
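
The abstract does not give the exact definitions of its feature vectors or clustering rules, so the following is only a minimal sketch under stated assumptions. It assumes that a "composition ratio" vector is simply the fraction of each nucleotide in a sequence, and that the nearest-neighbor heuristic attaches a sequence to the cluster of its nearest earlier sequence when the Euclidean distance is within a threshold, and otherwise opens a new cluster. In the thesis the nearest-neighbor rule is paired with the λ matrix features; composition vectors are used here only to keep the example short. The names `composition_vector`, `euclidean`, `nearest_neighbor_cluster`, and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below


def composition_vector(seq):
    """Fraction of each nucleotide in seq (an assumed feature definition)."""
    counts = Counter(seq.upper().replace("T", "U"))
    total = sum(counts[base] for base in ALPHABET)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/U characters")
    return [counts[base] / total for base in ALPHABET]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_cluster(vectors, threshold):
    """Threshold-based nearest-neighbor clustering (assumed rule).

    Each vector joins the cluster of its nearest previously seen vector
    if that distance is <= threshold; otherwise it starts a new cluster.
    The number of clusters is therefore not fixed in advance.
    """
    labels = []
    for i, v in enumerate(vectors):
        best_label, best_dist = None, float("inf")
        for j in range(i):
            d = euclidean(v, vectors[j])
            if d < best_dist:
                best_label, best_dist = labels[j], d
        if best_label is not None and best_dist <= threshold:
            labels.append(best_label)
        else:
            labels.append(max(labels, default=-1) + 1)
    return labels


# Toy sequences, not data from the thesis.
sequences = ["AAAA", "AAAU", "GGGG", "GGGC"]
vectors = [composition_vector(s) for s in sequences]
print(nearest_neighbor_cluster(vectors, threshold=0.5))
```

The threshold controls how many clusters emerge, which is how a scheme of this kind can work without a preset cluster count. The result also depends on the order in which sequences are processed.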