
Design And Application Of Kernels For Biological Sequences

Posted on: 2008-12-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Huang | Full Text: PDF
GTID: 2178360218952805 | Subject: Computer application technology
Abstract/Summary:
Pattern analysis detects patterns in a data set and plays an influential role in solving problems in artificial intelligence and computer science. A pattern is an internal relation within the data set; it may take the form of rules or structure. Patterns that have been found can be used to make predictions about new data drawn from the same source.

Kernel-based learning is the most recent product of pattern analysis algorithms. It embeds the data into a new space in which linear relations can be detected more easily than non-linear relations can be detected in the original space. The kernel function is the shortcut: it gives access to the features of the data in the high-dimensional space without computing the non-linear mapping explicitly.

With the rapid progress of the Human Genome Project, bioinformatics plays an important role in studies of pathogenesis and functional genes. It is also used to understand and detect the potential functions of protein sequences. Much of the research in this area analyzes large amounts of data to uncover underlying rules, which can then be used to predict or confirm new functions.

Kernel-based methods can naturally be applied to classify biological sequences. The main focus of this work is the design of the kernel function, the shortcut described above. The research is organized into three parts:

(1) An in-depth analysis of the framework of kernel-based methods. Although the actual tasks differ, the methods work in the same way. The algorithm is adapted to accept only inner products of the input data, and the kernel is then used to compute the inner products of the data after mapping into the feature space. The algorithm therefore remains feasible in the high-dimensional space. This flow demonstrates the modularity of kernel methods.

(2) After examining the basic properties and construction of kernel functions, we analyze the marginalized kernel introduced by K. Tsuda. Building on the marginalized approach, we propose a new way of designing a kernel that uses the distance between feature vectors in the kernel space as the measure of similarity.

(3) The new kernel and the marginalized kernel are both used to classify bacterial gyrase subunit B amino acid sequences. Experimental results show that the new kernel achieves better recognition accuracy than the marginalized kernel and also has strong generalization ability.
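The two ideas the abstract relies on, a kernel as an inner product in feature space and a distance measured in that space, can be illustrated with a minimal sketch. It is not the thesis's method: the k-mer (spectrum) kernel stands in for the marginalized kernel, and the Gaussian form in `distance_similarity` is only one assumed way to turn a feature-space distance into a similarity. All function names, the parameter `gamma`, and the toy sequences are hypothetical.

```python
from collections import Counter
import math


def kmer_counts(seq, k=2):
    """Explicit feature map: counts of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """Inner product of the k-mer count vectors of x and y.

    Stand-in kernel for illustration only (assumption); the thesis
    works with the marginalized kernel instead.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for missing keys, so unmatched k-mers add nothing.
    return sum(cx[m] * cy[m] for m in cx)


def feature_space_distance(x, y, kernel):
    """Euclidean distance between the images of x and y in feature space.

    Uses only kernel evaluations:
    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y).
    """
    d2 = kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp guards against rounding error


def distance_similarity(x, y, kernel, gamma=0.1):
    """Gaussian similarity of the feature-space distance (assumed form)."""
    return math.exp(-gamma * feature_space_distance(x, y, kernel) ** 2)


if __name__ == "__main__":
    a, b = "MKMK", "MKKM"  # toy sequences, not real gyrase B data
    print(spectrum_kernel(a, b))
    print(feature_space_distance(a, b, spectrum_kernel))
    print(distance_similarity(a, b, spectrum_kernel))
```

Because `feature_space_distance` only calls `kernel`, any valid kernel can be swapped in without touching the rest of the code, which is the modularity described in part (1).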
Keywords/Search Tags: kernel design, marginalized kernels, kernel space, biological sequence classification, HMM, Euclidean distance