Multi-class Learning For Sequential Data

Posted on:2011-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:M Chen

Full Text:PDF

GTID:2178360305497807

Subject:Computer software and theory

Abstract/Summary:

Pattern classification is a core subject in machine learning and a widespread task in computer applications. The support vector machine (SVM), a well-known classic pattern classification method, overcomes many defects and limitations of traditional classifiers by virtue of Vapnik-Chervonenkis dimension theory and the structural risk minimization principle. The standard SVM performs well on two-class numerical classification problems. However, with the development of the information industry and Internet technology, large volumes of sequential data are emerging in many application areas, and the associated classification problems are often multi-class. Multi-class learning for sequential data is therefore attracting growing attention from researchers in diverse fields and may become a hot topic in machine learning.

In this thesis, a new multiple data domain description model is established. It is theoretically founded on a new kernel-based multi-task learning formulation, which learns multiple tasks simultaneously with the goal of capturing structures shared among tasks. The model is then applied to transcription factor binding site (TFBS) recognition with kernel methods and to user navigation behavior mining. The major achievements of this thesis are summarized as follows:

1. The background knowledge is briefly introduced, including statistical learning theory, kernel methods, and the basic idea of sequential data classification. Some key issues of sequential data classification are discussed, and a formal definition of sequence data is given.

2. A multiple data domain description model is established, theoretically founded on the kernel-based multi-task learning formulation. The model is applied to TFBS recognition with kernel methods and to user navigation behavior mining. Compared with the 0-1 polynomial kernel, our newly designed string kernel based on edit distance can effectively measure the similarity between sequences. To reduce running time, related parallel algorithms are designed and implemented on the GPU.

3. We design and implement an Integrated Transcription Regulatory Platform (ITREP), which provides users with online browsing and mining services for transcription factors, transcription factor binding sites, and their binding information. It also allows users to adjust the parameters to achieve the best result. Overall, ITREP provides biologists with a useful bioinformatics tool for transcription regulatory research.
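
The abstract does not state the exact form of the edit-distance string kernel, so the following is only a minimal sketch of one common construction, not the thesis's own definition. It assumes the kernel K(s, t) = exp(-gamma * d(s, t)), where d is the Levenshtein distance; the names `edit_distance`, `edit_kernel`, `gram_matrix` and the parameter `gamma` are hypothetical. Kernels built this way are not guaranteed to be positive semidefinite, so the resulting Gram matrix may need checking before it is used with a kernel method.

```python
import math


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    if len(s) < len(t):
        s, t = t, s  # keep the inner row as short as possible
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_kernel(s: str, t: str, gamma: float = 0.5) -> float:
    """Assumed similarity: exp(-gamma * edit distance). Not from the thesis."""
    return math.exp(-gamma * edit_distance(s, t))


def gram_matrix(seqs, gamma: float = 0.5):
    """Pairwise kernel values for a list of sequences."""
    return [[edit_kernel(a, b, gamma) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Toy DNA-like sequences, chosen only for illustration.
    sites = ["ACGTAC", "ACGTTC", "TTGACA"]
    for row in gram_matrix(sites):
        print(["%.3f" % v for v in row])
```

Identical sequences receive a kernel value of 1, and the value decays as more edits are needed to turn one sequence into the other. Since every pairwise distance is independent of the others, filling the Gram matrix is a natural candidate for parallel computation.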

Keywords/Search Tags:

Machine Learning, Multi-class Classification, Sequence Data, Multiple Data Domain Description, Transcription Factor Binding Sites, Behavior Data, Bioinformatics

Related items

1	Applications Of Machine Learning Approaches To Biological Sequence Analysis
2	An Approach For Recognition Of Transcription Factor Binding Sites Based On Genetic Algorithm
3	Research And Implementation Of Transcription Regulatory Sequences Data Mining
4	Efficient Large-Scale Machine Learning Algorithms for Genomic Sequence
5	The Study Of Characterization And Prediction Of Binding Sites On Proteins Based On Machine Learning Methods
6	Analysis of machine learning algorithms on bioinformatics data of varying quality
7	Research On Medical Image Classification Method Based On Hypersphere Multi-class Support Vector Data Description
8	Research On Extreme Learning Machine For Imbalanced Data Classification
9	Multispectral Data Classification Based On Support Vector Machines
10	Study On Clustering Of Position Frequency Matrices For Transcription Factor Binding Site