Multi-class Learning For Sequential Data

Posted on: 2011-03-02; Degree: Master; Type: Thesis
Country: China; Candidate: M Chen; Full Text: PDF
GTID: 2178360305497807; Subject: Computer software and theory
Abstract/Summary:
Pattern classification is an essential subject in machine learning and a widespread task in computer applications. The support vector machine (SVM), a well-known classic pattern classification method, overcomes many defects and limitations of traditional classifiers by virtue of Vapnik-Chervonenkis dimension theory and the structural risk minimization principle. The standard SVM performs well on two-class numerical classification problems. However, with the development of the information industry and internet technology, large volumes of sequential data are emerging in many application areas, and the associated classification problems are often multi-class. Multi-class learning for sequential data is therefore attracting growing attention from researchers in diverse fields and may eventually become a hot topic in machine learning.

In this thesis, a new multiple data domain description model is established. It is theoretically founded on a new kernel-based multi-task learning formulation, which learns multiple tasks simultaneously with the goal of capturing the structures shared among them. The model is then applied to transcription factor binding site (TFBS) recognition with kernel methods and to user navigation behavior mining. The major achievements of this thesis are summarized as follows:

1. The background knowledge is briefly introduced, including statistical learning theory, kernel methods, and the basic idea of sequential data classification. Key issues in sequential data classification are discussed, and a formal definition of sequence data is given.

2. A multiple data domain description model is established on the kernel-based multi-task learning formulation and applied to TFBS recognition with kernel methods and to user navigation behavior mining. Compared with the 0-1 polynomial kernel, our newly designed string kernel based on edit distance measures the similarity between sequences more effectively. To reduce the running time, a related parallel algorithm is designed and implemented on the GPU.

3. We design and implement an Integrated Transcription Regulatory Platform (ITREP), which provides users with online browsing and mining services for transcription factors, transcription factor binding sites, and their binding information. It also allows users to adjust the parameters to achieve the best result. Overall, ITREP provides biologists with a bioinformatics tool for transcription regulatory research.
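The abstract does not give the form of the edit-distance string kernel, so the following is only a minimal sketch of one common construction, not the thesis's definition. It assumes the Levenshtein distance d(s, t) and the hypothetical kernel k(s, t) = exp(-gamma * d(s, t)); the function names and the `gamma` parameter are illustrative. A kernel built this way is not guaranteed to be positive semidefinite in general.

```python
import math


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t via row-by-row dynamic programming."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_kernel(s: str, t: str, gamma: float = 0.5) -> float:
    """Assumed similarity exp(-gamma * d(s, t)); gamma is a hypothetical parameter."""
    return math.exp(-gamma * edit_distance(s, t))
```

Under this assumption, identical sequences have similarity 1, and the similarity decays as more insertions, deletions, or substitutions are needed to turn one sequence into the other. For example, `edit_kernel("ACGT", "AGGT")` depends only on the single substitution separating the two strings.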
Keywords/Search Tags:Machine Learning, Multi-class Classification, Sequence Data, Multiple Data Domain Description, Transcription Factor Binding Sites, Behavior Data, Bioinformatics