
Research On Text Categorization Based On Tensor Space Model

Posted on: 2011-06-24, Degree: Master, Type: Thesis
Country: China, Candidate: W He, Full Text: PDF
GTID: 2178360308973010, Subject: Computer software and theory
Abstract/Summary:
The rapid development of the Internet and information technology poses a challenge for text categorization: training samples are limited compared with the huge volume of unknown data, which makes it difficult to train a classifier and to model the spatial distribution of the samples. This can lead to overfitting and unbalanced-data problems. Tensor representation helps reduce model complexity by reducing the number of unknown parameters used to represent a learning model, and it effectively alleviates the small sample size problem, improving the generalization of the learning model. The first step in applying the supervised tensor learning (STL) framework to vector data is to reconstruct the data as tensors. This dissertation studies text categorization based on the tensor space model (TSM). The main contributions of this dissertation are as follows:
(1) We study the basic theory of STM by analyzing its learning procedure and, on that basis, analyze and compare several TSMs. We then explain the limitations of these methods through both theory and experiments.
(2) We propose two novel TSMs: random mapping TSM (RM_TSM) and small categories random mapping TSM (SRM_TSM). Experiments show that RM_TSM and SRM_TSM perform better than the other TSMs.
(3) We integrate STM into a multi-class classification method, in which the TSM and its dimension are chosen according to the degree of sparseness and imbalance of the samples. Experiments show that this method can improve the accuracy of the classifier.
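The parameter-reduction argument in the abstract can be illustrated with a small sketch. It is not the thesis's RM_TSM or SRM_TSM, whose details the abstract does not give. It assumes a second-order tensor (a matrix), a rank-one weight of the form u^T X v + b as is common in supervised tensor learning, and a seeded shuffle as a purely hypothetical stand-in for a random mapping. The function names are illustrative.

```python
import random


def vector_to_matrix(vec, n_rows, n_cols, seed=None):
    """Arrange a feature vector into an n_rows x n_cols matrix.

    Features are placed in row-major order and unused cells are zero-padded.
    If a seed is given, the padded positions are shuffled first (an assumed,
    hypothetical form of random mapping, not the thesis's method).
    """
    if len(vec) > n_rows * n_cols:
        raise ValueError("matrix too small for the feature vector")
    padded = list(vec) + [0.0] * (n_rows * n_cols - len(vec))
    if seed is not None:
        random.Random(seed).shuffle(padded)
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def bilinear_score(matrix, u, v, b=0.0):
    """Decision value u^T X v + b for a rank-one weight (assumed form)."""
    return sum(u[i] * matrix[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v))) + b


def parameter_counts(n_rows, n_cols):
    """Weights needed by a vector model vs. a rank-one tensor model (no bias)."""
    return n_rows * n_cols, n_rows + n_cols


x = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0]
X = vector_to_matrix(x, 2, 3)
print(X)                                               # [[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]
print(bilinear_score(X, [1.0, 1.0], [1.0, 1.0, 1.0]))  # 6.0
print(parameter_counts(100, 100))                      # (10000, 200)
```

Under these assumptions, a 10,000-term vocabulary arranged as a 100 x 100 matrix needs 200 weights instead of 10,000, which is the kind of reduction the abstract attributes to tensor representation.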
Keywords/Search Tags: TSM, Text Classification, STM, multi-class classification