Chinese Text Categorization Based On Support Tensor Machine

Posted on: 2017-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Yu
GTID: 2428330590468337
Subject: Electronic and communication engineering
Abstract/Summary:
Text categorization based on machine learning is an important research direction in natural language processing. It gives people quick access to useful information with considerable accuracy. Currently, most machine learning algorithms represent data as vectors. In comparison, representing data as tensors retains more information about the internal structure of multi-modal data and shows stronger generalization ability on high-dimensional, small-sample data. Machine learning algorithms based on tensor data have therefore received increasing attention from researchers in recent years. This thesis focuses on two main aspects: the support tensor machine and text categorization. It discusses in depth the optimization of the support tensor machine and the structure of the tensor space model of text. The work is divided into the following parts:

1. A rank-r support tensor machine is proposed. The model builds on the advantages and limitations of the rank-1 support tensor machine and the SVM. To handle different training sets, a rank constraint is used to control the number of parameters of the learning model. Because tensor rank has more than one definition, namely CP rank and Tucker rank, the equivalent models of the rank-r support tensor machine under each definition and the solutions of the corresponding optimization problems are discussed.
2. A feasible method for estimating the optimal rank constraint of the rank-r support tensor machine is presented, based on an analysis of the alternating projection algorithm used to solve the rank-r support tensor machine.
3. To construct the tensor space model of text, we suggest filling the columns of the feature tensor alternately with positively and negatively correlated features, in descending order. The idea is based on an analysis of the distribution of feature weights in text categorization.
4. A feasible Chinese text categorization system is designed. The system combines current research results with common text classification techniques. It begins with preprocessing, feature selection, and the vector representation of text. Then, through initial training of an SVM, the optimized tensor space model and an approximately optimal rank constraint are obtained. Finally, the tensor representation of text is constructed. The training set is fed into the rank-r support tensor machine to obtain the classification model, and the testing set is used to evaluate it.

The results of this thesis can be used for topic classification of text. Owing to the high scalability of the rank-r support tensor machine and the optimized tensor space model, this work also has considerable reference value for other machine learning fields such as face recognition.
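
To make the role of the rank constraint concrete, the following is a minimal sketch for the second-order (matrix) case, assuming the CP form W = u_1 v_1^T + ... + u_r v_r^T and a linear decision function f(X) = <W, X> + b. It is not the thesis's implementation: the function names (`rank_r_weight`, `decision_full`, `decision_factored`, `n_params`) and the random data are hypothetical and chosen only for illustration. The sketch shows that the factored form gives the same decision value as the full weight matrix while using r(m + n) parameters instead of mn.

```python
import random


def rank_r_weight(us, vs):
    """Build W = sum_k u_k v_k^T (CP rank at most r) as a list of rows."""
    m, n = len(us[0]), len(vs[0])
    W = [[0.0] * n for _ in range(m)]
    for u, v in zip(us, vs):
        for i in range(m):
            for j in range(n):
                W[i][j] += u[i] * v[j]
    return W


def decision_full(W, X, b):
    """f(X) = <W, X> + b, using the full m-by-n weight matrix."""
    inner = sum(w * x for w_row, x_row in zip(W, X) for w, x in zip(w_row, x_row))
    return inner + b


def decision_factored(us, vs, X, b):
    """The same value computed as sum_k u_k^T X v_k + b, without forming W."""
    total = b
    for u, v in zip(us, vs):
        Xv = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        total += sum(ui * xi for ui, xi in zip(u, Xv))
    return total


def n_params(m, n, r):
    """Number of weight parameters in the rank-r factorization (bias excluded)."""
    return r * (m + n)


# Hypothetical toy data: a 4-by-3 "text tensor" and a rank-2 weight.
random.seed(0)
m, n, r = 4, 3, 2
us = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(r)]
vs = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(r)]
X = [[random.uniform(0, 1) for _ in range(n)] for _ in range(m)]
b = 0.1

f_full = decision_full(rank_r_weight(us, vs), X, b)
f_fact = decision_factored(us, vs, X, b)
label = 1 if f_fact >= 0 else -1  # predicted class from the sign of f(X)

print(abs(f_full - f_fact) < 1e-9, label, n_params(m, n, r), m * n)
```

With r = 1 this reduces to the rank-1 form, and with r = min(m, n) the factorization can express any m-by-n weight matrix, which matches an unconstrained linear model on the vectorized input. Choosing r between these extremes is what lets the rank constraint trade model capacity against the number of parameters.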
Keywords/Search Tags: text categorization, support tensor machine, rank-r support tensor machine, tensor space model