Research On The Text Classification Method Based On Extreme Machine Learning

Posted on: 2019-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: H M Pang
GTID: 2428330593450171
Subject: Computer technology
Abstract/Summary:
Text classification (TC) automatically determines the category of a document from its title, content, keywords, and other attribute information, and assigns the document to one or more categories. Text classification not only helps users accurately locate the information they need in massive text collections, but can also largely replace manual classification and management of text, so its study has considerable practical significance.

Compared with other neural networks that have a single hidden layer, the extreme learning machine (ELM) only needs the input weights and the hidden-layer thresholds to be set, and it obtains the unique optimal solution by the method of least squares. As a result, ELM learns quickly and generalizes well. In recent years ELM has been widely used for classification problems, and it has also achieved good results in text classification. However, ELM has two problems in the text classification process. First, it cannot preserve the geometric structure of the original text features when it maps them. Second, an ELM with a single hidden layer does not have strong feature extraction ability when mapping text features. To address these problems, we carry out two studies:

(1) In the text classification process, ELM maps the input text features randomly, and the mapped features exhibit a nonlinear geometric structure that the least squares method cannot handle, which degrades classification performance. To solve this problem, this paper introduces a new manifold regularization idea into the ELM model and presents a text classification algorithm based on the manifold regularized extreme learning machine (MRELMT). The algorithm not only preserves the geometry of the input text features, but also corrects the distances between sample points using the text category information. Experimental results on the Reuters and 20 Newsgroups datasets show that the algorithm achieves good classification performance compared with some other algorithms.

(2) When the dimension of the text data is high, a regularized ELM with a single hidden layer does not have enough capacity to express the features needed for text classification. To solve this problem, this paper presents a text classification algorithm based on the multi-layer extreme learning machine (ML-ELM). First, the algorithm uses the compressed representation of the ELM-based auto-encoder (ELM-AE) to reduce the dimension of the text data. Then it uses the multiple-hidden-layer structure to represent high-level features of the text data, and uses least squares to classify the text data. Experimental results on the Reuters, 20 Newsgroups, and Fudan University Chinese Corpus datasets show that the algorithm achieves good classification performance compared with some other algorithms.
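The basic ELM training step that the abstract describes (fixed random input weights and hidden-layer thresholds, output weights obtained by least squares) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the thesis: the function names `elm_fit` and `elm_predict`, the sigmoid activation, the hidden-layer size, and the ridge term `reg` are all choices made here for the example. It covers neither the manifold regularization of MRELMT nor the ELM-AE stacking of ML-ELM.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, y, n_hidden=64, reg=1e-2, seed=0):
    """Train a single-hidden-layer ELM classifier (illustrative sketch).

    X: array of shape (n_samples, n_features), e.g. TF-IDF vectors.
    y: integer class labels in 0..n_classes-1.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1

    # Input weights and hidden thresholds are drawn at random and never tuned.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)

    # Hidden-layer output matrix H and one-hot target matrix T.
    H = _sigmoid(X @ W + b)
    T = np.eye(n_classes)[y]

    # Output weights from ridge-regularized least squares (assumed form):
    # beta = (H^T H + reg * I)^(-1) H^T T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def elm_predict(model, X):
    """Return the predicted class index for each row of X."""
    W, b, beta = model
    return np.argmax(_sigmoid(X @ W + b) @ beta, axis=1)
```

Because only `beta` is learned, and it comes from a single linear solve, training involves no iterative weight updates, which is the source of the fast learning speed mentioned above.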
Keywords/Search Tags:text classification, extreme learning machine, manifold regularization, multi-layer extreme learning machine, feature mapping