Research On Text Classification Methods Based On Extreme Learning Machine

Posted on: 2019-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Li
GTID: 1368330545963794
Subject: Computer application technology
Abstract/Summary:
Extreme learning machine (ELM), an efficient learning algorithm for single-hidden-layer feedforward neural networks, has attracted growing attention from researchers. Traditional neural network learning algorithms such as back-propagation (BP) require extensive and complex parameter tuning, are prone to overfitting, and may fail to reach an optimal solution. Support vector machines (SVMs) have theoretical and practical advantages, but they are inherently binary classifiers (multi-class problems must be decomposed into several binary ones), and they are poorly suited to the large-scale data now common in text classification and management. ELM has few parameters, and none of them need to be tuned manually: the input weights and hidden-layer biases are set randomly, and training only computes the output weights, which gives the optimal solution of the model. ELM learns quickly, generalizes well, and shows great potential for large-scale learning and real-time processing. Building on previous work, this dissertation studies several theoretical and practical problems in text classification with extreme learning machines. The main contributions are as follows.

First, because the traditional ELM model overfits when the training samples contain many outliers, we propose a novel hybrid adaptive fuzzy extreme learning machine (FELM) model with a fuzzy membership function based on distance and density. In the traditional fuzzy density membership function, the density factor is computed from a sample and its k nearest neighbors, which does not reflect the actual distribution of the samples. We instead use a clustering algorithm based on the quantum harmonic model to determine, without supervision, the cluster to which each sample belongs and the density between the sample and the other samples in that cluster, so that the density membership function adapts to the sample distribution. The fuzzy membership of the resulting FELM model is a linear combination of the distance membership and the density membership, and the model is applied to text classification. Experimental results show that it suppresses overfitting and improves text classification performance.

Second, to address imbalanced data distributions and the unstable performance of a single ELM, we explore an ensemble strategy that exploits the diversity of both the text samples and the text features, and propose DV-RELM, a text classification algorithm that fuses samples and features. The training samples are first partitioned into N parts by a random sampling algorithm that boosts the minority-class samples, and the minority-class samples are added to each of the N parts to form N training subsets. Text features are then randomly selected and synthesized to generate new text vectors for each of the N training subsets. Each subset, represented by its new composite text vectors, is used to train one sub-classifier, giving N sub-classifiers over N iterations. Finally, the sub-classifiers are combined by majority voting. Because it makes full use of the diversity principle and the ensemble strategy, DV-RELM alleviates overfitting effectively, and it achieved the best classification performance and the shortest computation time on imbalanced text datasets of various sizes.

Third, the traditional vector space model (VSM) produces high-dimensional, sparse text features that impose a heavy computational burden on ELM, so we present EL-SPPMI, a semantic representation model based on a co-occurrence matrix. The SPPMI (shifted positive pointwise mutual information) model has better text feature representation and semantic composition ability than word embedding models. Word vectors generated by word embedding alone cannot meet the feature fusion requirements of extreme learning, so, to further improve the text feature representation ability of the SPPMI model, we propose an algorithm based on similar pairs that increases the probability of low-frequency words. On standard text datasets, EL-SPPMI improves classification performance substantially over the word embedding representation model.

Fourth, we present a cost-sensitive ensemble weighted ELM (WELM) algorithm and apply it to text classification, where it achieves very good results on three standard text datasets. WELM assigns different weights to different samples, which improves text classification accuracy on imbalanced datasets to some extent, but it ignores the differences between samples in the same category. We use category information entropy to generate the cost-sensitive matrix and the cost-sensitive factor and to measure the importance of individual documents, and then integrate the cost-sensitive WELM seamlessly into the AdaBoost.M1 framework. Because the traditional VSM text representation produces high-dimensional, sparse features that place a heavy burden on ELM computation, we also develop a text classification framework that combines word embedding with the cost-sensitive ensemble WELM. Experimental results show that this method is an accurate, reliable, and efficient solution for text classification and handles class imbalance in text classification effectively.
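The ELM training procedure summarized above, and the per-sample weighting that FELM and WELM both build on, can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes a sigmoid hidden layer, one-hot targets, and the regularized weighted least-squares solution `beta = (I/C + H^T S H)^-1 H^T S T`, where H is the hidden-layer output matrix, T the target matrix, C a regularization constant, and S a diagonal matrix of per-sample weights. The function names `elm_train` and `elm_predict` and all parameter values are hypothetical. With all weights equal to one this reduces to a plain regularized ELM; the dissertation's distance-and-density memberships and entropy-based cost matrix are not reproduced.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, y, n_hidden=100, C=1.0, sample_weights=None, seed=0):
    """Train a weighted, regularized ELM classifier (illustrative sketch).

    X: (N, d) feature matrix; y: (N,) integer labels in 0..m-1.
    sample_weights: optional (N,) per-sample weights; None means all ones.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_classes = int(y.max()) + 1
    # Input weights and hidden biases are drawn at random and never tuned.
    W_in = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = _sigmoid(X @ W_in + b)              # hidden-layer output, shape (N, L)
    T = np.eye(n_classes)[y]                # one-hot targets, shape (N, m)
    s = np.ones(n_samples) if sample_weights is None else np.asarray(sample_weights, dtype=float)
    # Only the output weights are learned, in closed form:
    # beta = (I/C + H^T S H)^-1 H^T S T, with S = diag(s).
    HtS = H.T * s                           # same as H.T @ np.diag(s), shape (L, N)
    A = np.eye(n_hidden) / C + HtS @ H
    beta = np.linalg.solve(A, HtS @ T)      # shape (L, m)
    return W_in, b, beta


def elm_predict(model, X):
    W_in, b, beta = model
    return np.argmax(_sigmoid(X @ W_in + b) @ beta, axis=1)
```

For a WELM-style model, one simple choice is inverse class frequency, `sample_weights = 1.0 / np.bincount(y)[y]`, which gives every class the same total weight; for a FELM-style model, the weights would instead be each sample's fuzzy membership. Solving the L × L system with `np.linalg.solve` avoids forming an explicit matrix inverse.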
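The sample-and-feature diversity ensemble described for DV-RELM can be approximated with the following sketch. It is one plausible reading of the abstract, not the dissertation's algorithm, and every name and parameter in it (`diversity_ensemble`, `ensemble_predict`, `feature_frac`, and so on) is hypothetical. It assumes that the majority-class samples are split into N disjoint random parts, that all minority-class samples are appended to every part, that each member sees a random subset of the features (random selection only; the feature synthesis step is not modelled), that each member is a small ridge-regularized ELM with tanh hidden units, and that predictions are combined by majority vote.

```python
import numpy as np


def _elm_fit(X, y, n_classes, n_hidden, C, rng):
    # Minimal ridge-regularized ELM with tanh hidden units (base learner).
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    T = np.eye(n_classes)[y]
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta


def _elm_predict(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


def diversity_ensemble(X, y, n_members=5, feature_frac=0.5, n_hidden=50, C=10.0, seed=0):
    """Hypothetical DV-RELM-style ensemble: sample and feature diversity plus voting."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    counts = np.bincount(y, minlength=n_classes)
    # Assumption: every sample outside the largest class counts as "minority".
    minority = np.flatnonzero(counts[y] < counts.max())
    majority = np.setdiff1d(np.arange(len(y)), minority)
    # Split the majority samples into n_members disjoint random parts.
    parts = np.array_split(rng.permutation(majority), n_members)
    n_feat = max(1, int(feature_frac * X.shape[1]))
    members = []
    for part in parts:
        idx = np.concatenate([part, minority])  # add all minority samples to each part
        feats = rng.choice(X.shape[1], size=n_feat, replace=False)  # random feature subset
        model = _elm_fit(X[idx][:, feats], y[idx], n_classes, n_hidden, C, rng)
        members.append((feats, model))
    return n_classes, members


def ensemble_predict(ensemble, X):
    n_classes, members = ensemble
    votes = np.zeros((X.shape[0], n_classes), dtype=int)
    rows = np.arange(X.shape[0])
    for feats, model in members:
        votes[rows, _elm_predict(model, X[:, feats])] += 1
    # Majority vote; np.argmax breaks ties in favor of the lower class index.
    return np.argmax(votes, axis=1)
```

Each member is trained on a roughly balanced subset and a different view of the features, and the vote averages out the instability of any single ELM. A closer reproduction would replace the random feature subset with the dissertation's feature synthesis step.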
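The co-occurrence representation behind EL-SPPMI rests on the standard shifted positive PMI transform of a word-context count matrix: `SPPMI(w, c) = max(PMI(w, c) - log k, 0)`, with `PMI(w, c) = log(#(w, c) * |D| / (#(w) * #(c)))`, where |D| is the total number of co-occurrence pairs and k is a shift constant. The following sketch computes only this standard transform; the dissertation's similar-pair adjustment for low-frequency words and the downstream ELM are not modelled, and the function name `sppmi` is hypothetical.

```python
import numpy as np


def sppmi(cooc, k=1.0):
    """Shifted positive PMI of a (words x contexts) co-occurrence count matrix."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()                       # |D|
    row = cooc.sum(axis=1, keepdims=True)    # #(w), shape (n_words, 1)
    col = cooc.sum(axis=0, keepdims=True)    # #(c), shape (1, n_contexts)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    # Pairs that never co-occur get 0 instead of -inf; others are shifted by log k.
    shifted = np.where(cooc > 0, pmi - np.log(k), 0.0)
    return np.maximum(shifted, 0.0)          # keep only positive associations
```

Rows of the result, or a truncated SVD of it, can then serve as word vectors; a larger k discards more of the weak associations.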
Keywords/Search Tags: Extreme Learning Machine, Semantic Representation, Fuzzy Membership, Feature Selection, Cost Sensitive, Ensemble Learning, Text Classification