
Research On Text Classification Of Mixed-kernel Parallel Support Vector Machine Based On Hadoop

Posted on: 2017-09-24  Degree: Master  Type: Thesis
Country: China  Candidate: W Nie  Full Text: PDF
GTID: 2348330533469368  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of technology, the spread of smart mobile devices, and the convenience of the Internet of Things, the amount of information is growing exponentially, and the era of big data has arrived. Text makes up a large share of this data, so mining text data effectively is very important. Traditional manual text classification has been replaced by methods based on knowledge engineering and on machine learning and statistics, and in recent decades machine learning and statistical methods have become increasingly widely used. At present, however, these methods are poorly suited to big text data because classifier training takes too long. This thesis uses the Hadoop platform to address the problem.
This thesis designs a big-data text classification framework based on Hadoop. The main work covers parallel text preprocessing, parallel feature dimensionality reduction, parallel feature-word weight quantization, and parallel classifier training. Among the many available classification algorithms, such as logistic regression, decision trees, support vector machines (SVM), neural networks, and KNN, this thesis chooses the SVM, which is based on VC dimension theory and structural risk minimization, as the text classifier. SVM copes well with the curse of dimensionality, rarely overfits, and classifies well, but its computational complexity is high, so training takes a long time on large sample sets.
Existing parallel support vector machines (PSVM) based on Hadoop are studied in depth, and the advantages and disadvantages of Cascading PSVM, Grouped PSVM, and Feedback-PSVM are discussed. A new Feedback-PSVM is proposed, and experiments verify its validity: it reduces the training time of the SVM algorithm and improves classification accuracy. The kernel function of SVM is also studied in depth. A mixed kernel function based on the Gaussian kernel and the polynomial kernel is proposed, and experiments verify the validity of classification with the new kernel function.
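The abstract does not say how the Gaussian and polynomial kernels are combined. As a minimal sketch, assuming a convex combination (a common way to build a mixed kernel, and one that stays a valid kernel because a non-negative weighted sum of positive semidefinite kernels is itself positive semidefinite), the idea could look like the following. The function names, the `weight` parameter, and all default values are hypothetical and are not taken from the thesis.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mixed_kernel(x, y, weight=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Assumed mixing rule: weight * Gaussian + (1 - weight) * polynomial.

    weight must lie in [0, 1] so the result is a convex combination.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    local_part = gaussian_kernel(x, y, gamma)
    global_part = polynomial_kernel(x, y, degree, coef0)
    return weight * local_part + (1.0 - weight) * global_part


def gram_matrix(samples, kernel):
    """Pairwise kernel values, the form an SVM solver with a precomputed kernel expects."""
    return [[kernel(a, b) for b in samples] for a in samples]
```

In this sketch, `weight` trades off the local behaviour of the Gaussian kernel against the global behaviour of the polynomial kernel. In practice it would be tuned, for example by cross-validation, alongside `gamma` and `degree`.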
Keywords/Search Tags: text categorization, kernel function, support vector machine, Hadoop