
The Research Of Chinese Categorization Based On Parallel SVM Algorithm

Posted on: 2019-08-04; Degree: Master; Type: Thesis
Country: China; Candidate: X D Yin; Full Text: PDF
GTID: 2428330548956885; Subject: Engineering
Abstract/Summary:
With the development of computer science, more and more people use the Internet. The applications we use every day produce large amounts of data, and finding the value hidden in that data accurately and efficiently has become particularly important. Text is an important and information-rich part of this data, and it is commonly the subject of text classification. Traditional classification algorithms do not produce accurate results on such large volumes of data, while in recent years distributed technology has achieved high accuracy in massive text classification.

This thesis uses Hadoop as the parallel computing platform. It first introduces the development history, system composition, and features of Hadoop, with emphasis on HDFS and MapReduce. It then describes in detail each key technology in the Chinese text classification process. Training the classification model is the most critical step in that process, so we analyze and study the relevant theory of the support vector machine (SVM).

Combining the Hadoop platform with the SVM algorithm, this thesis proposes an improved text classification model with two optimizations. First, to speed up training, each layer of the basic cascade SVM model is checked against an iteration stopping condition during the training phase, so that training can end early once the required accuracy is met. Second, to address the unsatisfactory classification performance of SVM near the hyperplane, the classification stage selects a different classification method according to the position of the sample being classified.

To verify the effectiveness of the proposed method, we designed experiments that compare the stand-alone SVM, the improved SVM, and the improved parallel SVM in terms of classification efficiency and classification accuracy. The results show that the improved algorithm greatly improves classification efficiency and performs well on the classification pseudo-group rate. It therefore has a great advantage in classifying massive text collections.
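
The second optimization, choosing the classification method by where a sample lies relative to the hyperplane, can be illustrated with a small sketch. The abstract does not name the method used near the hyperplane, so the k-nearest-neighbor fallback below is an assumption, as are the function names, the `margin` threshold, the linear decision function, and the toy data. It is a minimal illustration of the idea, not the thesis's implementation.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b of a linear SVM; its sign gives the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def knn_label(x, labeled_points, k=3):
    """Majority vote (+1 / -1) among the k nearest labeled points (assumed fallback)."""
    nearest = sorted(labeled_points, key=lambda p: math.dist(x, p[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes >= 0 else -1

def classify(x, w, b, labeled_points, margin=0.5, k=3):
    """Use the SVM sign far from the hyperplane and k-NN close to it."""
    score = decision_value(w, b, x)
    distance = abs(score) / math.hypot(*w)  # geometric distance to the hyperplane
    if distance >= margin:
        return (1 if score > 0 else -1), "svm"
    return knn_label(x, labeled_points, k), "knn"

# Hypothetical toy model: the hyperplane is the line x1 = 0.
w, b = [1.0, 0.0], 0.0
labeled_points = [
    ([0.2, 1.0], -1), ([0.3, 1.2], -1), ([0.1, 0.9], -1),
    ([2.0, 0.0], 1), ([-2.0, 0.0], -1),
]

far_result = classify([3.0, 0.0], w, b, labeled_points)   # far from the hyperplane
near_result = classify([0.2, 1.0], w, b, labeled_points)  # close to the hyperplane
print(far_result, near_result)
```

In this toy data the second sample sits 0.2 from the hyperplane on its positive side, so the SVM sign alone would label it +1, while its three nearest labeled neighbors are all negative. This is the kind of near-boundary case the position-dependent rule is meant to handle.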
Keywords/Search Tags:MapReduce, Hadoop, Chinese Categorization, SVM