The Research Of Chinese Categorization Based On Parallel SVM Algorithm

Posted on:2019-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:X D Yin


GTID:2428330548956885

Subject:Engineering

Abstract/Summary:

With the development of computer science, more and more people use the Internet. We use various applications every day and produce large amounts of data. Faced with such vast volumes of data, finding the value hidden in them accurately and efficiently is particularly important. Text is an important part of this data and carries rich value, so text classification is widely applied to it. Because of the sheer volume of data, traditional classification algorithms do not produce accurate results. In recent years, distributed technology has achieved high accuracy in massive text classification.

In this thesis, I use Hadoop as the parallel computing framework. I first introduce the development history, system composition, and features of Hadoop, with emphasis on HDFS and MapReduce. Each key technology in the Chinese text classification process is also described in detail. Training the classification model is the most critical step in the classification process, so we analyze and study the relevant classification theory of SVM. Combining the Hadoop platform with the SVM algorithm, we propose an improved text classification model with two optimizations.

First, to speed up training, each layer of the basic cascade support vector machine model checks an iterative stopping condition during the training phase. Training can therefore end early once the accuracy requirement is met.

Second, SVM classifies samples near the hyperplane poorly. To address this, the classification stage of the model selects different classification methods according to where each sample lies relative to the hyperplane.

To verify the effectiveness of the proposed method, we designed experiments comparing three models: a stand-alone support vector machine, the improved support vector machine, and the improved parallel support vector machine. The comparison covers both classification efficiency and classification accuracy. The results show that the improved algorithm greatly improves classification efficiency and performs well on the misclassification rate. It therefore has a great advantage in handling the classification of massive text collections.
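The first optimization, a cascade SVM in which every layer checks a stopping condition, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not give the base solver, the partitioning scheme, or the exact stopping rule, so the sketch makes several assumptions:

- Each node trains a linear SVM with the Pegasos sub-gradient method.
- The data is split into four partitions whose support vectors are merged pairwise, layer by layer.
- A hypothetical rule stops training as soon as any model in a layer reaches a target validation accuracy (`target_acc`).

The classical cascade's feedback loop from the last layer back to the first is omitted. All function names are illustrative.

```python
import random
from typing import List, Tuple

# A labelled sample: (feature vector, label in {-1, +1}).
Sample = Tuple[List[float], int]


def train_linear_svm(data: List[Sample], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos sub-gradient training of a linear SVM (assumed base solver).

    The bias is folded in as a constant feature 1.0, so the last entry of the
    returned weight vector is the bias b.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i][0] + [1.0], data[i][1]
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards this sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision(w: List[float], x: List[float]) -> float:
    """Score f(x) = w . x + b with b stored as w[-1]."""
    return sum(wj * xj for wj, xj in zip(w[:-1], x)) + w[-1]


def accuracy(w: List[float], data: List[Sample]) -> float:
    hits = sum(1 for x, y in data if (decision(w, x) > 0) == (y > 0))
    return hits / len(data)


def support_vectors(w: List[float], data: List[Sample],
                    tol: float = 1e-6) -> List[Sample]:
    """Samples on or inside the margin, i.e. y * f(x) <= 1."""
    return [(x, y) for x, y in data if y * decision(w, x) <= 1.0 + tol]


def cascade_svm(data: List[Sample], validation: List[Sample],
                n_parts: int = 4, target_acc: float = 0.95):
    """Train layer by layer, merging support vectors of neighbouring nodes."""
    parts = [data[i::n_parts] for i in range(n_parts)]
    layer = 0
    while True:
        layer += 1
        models = [train_linear_svm(p) for p in parts]
        best = max(models, key=lambda w: accuracy(w, validation))
        # Hypothetical per-layer stopping condition (assumed, not from the
        # thesis): end early once the accuracy target is met.
        if accuracy(best, validation) >= target_acc or len(parts) == 1:
            return best, layer
        # Fall back to the whole partition if it yields no support vectors.
        svs = [support_vectors(w, p) or p for w, p in zip(models, parts)]
        parts = [svs[i] + (svs[i + 1] if i + 1 < len(svs) else [])
                 for i in range(0, len(svs), 2)]


def make_data(n: int, seed: int) -> List[Sample]:
    """Toy separable data: +1 above the line x1 + x2 = 1, -1 below, gap 0.1."""
    rng = random.Random(seed)
    out: List[Sample] = []
    while len(out) < n:
        x = [rng.random(), rng.random()]
        d = x[0] + x[1] - 1.0
        if abs(d) >= 0.1:
            out.append((x, 1 if d > 0 else -1))
    return out
```

For example, `cascade_svm(make_data(200, 1), make_data(100, 2))` returns the chosen weight vector and the layer at which training stopped. In a Hadoop setting, the per-partition training in each layer is the part that would run in parallel as map tasks.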
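The second optimization picks a classification method according to where a sample lies relative to the hyperplane. The abstract does not name the alternative method. A common pairing in the literature is SVM-KNN, so the following minimal sketch assumes it:

- If the SVM score satisfies |f(x)| ≥ `eps`, the sample is far from the hyperplane and the SVM sign is trusted.
- Otherwise, the sample falls inside the uncertain band, and a k-nearest-neighbour vote over a reference set (for example, the support vectors) decides its label.

The threshold `eps`, the choice of KNN, `k`, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

# A labelled sample: (feature vector, label in {-1, +1}).
Sample = Tuple[List[float], int]


def decision(w: List[float], x: List[float]) -> float:
    """Linear SVM score f(x) = w . x + b, with the bias b stored as w[-1]."""
    return sum(wj * xj for wj, xj in zip(w[:-1], x)) + w[-1]


def knn_vote(x: List[float], reference: List[Sample], k: int = 3) -> int:
    """Majority label among the k reference samples closest to x (Euclidean)."""
    nearest = sorted(reference, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def hybrid_classify(x: List[float], w: List[float], reference: List[Sample],
                    eps: float = 0.5, k: int = 3) -> Tuple[int, str]:
    """Assumed rule: SVM sign far from the hyperplane, KNN inside |f(x)| < eps."""
    score = decision(w, x)
    if abs(score) >= eps:
        return (1 if score > 0 else -1), "svm"
    return knn_vote(x, reference, k), "knn"


w = [1.0, 1.0, -1.0]  # hyperplane x1 + x2 = 1
reference = [([0.55, 0.5], 1), ([0.52, 0.5], 1), ([0.6, 0.45], 1), ([0.0, 0.0], -1)]
print(hybrid_classify([1.0, 1.0], w, reference))   # (1, 'svm')
print(hybrid_classify([0.5, 0.45], w, reference))  # (1, 'knn')
```

The second query lies just below the hyperplane, so the SVM sign alone would label it -1. Because it falls inside the band, its three nearest reference samples decide instead, and they vote +1.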
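The abstract emphasizes HDFS and MapReduce as the basis of the parallel design. The MapReduce programming model itself can be shown with a small in-memory simulation of its map, shuffle, and reduce phases, applied to one typical text-processing job: counting in how many documents each term appears. This is a generic illustration under stated assumptions, not the thesis's Hadoop job and not the Hadoop Java API:

- Documents are assumed to be already segmented into Chinese words.
- The function names are hypothetical.
- On Hadoop, mappers would run in parallel over HDFS input splits, and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_doc(doc_id: str, tokens: List[str]) -> Iterable[Tuple[str, int]]:
    """Map: emit (term, 1) once per distinct term; doc_id is the input key."""
    for term in sorted(set(tokens)):
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the per-document ones into a document frequency."""
    return term, sum(counts)


def document_frequency(corpus: Dict[str, List[str]]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_doc(d, toks) for d, toks in corpus.items())
    return dict(reduce_term(t, c) for t, c in shuffle(mapped).items())


corpus = {
    "d1": ["文本", "分类", "文本"],
    "d2": ["分类", "支持", "向量机"],
}
print(document_frequency(corpus))
```

Because each mapper only sees its own documents and each reducer only sees one term's values, both phases can be spread across cluster nodes without shared state.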

Keywords/Search Tags:

MapReduce, Hadoop, Chinese Categorization, SVM

Related items

1	An Implementation Of Text Categorization System Based On Hadoop
2	Research And Implementation Of Automatic Text Classification Based On Hadoop
3	On Bayesian Text Classification Learning Under Mapreduce Framework
4	Design And Implementation Of Text Classification System Based On Hadoop Platform
5	Research And Implement Of Chinese Multi-Selection Text Categorization System Based On Hadoop
6	Research And Implementation Of Chinese Text Classification Based On Hadoop And SVM Algorithm
7	Research On The Performance And Optimization Of MapReduce Model In Hadoop Platform
8	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
9	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
10	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform