Research And Implement On Data Mining Algorithm Parallel Based On Hadoop

Posted on:2015-08-18

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Wu

Full Text:PDF

GTID:2298330467963265

Subject:Computer Science and Technology

Abstract/Summary:

With the spread of Internet and cloud computing technology, Internet companies that provide network services must handle data that is generated daily and is growing explosively. Massive data now surrounds us. This growth brings great value but also a huge challenge. How to analyze the data and mine the valuable information hidden in it has become a focus for many large companies.

Automated processing of large-scale document resources is an area of massive data processing that receives particular attention. By classifying text data, companies can organize digital resources effectively and ensure that those resources are fully searchable and fully used, meeting user demand for consulting services. However, the text data generated by Internet companies is massive and complex. Faced with its rapid growth, traditional processing methods alone can no longer meet people's needs. How to classify massive text efficiently and extract valuable information from it is the concern of this thesis.

Hadoop is one of the most popular open-source frameworks for distributed processing of massive data. Its main components are HDFS and MapReduce. HDFS is the distributed file system of a Hadoop cluster, and MapReduce is a distributed computing framework. Combining the two allows massive text data to be processed effectively.

This thesis studies the workflow and principles of distributed processing in Hadoop and, on that basis, parallelizes each stage of the text classification process. Using a parallel statistical method for computing the conditional and prior probabilities of the Bayesian algorithm, together with a training method that groups the samples for parallel training, it implements two distributed text classification systems, one based on Naive Bayes and one based on support vector machines (SVM). Comparison with a stand-alone system shows that text classification on Hadoop is more efficient than on a single machine, and that it achieves good classification results.
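The counting step behind the Naive Bayes prior and conditional probabilities maps naturally onto MapReduce, and a minimal sketch may make that concrete. The code below is not the thesis's implementation: it simulates the map and reduce phases in a single Python process rather than on a Hadoop cluster, and the function names (`map_doc`, `reduce_counts`, `train`, `classify`), the whitespace tokenization, and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def map_doc(label, text):
    """Map step: emit one count for the document's class and one per (class, word)."""
    yield ("doc", label), 1
    for word in text.lower().split():
        yield ("word", label, word), 1


def reduce_counts(pairs):
    """Shuffle and reduce steps combined: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def train(docs):
    """Apply the map step to every (label, text) pair, then reduce to global counts."""
    pairs = (pair for label, text in docs for pair in map_doc(label, text))
    counts = reduce_counts(pairs)
    doc_counts = {k[1]: v for k, v in counts.items() if k[0] == "doc"}
    word_counts = {(k[1], k[2]): v for k, v in counts.items() if k[0] == "word"}
    vocab = {word for (_, word) in word_counts}
    words_per_class = defaultdict(int)
    for (label, _), n in word_counts.items():
        words_per_class[label] += n
    return doc_counts, word_counts, words_per_class, vocab


def classify(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    doc_counts, word_counts, words_per_class, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for word in text.lower().split():
            if word not in vocab:
                continue  # assumption: words unseen in training are ignored
            # Add-one smoothing (an assumption of this sketch) avoids log(0).
            numerator = word_counts.get((label, word), 0) + 1
            denominator = words_per_class[label] + len(vocab)
            score += math.log(numerator / denominator)  # log P(word | class)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, invented for this example.
DOCS = [
    ("sports", "the team won the match"),
    ("sports", "a great goal in the match"),
    ("tech", "the new phone has a fast chip"),
    ("tech", "the chip runs the software fast"),
]

model = train(DOCS)
print(classify(model, "fast chip software"))
```

Because each document is mapped independently and the reduce step only sums counts per key, the same two functions could in principle be run as a mapper and a reducer over documents stored in HDFS. How the thesis partitions the data and organizes its jobs is not stated in the abstract.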

Keywords/Search Tags:

Text Classification, Hadoop, Naive Bayes, SVM

Related items

1	Research On Algorithms For Naive Bayes Classification And Its Tools Based On Hadoop
2	Research On Text Classification Algorithm Based On Naive Bayes Method
3	Research And Implement On Data Mining Algorithm Parallel Based On Hadoop
4	A Text Classifier About High Blood Pressure Based On Naive Bayes
5	Research And Application On Naive Bayes Classification Algorithm
6	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
7	The Study Of Naive Bayes Text Classification System Based On Artificial Intelligence
8	Text Classification Algorithm Research Based On Naive Bayes
9	Research On Spam Text Classification Based On Improved Naive Bayes Algorithm
10	The Research And Application Of Text Classification Based On Cloud Computing