Research On Text Classification Algorithm Based On Map Reduce Model

Posted on:2014-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:G Y Yang

Full Text:PDF

GTID:2268330401962264

Subject:Computer application technology

Abstract/Summary:

As networks continue to grow and the volume of information keeps increasing, text classification in a centralized environment can no longer meet existing needs, so large-scale data processing in a distributed environment has become a focus of attention in the IT industry. Large-scale text classification is needed in advertising and in information retrieval, which has made text categorization of large-scale data in a cloud computing environment an active research topic. This thesis studies a text classification algorithm and its incremental variant on the Hadoop platform, based on a proposed inverted index tree structure.

The main research achievements, contributions and innovations can be summarized in the following points:

1. This thesis proposes an inverted index tree structure and parallelizes it on the cloud platform, in order to speed up the feature selection methods, to support text classification algorithms such as KNN and Bayes, and to distribute the data loosely according to the text vector dimension.
2. Based on the inverted index tree structure and the text classification algorithm, this thesis proposes an inverted index tree construction algorithm for massive data together with its pruning strategy, and presents an incremental inverted index tree algorithm and its parallel design.
3. Based on the inverted index tree structure, this thesis designs a K-means incremental classification algorithm and proposes its parallelization on the Hadoop platform.
4. Based on the inverted index tree structure, this thesis proposes an inverted-index-tree-based naive Bayes classifier algorithm for the Hadoop platform in a cloud computing environment, along with three improved variants of the algorithm: naive Bayesian text classification weighted by TF-IDF, by mutual information, and by expected cross-entropy. It also presents a Local Naive Bayesian text classification algorithm based on the inverted index tree.
5. Through experimental analysis on a purpose-built Hadoop cluster, this thesis verifies the inverted index tree structure and evaluates the classification accuracy, recall rate and classification performance of the improved text classification methods.
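The abstract does not spell out the inverted index tree itself, so the following is only a minimal sketch of the general idea it builds on: constructing a plain (flat) inverted index in map, shuffle and reduce steps. It runs in a single Python process rather than on Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the sample documents are assumptions made for illustration, not the thesis's design.

```python
from collections import Counter, defaultdict


def map_phase(doc_id, text):
    """Emit (term, (doc_id, term_frequency)) pairs for one document."""
    counts = Counter(text.lower().split())
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, postings):
    """Build the posting list for one term, sorted by document id."""
    return term, sorted(postings)


def build_inverted_index(docs):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, postings) for term, postings in grouped.items())


# Hypothetical sample corpus, used only to exercise the sketch.
docs = {
    "d1": "hadoop text classification",
    "d2": "naive bayes text text",
}
index = build_inverted_index(docs)
```

Each posting list maps a term to the documents that contain it along with the in-document frequency, which is the kind of statistic that feature selection and weighting schemes such as TF-IDF rely on. On a real cluster the shuffle step would be handled by the framework, and each reducer would see only the postings for its own keys.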

Keywords/Search Tags:

Inverted-Index tree, Naive Bayes, Text classification, Incremental Algorithm

Related items

1	Research On Text Classification Algorithm Based On Naive Bayes Method
2	Text Categorization Based On Naive Bayes Method
3	A Text Classifier About High Blood Pressure Based On Naive Bayes
4	Research On Text Classification Algorithms Based On Machine Learning
5	Research On Bayesian Networks-Based Text Classification Algorithms
6	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
7	The Study Of Naive Bayes Text Classification System Based On Artificial Intelligence
8	Text Classification Algorithm Research Based On Naive Bayes
9	Research On Spam Text Classification Based On Improved Naive Bayes Algorithm
10	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian