Research On Algorithms For Naive Bayes Classification And Its Tools Based On Hadoop

Posted on: 2014-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Jiang
GTID: 2308330482451979
Subject: Computer application technology
Abstract/Summary:
The Naive Bayes learning method is a classical classification algorithm that is widely used because it is simple, quick to implement, and performs well. With the development of information technology, however, new challenges are emerging: terabytes or even petabytes of data are generated every day, yet very few labeled instances are available for training classifiers on them. To address these two problems, this thesis studies the Naive Bayes algorithm, semi-supervised learning methods, and the MapReduce distributed programming model, and makes the following contributions.

First, it briefly introduces the background and current state of cloud computing. Hadoop, a popular parallel computing system, and its programming model, MapReduce, are chosen as the basis of our system, so both are introduced in more detail. The Hive and HBase projects are introduced as well.

Second, it introduces several text classification methods based on Naive Bayes. It then combines a semi-supervised Naive Bayes algorithm with the MapReduce programming model to propose a new algorithm, the Parallelized Semi-supervised Naive Bayes algorithm (PSNB), which can handle massive data while using unlabeled instances to improve the performance of the classifier.

Finally, so that users can apply the Parallelized Semi-supervised Naive Bayes method and other data mining algorithms to massive data more conveniently, it describes the design and development of a toolbox focused on big data processing.
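The abstract does not describe how PSNB works internally. As a rough illustration of how a semi-supervised Naive Bayes classifier can be phrased in map/reduce terms, here is a single-process Python sketch. It assumes the common EM-based approach (train on the labeled documents, softly label the unlabeled ones with the current model, then retrain on both) and a multinomial model with Laplace smoothing. The map and reduce phases are only simulated with a generator and `functools.reduce`; no Hadoop API is used, and all function names (`map_doc`, `reduce_counts`, `train`, `posterior`, `semi_supervised_nb`) are hypothetical.

```python
import math
from collections import Counter
from functools import reduce


def map_doc(doc, weights):
    """Map phase (hypothetical): emit weighted counts for one tokenized document.

    `weights` maps class label -> weight: 1.0 for the true label of a labeled
    document, or the current posterior probabilities for an unlabeled one.
    """
    counts = Counter()
    for label, w in weights.items():
        if w == 0.0:
            continue
        counts[("doc", label)] += w              # contributes to the class prior
        for word in doc:
            counts[("word", label, word)] += w   # contributes to P(word | class)
    return counts


def reduce_counts(acc, partial):
    """Reduce phase: sum partial counts. Counter.update adds values."""
    acc.update(partial)
    return acc


def train(weighted_docs, classes, vocab, alpha=1.0):
    """Estimate log priors and log likelihoods with Laplace smoothing `alpha`."""
    totals = reduce(reduce_counts, (map_doc(d, w) for d, w in weighted_docs), Counter())
    n_docs = sum(totals[("doc", c)] for c in classes)
    log_prior = {
        c: math.log((totals[("doc", c)] + alpha) / (n_docs + alpha * len(classes)))
        for c in classes
    }
    log_lik = {}
    for c in classes:
        denom = sum(totals[("word", c, w)] for w in vocab) + alpha * len(vocab)
        log_lik[c] = {w: math.log((totals[("word", c, w)] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def posterior(doc, log_prior, log_lik):
    """Return P(class | doc); words outside the vocabulary are ignored."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    top = max(scores.values())                   # subtract max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def semi_supervised_nb(labeled, unlabeled, classes, iterations=5):
    """EM-style semi-supervised Naive Bayes (an assumed variant, not PSNB itself)."""
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    hard = [(d, {label: 1.0}) for d, label in labeled]
    model = train(hard, classes, vocab)          # initial model: labeled data only
    for _ in range(iterations):
        soft = [(d, posterior(d, *model)) for d in unlabeled]  # E-step: soft labels
        model = train(hard + soft, classes, vocab)             # M-step: retrain on all
    return model


if __name__ == "__main__":
    labeled = [(["cheap", "offer", "win"], "spam"), (["meeting", "agenda", "notes"], "ham")]
    unlabeled = [["win", "cheap", "prize"], ["agenda", "project", "meeting"]]
    model = semi_supervised_nb(labeled, unlabeled, ["spam", "ham"])
    probs = posterior(["prize", "offer"], *model)
    print(max(probs, key=probs.get))  # prints: spam
```

In this toy run, "prize" never appears in a labeled document, but the unlabeled document it co-occurs with is softly assigned to "spam", so the retrained model uses it as spam evidence. The per-(class, word) sums are associative, which is what would let the counting step be split across mappers and combined by reducers on a cluster; the E-step is likewise independent per document. Whether PSNB partitions the work this way is an assumption.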
Keywords/Search Tags: Naive Bayes, Distributed Computing, Hadoop, MapReduce