Research And Application Of KNN Classification Algorithm Based On MapReduce

Posted on: 2013-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: H X Qiao
Full Text: PDF
GTID: 2248330371478269
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of the Web and of textual information, storing and organizing that information sensibly has become a major challenge for the computer industry. Text classification is the most commonly used way to address the problem: it assigns each new document to one of a set of predefined classes, helping people search, query, filter, and make use of information. Among the many automatic classification methods, the KNN algorithm is regarded as the best classification algorithm under the VSM (Vector Space Model), but its computational cost is particularly high. On the one hand, KNN is a lazy classification algorithm, so all of the computation is deferred to the classification stage, where it becomes a bottleneck. On the other hand, to obtain the K nearest neighbors of a new document, the distance (or similarity) between that document and every known sample must be calculated. These two aspects restrict the development of the KNN algorithm.
The subject of this thesis is the research and realization of KNN classification based on MapReduce technology, aimed at these two problems of the traditional KNN algorithm, which cause excessive computation and low efficiency during classification. Taking full advantage of the mass-data-processing strengths of the distributed MapReduce programming model from cloud computing, the thesis re-describes the traditional KNN classification algorithm as two processes, preprocessing and classification, applies MapReduce to the computation in both, and ultimately improves the classification efficiency of KNN significantly.
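The cost described above can be made concrete with a minimal single-machine sketch of KNN under the vector space model. It is an illustration only, not the thesis's implementation: the function names, the sparse dict representation (term to weight), cosine similarity as the similarity measure, and majority voting are all assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(new_doc, samples, k):
    """Classify new_doc by majority vote among its k most similar samples.

    samples is a list of (vector, label) pairs. Every sample is compared
    with new_doc, so one classification costs one pass over all samples.
    """
    scored = [(cosine_similarity(new_doc, vec), label) for vec, label in samples]
    top_k = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
samples = [
    ({"ball": 1.0, "goal": 1.0}, "sports"),
    ({"ball": 1.0, "team": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
]
label = knn_classify({"ball": 1.0, "team": 1.0, "goal": 1.0}, samples, k=2)
```

Because nothing is computed ahead of time, every call to `knn_classify` scans the full sample set, which is the lazy-learning cost the abstract refers to.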
This thesis introduces the development of the KNN algorithm and of MapReduce technology, together with the problems each faces, and combines MapReduce with the KNN algorithm for text classification. It gives a detailed description of the MapReduce-based KNN algorithm, focusing on the design of the map and reduce functions for three core modules: preprocessing of the sample documents, vectorization of new documents, and similarity computation. Experiments compare KNN and the improved FKNN algorithm in preprocessing speed and classification efficiency, demonstrating that decomposing the KNN computation with MapReduce speeds up classification while maintaining KNN's advantage in classification accuracy.
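One way the similarity step could be decomposed into map and reduce functions is sketched below in plain Python, with an in-process loop standing in for the MapReduce framework. The abstract does not give the thesis's actual map and reduce design, so the partitioning scheme, the function names, and the local top-k idea are assumptions for illustration.

```python
import heapq
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def map_partition(partition, new_docs, k):
    """Map step (assumed design): for each new document, emit the k best
    (similarity, label) pairs found in this partition of the samples."""
    for doc_id, doc_vec in new_docs.items():
        scored = ((cosine(doc_vec, vec), label) for vec, label in partition)
        for pair in heapq.nlargest(k, scored, key=lambda p: p[0]):
            yield doc_id, pair


def reduce_neighbors(doc_id, pairs, k):
    """Reduce step: merge the local candidates into a global top k and vote."""
    top_k = heapq.nlargest(k, pairs, key=lambda p: p[0])
    votes = Counter(label for _, label in top_k)
    return doc_id, votes.most_common(1)[0][0]


def run_job(partitions, new_docs, k):
    """Simulate map, shuffle (group by document id), and reduce in one process."""
    grouped = defaultdict(list)
    for partition in partitions:
        for doc_id, pair in map_partition(partition, new_docs, k):
            grouped[doc_id].append(pair)
    return dict(reduce_neighbors(d, pairs, k) for d, pairs in grouped.items())


# Hypothetical toy data split into two sample partitions.
partitions = [
    [({"ball": 1.0, "goal": 1.0}, "sports")],
    [({"ball": 1.0, "team": 1.0}, "sports"),
     ({"stock": 1.0, "market": 1.0}, "finance")],
]
new_docs = {"d1": {"ball": 1.0, "team": 1.0}, "d2": {"stock": 1.0}}
result = run_job(partitions, new_docs, k=1)
```

Emitting only the local top k from each partition keeps the shuffled data small, and the global k nearest neighbors are always contained in the union of the local top-k lists, so the reduce step returns the same neighbors as a single full scan (up to ties in similarity).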
Keywords/Search Tags: Text Classification, KNN algorithm, Text similarity, MapReduce Architecture, Cluster