
Research And Design Of KNN-join Algorithm Based On MapReduce

Posted on: 2017-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Sun
GTID: 2358330485995686 | Subject: Computer Science and Technology
Abstract/Summary:
The continuous development of the Internet industry has produced huge amounts of data, and extracting valuable knowledge from such large volumes of data has become a central problem. Among data mining algorithms, the kNN algorithm can be used to classify data. As the kNN algorithm became widely applied, the kNN-join operation was proposed, and it is now used in both main stages of data mining: data preprocessing and the mining itself. However, as data volumes and efficiency requirements grow, traditional methods can no longer meet these demands, which has motivated kNN-join operations based on MapReduce.

In this paper, we study every stage of the MapReduce-based kNN-join operation. First, for data preprocessing, we improve the existing data partitioning algorithm so that the data are divided uniformly. Second, to reduce the cost of the join process, we look for a seed set for every partition, so that the k nearest neighbors of every object in a partition are gathered into one set. Last, to balance resource utilization against the accuracy of the algorithm, we divide the data partitions into different groups.

We test the algorithm on a combination of real-world and synthetic data to confirm its effectiveness. The experimental results show that the proposed algorithm outperforms the existing algorithms.
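To make the general shape of a MapReduce kNN-join concrete, here is a minimal Python sketch of the generic block-nested-loop baseline: split R and S into equal-sized blocks, compute local kNN candidates for every (R block, S block) pair in a first job, then merge the candidates of each object in a second job. This is an illustrative assumption, not the partitioning, seed-set, or grouping method proposed in this thesis. The map, shuffle, and reduce phases are simulated in memory, Euclidean distance is assumed, and every function name (`split_uniform`, `phase1_map`, `knn_join`, and so on) is hypothetical.

```python
import heapq
import math
from collections import defaultdict


def split_uniform(points, n):
    """Round-robin split into n blocks whose sizes differ by at most one.

    Each item is stored as (original_index, point) so that results can be
    reported by index after the (simulated) shuffle.
    """
    blocks = [[] for _ in range(n)]
    for idx, p in enumerate(points):
        blocks[idx % n].append((idx, p))
    return blocks


def phase1_map(r_blocks, s_blocks):
    """Map (hypothetical job 1): emit every (R block, S block) pair keyed by (i, j)."""
    for i, rb in enumerate(r_blocks):
        for j, sb in enumerate(s_blocks):
            yield (i, j), (rb, sb)


def phase1_reduce(rb, sb, k):
    """Reduce (job 1): local k nearest neighbours of each r within one S block.

    Candidates are (distance, s_index) tuples, so ties go to the smaller index.
    """
    for r_id, r in rb:
        dists = ((math.dist(r, s), s_id) for s_id, s in sb)
        yield r_id, heapq.nsmallest(k, dists)


def phase2_reduce(candidates, k):
    """Reduce (job 2): merge the local candidate lists of one r into its global kNN."""
    return [s_id for _, s_id in heapq.nsmallest(k, candidates)]


def knn_join(R, S, k, n=2):
    """Exact kNN-join sketch: for every r in R, indices of its k nearest points in S."""
    r_blocks = split_uniform(R, n)
    s_blocks = split_uniform(S, n)
    shuffled = defaultdict(list)  # in-memory stand-in for the job-2 shuffle by r index
    for _key, (rb, sb) in phase1_map(r_blocks, s_blocks):
        for r_id, local in phase1_reduce(rb, sb, k):
            shuffled[r_id].extend(local)
    return {r_id: phase2_reduce(c, k) for r_id, c in shuffled.items()}


if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0), (5.0, 7.0), (10.0, 10.0)]
    print(knn_join(R, S, k=2))  # {0: [0, 1], 1: [2, 3]}
```

The merge is exact because the global k nearest neighbours of r are always contained in the union of its per-block top-k lists. The weakness is cost: every R block is paired with every S block, so the data shipped through the shuffle grows with n squared. That replication is the kind of join cost which smarter partitioning schemes try to cut.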
Keywords: MapReduce, kNN-join, Data Partition, Seed Set, Grouping