K-nearest Neighbor Research Of Big Data Based On Yarn And Hash Technology

Posted on:2018-12-05

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Zhang

GTID:2348330539485817

Subject:Master of Engineering - Software Engineering

Abstract/Summary:

In recent years, big data has become one of the hot research topics in machine learning, and its emergence has introduced many challenges for traditional machine learning. K-Nearest Neighbor (K-NN) is a well-known classification algorithm. Because its idea is simple and it is easy to implement, K-NN has been widely applied in many fields, such as face recognition, gene classification, and decision making. However, in a big data environment the efficiency of K-NN is very low, and the algorithm may even become unworkable. To address this problem, this paper proposes two solutions based on Yarn and hash technology: the first employs MapReduce and SimHash to classify big data with K-NN on a cloud computing platform; the second uses Spark and SimHash for the same task. The basic idea of the two solutions is similar and consists of three steps: (1) the big data set is first transformed from the original space to Hamming space; (2) in Hamming space, on the Yarn cloud computing platform, the big data computational frameworks MapReduce and Spark are used to find the training instances that fall in the same bucket as the testing instance x; (3) finally, the K exact nearest neighbors of x are found within that bucket, and x is classified by those neighbors. The experimental results show that the proposed algorithms are effective and efficient.
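
The three steps can be illustrated with a minimal single-machine sketch in Python. It is not the thesis implementation: the abstract does not specify the SimHash variant, so the sketch assumes sign-of-random-hyperplane hashing, and it leaves out the MapReduce/Spark distribution on Yarn entirely. All function names (`make_hyperplanes`, `signature`, `build_buckets`, `knn_in_bucket`) and parameters are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hashing scheme, not from the thesis)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(x, planes):
    """Step 1: map a real-valued vector to an n_bits code in Hamming space."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, x))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_buckets(train, planes):
    """Step 2: group training instances (vector, label) by their hash code."""
    buckets = defaultdict(list)
    for x, label in train:
        buckets[signature(x, planes)].append((x, label))
    return buckets


def knn_in_bucket(x, buckets, planes, k):
    """Step 3: exact K-NN by Euclidean distance, restricted to x's bucket.

    Returns None when the bucket is empty; how the thesis handles that
    case is not stated in the abstract.
    """
    candidates = buckets.get(signature(x, planes), [])
    if not candidates:
        return None
    nearest = sorted(candidates, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage: two classes lying in opposite directions from the origin.
planes = make_hyperplanes(dim=2, n_bits=8, seed=1)
train = [((1.0, 1.0), "a"), ((2.0, 2.0), "a"),
         ((-1.0, -1.0), "b"), ((-2.0, -2.0), "b")]
buckets = build_buckets(train, planes)
print(knn_in_bucket((3.0, 3.0), buckets, planes, k=1))
```

In a distributed setting, the bucket code would presumably serve as the grouping key, so that only instances sharing a code with the testing instance are compared; that mapping is an assumption here, not a detail given in the abstract.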

Keywords/Search Tags:

K-nearest neighbor, Yarn, hash technology, classification algorithms, big data sets

Related items

1	Study On Generalized Nearest Neighbor Pattern Classification
2	Research On Improved K-nearest Neighbor Method For Imbalanced Data Set Classification
3	Research On K Nearest Neighbor In High Dimension Data
4	Research Of Local Sensitive Hash Index Based On Nearest Neighbor Graph
5	Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on GPU clusters
6	Research On Several Pattern Classification Methods Based On K-nearest Neighbor Criterion
7	Multiple Hash Tables Indexing And Optimization For Approximate Nearest Neighbor Search
8	Evolutionary Extreme Learning Machine Based Feature Weighted Nearest Neighbor Classification Algorithm
9	Classification Of Uncertain Data Based On Nearest Neighbor
10	Nearest Neighbor Classification Improved Algorithm