K-nearest Neighbor Research Of Big Data Based On Yarn And Hash Technology

Posted on: 2018-12-05; Degree: Master; Type: Thesis
Country: China; Candidate: M Y Zhang; Full Text: PDF
GTID: 2348330539485817; Subject: Master of Engineering - Software Engineering
Abstract/Summary:
In recent years, big data has become one of the most active research topics in machine learning, and its emergence has introduced many challenges for traditional machine learning methods. K-Nearest Neighbor (K-NN) is a well-known classification algorithm. Because its idea is simple and it is easy to implement, K-NN has been widely applied in many fields, such as face recognition, gene classification, and decision making. In a big data environment, however, K-NN becomes very inefficient and may even be unworkable. To address this problem, this thesis proposes two solutions based on Yarn and hash technology: the first uses MapReduce and SimHash to classify big data with K-NN on a cloud computing platform; the second uses Spark and SimHash for the same task. The basic ideas of the two solutions are similar and comprise three steps: (1) the big data set is transformed from the original space into Hamming space; (2) in Hamming space, on the Yarn cloud computing platform, the training instances that fall in the same bucket as the testing instance x are found using the big data computing frameworks MapReduce and Spark; (3) the K exact nearest neighbors of x are found within that bucket, and x is classified according to those neighbors. The experimental results show that the proposed algorithms are effective and efficient.
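The three steps can be illustrated with a small single-machine sketch in Python. It is not the thesis implementation: the abstract does not say how SimHash is applied to the instances, so the sketch assumes a random-hyperplane sign hash for numeric vectors, treats a "bucket" as the set of training instances with an identical hash code, and falls back to the full training set when a bucket is empty. The function names (`make_hyperplanes`, `simhash`, `build_buckets`, `knn_classify`) are hypothetical, and the MapReduce and Spark distribution over Yarn is omitted entirely.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes (assumed hashing scheme, not from the thesis)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def simhash(x, hyperplanes):
    """Step 1: map a numeric vector to a binary code in Hamming space."""
    code = 0
    for h in hyperplanes:
        dot = sum(a * b for a, b in zip(x, h))
        # One bit per hyperplane: which side of the hyperplane x lies on.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def build_buckets(train, hyperplanes):
    """Step 2: group training instances (vector, label) by hash code."""
    buckets = defaultdict(list)
    for x, label in train:
        buckets[simhash(x, hyperplanes)].append((x, label))
    return buckets


def knn_classify(x, buckets, hyperplanes, k, fallback):
    """Step 3: exact K-NN restricted to the bucket that x hashes into."""
    # Assumption: if the bucket is empty, search the whole training set.
    candidates = buckets.get(simhash(x, hyperplanes)) or fallback

    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(x, item[0]))

    nearest = sorted(candidates, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In a distributed version, the bucket construction would correspond roughly to a map step keyed by hash code, with the per-bucket neighbor search done on the reduce side. That mapping is an interpretation of the abstract, not a description of the author's code.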
Keywords/Search Tags: K-nearest neighbor, Yarn, hash technology, classification algorithms, big data sets