Font Size: a A A

A Streaming Algorithms For K-means With Approximate Coreset

Posted on:2020-06-15Degree:MasterType:Thesis
Country:ChinaCandidate:T ZhangFull Text:PDF
GTID:2428330623956325Subject:Mathematics
Abstract/Summary:PDF Full Text Request
Clustering is the process which partitions a given set into subsets of characteristics.This problem has a high practical value because of its using widely and conveniently.There are a lot of different clustering techniques using different classifications,but the most popular one is k-means algorithm.The input to the classical Euclidean k-means problem are a set P of n points,and an integer k,and the goal is to find a set of k-centers that minimizes the sum of squared distances over each input point to its nearest center.In the streaming setting k-means problem,the points of P do not arrive together,but one by one,and there is not enough memory to store the whole data set.Therefore the concept of corsets has been introduced to solve this problem.One can compute a coreset,which is smaller and easier to keep in memory.We can compute the k-means clustering in the coreset instead of the original one,and the relative error of the result between them is arbitrarily small.But the k-means problem is NP-hard,there is no exact algorithm in polynomial time under the assumption of P?NP.Firstly we try to compute the coreset for a given weighted set as the result of k-means is replayed by the approximate one.We find that we cannot construct a real(k,?)-coreset satisfying its definition,but we can obtain an approximate coreset with a quite small error.Then we find that for every weighted set P=(P,?,?)in Rd,??>0,a constant?>0,and an integer k? 1,there is ?-approximation(k,?)-coreset C=(C,s,?)of size|C|=k0(1/?2).where each point in C is a linear combination of O(1/?2)points from P and?=(?)(?).Finally we generalize this result to the streaming and distributed big sparse data.
Keywords/Search Tags:Approximation algorithm, k-Means clustering, Coreset, Streaming
PDF Full Text Request
Related items