
Distributed Anomaly Detection In Time Series

Posted on: 2017-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wu
Full Text: PDF
GTID: 2428330590469351
Subject: Computer Science and Technology
Abstract/Summary:
Anomaly detection is one of the most active research topics in data mining, and time series are the most common kind of data in the real world. As a result, anomaly detection in time series has attracted considerable attention from data mining researchers and has many practical applications. The traditional time series anomaly detection algorithm has a time complexity of O(N^2), which is not efficient enough. Researchers have proposed many methods to speed up the algorithm to cope with growing data sizes. However, the compute and memory limits of a standalone machine make these methods impractical for data on the scale of millions of points. In this paper, we propose a time series anomaly detection algorithm based on distributed computing that mitigates these issues. We implemented the algorithm on Spark for computation and HDFS for storage. Our experiments show that, on million-scale data, all of the traditional acceleration algorithms fail, and the only algorithm that still works, which reads the data into memory one item at a time, is far slower than our distributed implementation. We achieve a speedup of more than 5 times using 4 Spark nodes. The scalability of our distributed implementation makes anomaly detection feasible on large-scale data.
Keywords/Search Tags: anomaly detection, time series, distributed, big data
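
The abstract does not name the traditional O(N^2) algorithm, so the following is only a minimal sketch under an assumption: that the quadratic baseline resembles a brute-force discord search, in which every length-m window is compared against every non-overlapping window and the window whose nearest neighbor is farthest away is reported as the anomaly. All function names here are hypothetical. The partitioned variant is plain Python and only illustrates why the outer loop can be split across workers; it is not the thesis's Spark implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_distance(series, i, m):
    """Distance from the window starting at i to its closest non-overlapping window."""
    window = series[i:i + m]
    best = math.inf
    for j in range(len(series) - m + 1):
        if abs(i - j) >= m:  # skip overlapping (trivial) matches
            best = min(best, euclidean(window, series[j:j + m]))
    return best


def find_discord(series, m, starts=None):
    """Return (start, score) of the window whose nearest neighbor is farthest.

    Scanning all starts costs O(N^2) window comparisons. `starts` restricts
    the outer loop to a subset of window positions (assumed partitioning hook).
    """
    if starts is None:
        starts = range(len(series) - m + 1)
    best_start, best_score = -1, -math.inf
    for i in starts:
        score = nearest_neighbor_distance(series, i, m)
        if score > best_score:
            best_start, best_score = i, score
    return best_start, best_score


def find_discord_partitioned(series, m, parts):
    """Split the outer loop into `parts` chunks and merge the per-chunk winners.

    Each chunk is independent, which is what makes the search a candidate
    for distribution; here the chunks simply run one after another.
    """
    n_windows = len(series) - m + 1
    chunk = math.ceil(n_windows / parts)
    partial = [
        find_discord(series, m, range(lo, min(lo + chunk, n_windows)))
        for lo in range(0, n_windows, chunk)
    ]
    return max(partial, key=lambda result: result[1])
```

In this sketch each chunk still reads the whole series for the inner loop, so a real distributed version would also need to decide how the data itself is shared or partitioned, which the abstract does not describe.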