
Data Stream Outlier Detection Study Based On Time Series Analysis

Posted on: 2017-01-06   Degree: Master   Type: Thesis
Country: China   Candidate: T Sun   Full Text: PDF
GTID: 2308330485485039   Subject: Computational Mathematics
Abstract/Summary:
In recent years, outlier research has found important applications in almost every field of science. Outlier detection has considerable theoretical value, and in-depth study of it also matters for real-world applications. This thesis studies data stream outlier detection based on clustering and non-parametric Thiessen polygons. Using air quality index (AQI) data as an example, it detects anomalous data, studies the correlation between the AQI and meteorological factors, and applies time series analysis to the AQI data to study how outliers affect short-term forecasts. The main body is divided into two parts.

First, an outlier detection method is proposed: the data from each time window are clustered with the k-means algorithm to reduce the data volume, only the cluster centers are stored, and outliers are then detected with the non-parametric Thiessen polygon method. Compared with other detection algorithms, the proposed method significantly improves time and space complexity.

Then, the correlation between the AQI and meteorological factors is studied for major cities in 2014. Correlation analysis first identifies the factors most strongly correlated with the AQI; the relationship between the AQI and meteorological factors is then analyzed and a linear regression model is established. Association rules are also used to study the correlation between cities under moderate or heavier pollution.

Finally, outliers in Chengdu's 2015 AQI data are detected with the clustering algorithm and Thiessen polygons. The time series is then analyzed: its trend is removed, model identification and parameter estimation are carried out on the resulting stationary series, an ARIMA model is selected, its residuals are tested to confirm that the model is adequate, and the model is used for forecasting.
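The clustering-plus-Thiessen-polygon detection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes numeric points (for example daily AQI values), uses plain Lloyd's k-means, and, because the abstract does not spell out the Thiessen polygon criterion, assumes a new point is flagged when it lies farther from the center of its Thiessen polygon (Voronoi cell) than a per-cell radius of mean plus `z` standard deviations of the member distances. Finding the polygon that contains a point is the same as finding its nearest center. The names `summarise_window`, `detect`, and `z` are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the Thiessen polygon (Voronoi cell) that contains p."""
    return min(range(len(centers)), key=lambda i: dist(p, centers[i]))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; returns the k cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its members; an empty cluster keeps its old center.
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
               for c, g in zip(centers, groups)]
        if new == centers:
            break
        centers = new
    return centers


def summarise_window(window: List[Point], k: int, z: float = 3.0,
                     seed: int = 0) -> List[Tuple[Point, float]]:
    """Hypothetical summary of one window: (center, radius) per Thiessen cell."""
    centers = kmeans(window, k, seed=seed)
    member_dists: List[List[float]] = [[] for _ in centers]
    for p in window:
        i = nearest(p, centers)
        member_dists[i].append(dist(p, centers[i]))
    cells = []
    for c, ds in zip(centers, member_dists):
        radius = statistics.mean(ds) + z * statistics.pstdev(ds) if ds else 0.0
        cells.append((c, radius))
    return cells  # raw window data can now be discarded


def detect(stream: List[Point], cells: List[Tuple[Point, float]]) -> List[Point]:
    """Flag points lying outside the radius of the Thiessen cell they fall in."""
    centers = [c for c, _ in cells]
    flagged = []
    for p in stream:
        i = nearest(p, centers)
        if dist(p, centers[i]) > cells[i][1]:
            flagged.append(p)
    return flagged


window = [(48.0,), (52.0,), (50.0,), (49.0,), (51.0,),
          (148.0,), (152.0,), (150.0,), (149.0,), (151.0,)]
cells = summarise_window(window, k=2)
print(sorted(c for c, _ in cells))                               # [(50.0,), (150.0,)]
print(detect([(51.0,), (60.0,), (150.5,), (300.0,)], cells))     # [(60.0,), (300.0,)]
```

Only the `k` centers and radii are kept per window, which is where the memory saving comes from. A cell with a single member gets a zero radius here, so a practical version would need a floor or fallback for that case.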
By comparing the forecasts, the impact of outliers on prediction is analyzed, and, combined with the earlier analysis of related factors, the possible causes of the abnormal data are examined.
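The forecast comparison can be illustrated with a deliberately small model. This sketch assumes an ARIMA(1,1,0) model: one difference to remove the trend, then an AR(1) fitted by ordinary least squares on the differenced series. The thesis chose its ARIMA order through model identification, which is not reproduced here, and the residual check is omitted. The AQI values and the neighbor-average repair in the hypothetical `replace_outliers` helper are illustrative only.

```python
from typing import List, Sequence, Tuple


def difference(x: Sequence[float]) -> List[float]:
    """First difference x[t] - x[t-1]; removes a linear trend (the 'I' in ARIMA)."""
    return [b - a for a, b in zip(x, x[1:])]


def fit_ar1(y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y[t] = c + phi * y[t-1] + e[t]."""
    xs, ys = y[:-1], y[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0:  # constant differences: no autoregressive signal to fit
        return my, 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_arima110(x: Sequence[float], steps: int) -> List[float]:
    """Forecast `steps` values ahead with an ARIMA(1,1,0) model."""
    d = difference(x)
    c, phi = fit_ar1(d)
    level, last_diff = x[-1], d[-1]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next difference
        level += last_diff               # integrate back to the original scale
        out.append(level)
    return out


def replace_outliers(x: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Hypothetical repair: replace each interior outlier by its neighbors' mean."""
    y = list(x)
    for i in idx:
        y[i] = (y[i - 1] + y[i + 1]) / 2
    return y


aqi = [80, 85, 83, 90, 88, 250, 92, 95, 93, 98]  # hypothetical daily AQI
raw = forecast_arima110(aqi, 3)
clean = forecast_arima110(replace_outliers(aqi, [5]), 3)
print("with outlier:   ", [round(v, 1) for v in raw])
print("outlier removed:", [round(v, 1) for v in clean])
```

A single spike produces one large positive and one large negative difference, which distorts the fitted `phi` and therefore every forecast step; comparing `raw` with `clean` makes that effect visible.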
Keywords/Search Tags: Outlier detection, Time series analysis, Clustering algorithm, Association analysis