| In recent decades, with rapid economic development, industries of all kinds have generated and collected large amounts of data, in which time series data play an important role. Time series data are usually high-dimensional and large in volume, especially in industry and manufacturing. Industrial and manufacturing data are characterized by large volume, multiple sources, continuous sampling, low value density, and dynamic behavior, all of which make data cleaning difficult. On an industrial data collection platform, sensors are usually organized into groups that work together. Data collected by sensors in the same group usually follow similar patterns, and data collected by sensors in different groups may be physically correlated. These correlations can be exploited to improve both the effectiveness and the efficiency of time series cleaning. Constraint-based time series cleaning has been studied in recent years and has achieved good results. However, existing methods clean each sequence separately and neither address nor exploit the correlations among multivariate time series. To address this gap, this thesis studies correlation-based cleaning of high-dimensional time series. The main research contents are as follows: (1) For constraint-based time series anomaly detection, this thesis discusses the classification of constraints and enumerates some basic constraints. It formulates an anomaly detection problem for a sequence under speed constraints, devises a naive dynamic programming algorithm that runs in O(n^2) time, and then transforms the problem into a 2D range query problem and employs a 2D range tree to reduce the time complexity to O(n log^2 n). (2) For space-correlated time series, this thesis proposes an effective and efficient anomaly detection framework consisting of three phases: piecewise aggregate approximation (PAA), correlation evaluation, and anomaly detection. The framework first identifies suspect sequences, then performs anomaly detection on those
suspect time series. This framework can incorporate a variety of anomaly detection or data cleaning algorithms and is therefore easily extensible. Experiments on real IIoT datasets demonstrate both the efficiency and the effectiveness of the proposed framework. (3) For physics-correlated time series, this thesis proposes a physics constraint that captures the physical relations among time series, together with a cleaning framework built on it. The framework first detects constraint violations, then computes the distance between the characteristics of those violations and the anomaly behaviors described by a priori knowledge. By transforming the problem of tracing the cause of an anomaly into a weighted set cover problem, the framework runs a heuristic local-search algorithm and ultimately traces and determines the cause of the abnormality. |
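
To make item (1) concrete, the following is a minimal sketch of a quadratic dynamic program for anomaly detection under a speed constraint. The abstract does not state the exact formulation, so the sketch assumes one plausible reading: keep the longest subsequence in which every pair of points satisfies s_min <= (x_j - x_i) / (t_j - t_i) <= s_max, and flag the remaining points as anomalies. The function name, parameter names, and the transformed coordinates u and v are illustrative and are not taken from the thesis.

```python
from typing import List, Sequence


def speed_constraint_anomalies(
    t: Sequence[float],
    x: Sequence[float],
    s_min: float,
    s_max: float,
) -> List[int]:
    """Return indices of points outside a longest speed-consistent subsequence.

    Assumption (not from the thesis): timestamps are strictly increasing
    and s_min <= s_max.
    """
    n = len(t)
    if n == 0:
        return []

    # Transformed coordinates. For i < j the speed constraint holds
    # iff u[i] >= u[j] and v[i] <= v[j], which is a 2D dominance test.
    u = [x[k] - s_max * t[k] for k in range(n)]
    v = [x[k] - s_min * t[k] for k in range(n)]

    best = [1] * n    # best[j]: length of the longest consistent chain ending at j
    prev = [-1] * n   # predecessor of j in that chain, or -1 if j starts it
    for j in range(n):
        for i in range(j):
            if u[i] >= u[j] and v[i] <= v[j] and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i

    # Backtrack from the end of a longest chain to collect the kept points.
    end = max(range(n), key=best.__getitem__)
    kept = set()
    while end != -1:
        kept.add(end)
        end = prev[end]
    return [k for k in range(n) if k not in kept]


if __name__ == "__main__":
    # The spike at index 2 cannot be reached within a speed of +/-2 per time unit.
    print(speed_constraint_anomalies([0, 1, 2, 3, 4], [0, 1, 10, 3, 4], -2, 2))
```

Because the dominance test is transitive, checking consecutive points in a chain is enough to guarantee that every pair in the chain is consistent. The inner loop is a maximum over all earlier points that fall in a 2D range of (u, v), which is the kind of query a 2D range tree can answer without scanning every earlier point; that replacement is not shown here.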