Research On Anomaly Repair Technology In Time Series Data With Attribute Value Misplacement

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhao
GTID: 2518306572459954
Subject: Computer technology
Abstract/Summary:
With the rapid development of technology and the diversification of data generation methods, the scale of data available to mankind is growing ever larger. Massive data brings considerable digital value, but it also brings more challenges in terms of data quality. Time series data has grown rapidly with the development of the Internet of Things. In industry, time series data often suffers from problems such as abnormal values, disordered data, missing attribute values, and attribute value misplacement. Among these, attribute value misplacement has received relatively little research attention, yet it is very common in industrial scenarios, so studying it is of great significance.

Starting from real data stream scenarios, this thesis first proposes a general method, and then proposes a more targeted method for industrial scenarios in which attribute value misplacement occurs continuously and the attributes are correlated. The focus of research in the data stream setting is to detect and repair misplaced attribute values in a timely manner when the data is unbounded and memory is limited, so as to improve the quality of time series data. Existing anomaly detection and repair methods usually replace an abnormal value with a calculated value, but in attribute value misplacement the value itself is usually not wrong; only its position is. This thesis therefore repairs by exchange rather than by modification, in order to obtain data that is closer to the truth.

The first part of this thesis designs a general anomaly detection and repair method for data with attribute value misplacement. The method redefines how the distance between tuples is calculated and builds a distance-based anomaly detection method on top of it: whether misplacement has occurred is determined by the distance relationship between an incoming tuple and the set of historical tuples. Based on the detection results, a repair method using iterative exchange is designed. It evaluates and ranks the contribution of each attribute to the distance, and preferentially exchanges the attributes that contribute most to the overall distance. Experiments show that this method can repair most attribute value misplacements without modifying the original data values.

The second part of this thesis designs an anomaly detection and repair method for continuous attribute value misplacement, that is, subsequence misplacement. It proposes the concept of a correlation coefficient between attributes, together with a way to calculate it, and designs a correlation-coefficient-based detection method that judges whether subsequence misplacement has occurred by detecting whether the correlation coefficient between attributes changes. Based on the detection results, the misplaced subsequences are exchanged to repair them, and the repair results are evaluated using the recalculated correlation coefficient. Experiments show that this method can effectively solve the subsequence misplacement problem and is more efficient than the general method of the first part.
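The distance-based detection and iterative-exchange repair described for the general method can be illustrated with a minimal sketch. The abstract does not give the thesis's distance definition, so everything concrete here is an assumption: the distance is a sum of per-attribute absolute differences divided by a hypothetical `scale`, the historical set is a bounded window, and the names `is_misplaced`, `repair_by_exchange`, and `threshold` are invented for illustration.

```python
from collections import deque

def contributions(t, h, scale):
    """Per-attribute contribution to the distance between tuple t and historical tuple h."""
    return [abs(a - b) / s for a, b, s in zip(t, h, scale)]

def nearest(t, history, scale):
    """Historical tuple closest to t under the assumed distance."""
    return min(history, key=lambda h: sum(contributions(t, h, scale)))

def distance(t, history, scale):
    """Distance from t to its nearest historical tuple."""
    return sum(contributions(t, nearest(t, history, scale), scale))

def is_misplaced(t, history, scale, threshold):
    """Flag t when it lies farther than threshold from every historical tuple."""
    return distance(t, history, scale) > threshold

def repair_by_exchange(t, history, scale, threshold):
    """Exchange attribute values until t is close to history; no value is modified."""
    t = list(t)
    while is_misplaced(t, history, scale, threshold):
        contrib = contributions(t, nearest(t, history, scale), scale)
        # Rank attributes by how much they contribute to the distance, largest first.
        order = sorted(range(len(t)), key=lambda i: contrib[i], reverse=True)
        best, best_d = None, distance(t, history, scale)
        for i in order:
            for j in range(len(t)):
                if i == j:
                    continue
                cand = t[:]
                cand[i], cand[j] = cand[j], cand[i]
                d = distance(cand, history, scale)
                if d < best_d:
                    best, best_d = cand, d
            if best is not None:
                break  # stop at the highest-ranked attribute for which an exchange helps
        if best is None:
            break  # no exchange reduces the distance; leave the tuple as it is
        t = best
    return t

# Hypothetical sensor tuples: (temperature, pressure, humidity).
history = deque(maxlen=100)  # bounded window, since stream memory is limited
history.extend([(20.0, 1000.0, 50.0), (20.4, 1002.0, 51.0), (19.8, 999.0, 49.5)])
scale = (5.0, 50.0, 10.0)    # assumed typical spread of each attribute
incoming = (1001.0, 20.5, 49.0)  # temperature and pressure arrived in each other's slots
repaired = repair_by_exchange(incoming, history, scale, threshold=3.0)
```

Because the repair only permutes positions, the multiset of values in `repaired` is identical to that of `incoming`, which is the property the abstract emphasizes over value-modifying repairs.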
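The correlation-based detection and exchange repair for subsequence misplacement can be sketched in a similar way. The abstract does not define the thesis's correlation coefficient, so this sketch substitutes the ordinary Pearson coefficient over fixed-size windows; the reference value `ref`, the tolerance `tol`, the window size, and the function name `repair_windows` are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def repair_windows(cols, pair, other, ref, window, tol):
    """Detect windows where the correlation of `pair` drifts from `ref` and
    repair them by exchanging the subsequence with column `other`."""
    a, b = pair
    repaired = []
    for start in range(0, len(cols[a]) - window + 1, window):
        s = slice(start, start + window)
        if abs(pearson(cols[a][s], cols[b][s]) - ref) <= tol:
            continue  # correlation unchanged: no misplacement detected here
        # Evaluate the exchange with the recalculated coefficient before applying it.
        if abs(pearson(cols[a][s], cols[other][s]) - ref) <= tol:
            cols[b][s], cols[other][s] = cols[other][s], cols[b][s]
            repaired.append(start)
    return repaired

# Hypothetical stream: b tracks a linearly, c is unrelated to a.
a = [float(i) for i in range(12)]
b_true = [2 * v + 1 for v in a]
c_true = [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 4.0, 6.0, 2.0, 5.0, 3.0, 7.0]
cols = {"a": a, "b": b_true[:], "c": c_true[:]}
# Simulate a subsequence misplacement: b and c exchange values on rows 4..7.
cols["b"][4:8], cols["c"][4:8] = cols["c"][4:8], cols["b"][4:8]
fixed = repair_windows(cols, pair=("a", "b"), other="c", ref=1.0, window=4, tol=0.1)
```

Checking one coefficient per window rather than searching over exchanges for every tuple is one plausible reason such a method could be cheaper than a per-tuple approach, although the abstract itself only reports the efficiency gain and does not explain it.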
Keywords/Search Tags:Time series data, data stream, attribute value misplacement, subsequence misplacement, correlation analysis