
The Study On Receiver Operating Characteristic Curve And Change Point Detection

Posted on: 2022-03-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y G Wang
GTID: 1488306317494154
Subject: Control Science and Engineering
Abstract/Summary:
Change point detection aims to identify the times at which the distribution or trend of the data in a random process changes. It originated in industrial quality control after World War II. After more than 70 years of development, and with the growing attention paid to big data in the information age, change point detection has made great progress in both theory and application as an important data analysis tool. It is widely used in economics and finance, medical diagnosis, environmental and climate studies, signal detection, genome data analysis, and pattern recognition. The receiver operating characteristic (ROC) curve, which derives from signal detection, is a comprehensive index that describes how the specificity and sensitivity of detection vary. Owing to its good statistical properties and clear mathematical meaning, the ROC curve is widely used in medical diagnosis, bioinformatics, pattern recognition, and other fields. Building on these properties, this thesis proposes methods that address change point detection through the statistical characteristics of the area under the ROC curve (AUC), and applies them to real data such as genome data.
Although change point detection has been widely studied, application scenarios in the era of massive information still pose the following challenges:
1) Data types are complex and distributions take many forms, so heavy parametric assumptions are increasingly unrealistic. How can a distribution-free algorithm be designed that adapts to all distributions?
2) In real applications the number of change points is always unknown, so a practical algorithm must be able to determine the number of change points accurately.
3) Given 1) and 2), how can the accuracy of detection be improved? For offline change point detection, the estimated position of a change point should be as close to the true position as possible; for online change point detection, the detection delay should be as low as possible.
In essence, these three problems reflect the robustness of a change point detection algorithm from three aspects.
This thesis first introduces the definition of the ROC curve from the binary classification problem and summarizes the basic methods of ROC construction. It then introduces ROC curve analysis and the concept of AUC, and describes the basic statistical properties of AUC in detail. To improve the robustness of change point detection, the following solutions are proposed:
1) The basic model of mean change point detection is refined for real application scenarios. In both the offline and the online setting, no specific distribution is assumed for the data. In the offline setting, the number of change points is not assumed either.
2) Using the asymptotic distribution of the AUC statistic, an offline multiple change point detection algorithm based on double sliding windows is proposed. As the windows slide, the AUC statistic is computed from the data in the two windows. A threshold is then set according to the asymptotic normal distribution of AUC, and each point is tested against it. Consecutive points that exceed the threshold form a local interval; the extremum of each interval is taken as the initial estimate of a change point location, and the length k of each interval is recorded. To reduce false alarms, we exploit the difference between the distributions of k under the null hypothesis and under the alternative hypothesis and set a threshold on k. If the length of the interval around a candidate change point is below this threshold, the candidate is discarded. Comparative studies show that when the noise is non-normally distributed, our algorithm has an advantage over the comparison algorithms. Finally, the algorithm is applied to real genome data to verify its practical value.
3) The autocorrelation function of the AUC statistic under the double sliding window search is derived, and the stationarity of the AUC sequence under the null hypothesis is proved. On this basis, the generalized extreme value distribution of the AUC sequence is obtained and used to set the threshold of the change point detection method. In the false alarm reduction stage, an automatic removal strategy is used to improve the robustness of the algorithm. Experimental results show that the method has clear advantages over methods that use only the AUC statistic or the exponentially weighted moving average (EWMA) AUC statistic. It also outperforms the method of Chapter 3, which filters out false alarm change points with a threshold on k. Finally, the algorithm is applied to real genome data, which shows that it has practical value.
4) An online change point detection method is proposed that computes AUC statistics from multiple reference windows and one sliding window. The statistical properties, asymptotic distribution, and autocorrelation function of these AUC statistics are derived. Based on these properties, an explicit expression for the extreme value distribution of AUC is derived, and a theoretical expression for the online detection threshold is given. Compared with a recent kernel method, our method shows some advantages in online detection of mean change points.
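The double sliding window idea from solution 2) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names (`auc`, `auc_scan`, `detect`), the window size `w`, and the two thresholds `z_thresh` and `min_run` are hypothetical placeholders, whereas the thesis derives its thresholds from the asymptotic normal and extreme value distributions of AUC. The sketch assumes continuous data without ties, for which the Mann-Whitney estimate of AUC has null mean 1/2 and null variance (m + n + 1) / (12mn) for window sizes m and n.

```python
import math
import random


def auc(left, right):
    """Mann-Whitney estimate of P(left < right); ties count as 1/2."""
    wins = 0.0
    for x in left:
        for y in right:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(left) * len(right))


def auc_scan(data, w):
    """Standardized AUC at every split t, using data[t-w:t] and data[t:t+w]."""
    # Null standard deviation of AUC for two windows of equal size w.
    sd = math.sqrt((2 * w + 1) / (12.0 * w * w))
    return {
        t: (auc(data[t - w:t], data[t:t + w]) - 0.5) / sd
        for t in range(w, len(data) - w + 1)
    }


def detect(data, w=30, z_thresh=4.0, min_run=5):
    """Return candidate change points (hypothetical thresholds, see lead-in)."""
    z = auc_scan(data, w)
    change_points = []
    run = []  # consecutive splits whose |z| exceeds the threshold
    # One extra iteration past the last split flushes a run that ends there.
    for t in range(w, len(data) - w + 2):
        if t in z and abs(z[t]) > z_thresh:
            run.append(t)
        else:
            # Keep the extremum of a run only if the run is long enough;
            # short runs are treated as false alarms and dropped.
            if len(run) >= min_run:
                change_points.append(max(run, key=lambda s: abs(z[s])))
            run = []
    return change_points


# Toy data (an assumption for illustration): one mean shift at index 100.
random.seed(0)
data = [random.uniform(0, 1) for _ in range(100)] + \
       [random.uniform(5, 6) for _ in range(100)]
change_points = detect(data)
```

Because the statistic depends only on the ranks of the observations in the two windows, the same code applies unchanged to heavy-tailed or skewed noise, which is the distribution-free property the abstract emphasizes. The quadratic-time `auc` is kept for readability; a rank-based computation would be preferable for long windows.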
Keywords/Search Tags: offline change point detection, ROC curve, AUC, online change point detection, extreme value distribution