On-line Test For Two Kinds Of Data Stream

Posted on: 2020-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Qiu
GTID: 2370330572497012
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Hypothesis testing is a statistical method for assessing statistical hypotheses. Its basic idea is to reject the null hypothesis when a small-probability event occurs, on the premise that such events essentially do not happen in a single trial. A hypothesis about the population is stated first, and under this hypothesis the observed outcome should be one that occurs with high probability. If the test results deviate from the null hypothesis, that is, a small-probability event occurs, there is reason to doubt the null hypothesis and reject it. The traditional hypothesis test is a single overall test carried out after data collection is complete. With the development of big data technology and the rapid growth of data, streaming data, that is, dynamic data sets that grow without bound over time, have appeared in many fields. In most scenarios that generate new data continuously, we cannot wait for all the data to be collected before testing them as a whole; applying the traditional hypothesis test there means the results lag behind the data. An online test is a real-time test method for data streams and is therefore timely. The on-line tests in this thesis are accordingly designed on the basis of traditional hypothesis tests. In this context, two pieces of work on hypothesis testing have been carried out.

Firstly, an online chi-square test and an online Hosmer-Lemeshow test are proposed, adapted from the traditional chi-square test and Hosmer-Lemeshow test. Using maternal data, we apply the Lasso algorithm to select the most influential indicators for a logistic regression model, and then use the proposed methods to test the fitted model. Compared with a single traditional goodness-of-fit test performed at the end of data collection, the proposed method can detect changes in goodness of fit during the study period, so the decision maker can decide whether to continue with or stop using the current model and take measures to reduce losses.

Secondly, we propose a new monitoring approach for non-normal multivariate data, the multivariate score test monitoring (MSTM) method. On the one hand, MSTM does not rely on the assumption that the in-control distribution is normal, which is rarely true in practice. On the other hand, the score test is the uniformly most powerful test, so its power for monitoring is higher than that of the traditional control chart in T^2 form. This thesis uses a copula to obtain the joint distribution function of the multivariate data and then combines it with the score test to construct the test statistic. The simulation results show that MSTM is more sensitive than Hotelling's T^2 control chart when monitoring small and medium shifts.
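
The abstract does not spell out how the online Hosmer-Lemeshow test accumulates data or how it handles repeated testing, so the following is only a minimal sketch of one plausible reading: the standard Hosmer-Lemeshow statistic is recomputed on all data seen so far each time a new batch arrives. The function names (`chi2_sf_even_df`, `hosmer_lemeshow`, `online_hl`), the batch interface, the choice of 10 groups, and the 0.05 threshold are all assumptions made for illustration, not the thesis's procedure. No correction for multiple looks at the data is applied.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p_value) using `groups` bins of near-equal size.

    Assumes each bin has an expected count strictly between 0 and its size.
    """
    n = len(probs)
    if n < groups:
        raise ValueError("need at least one observation per group")
    # Sort by predicted probability only; ties keep their arrival order.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, chi2_sf_even_df(stat, groups - 2)


def online_hl(batches, groups=10, alpha=0.05):
    """Yield (batch_index, statistic, p_value, reject) after each incoming batch.

    Illustrative assumption: the test is rerun on the accumulated data.
    """
    probs, outcomes = [], []
    for i, (batch_p, batch_y) in enumerate(batches, start=1):
        probs.extend(batch_p)
        outcomes.extend(batch_y)
        stat, p_value = hosmer_lemeshow(probs, outcomes, groups)
        yield i, stat, p_value, p_value < alpha


def simulate_batch(rng, size, shift=0.0):
    """Hypothetical stream: predicted probabilities and outcomes drawn with a bias."""
    batch_p = [rng.uniform(0.1, 0.7) for _ in range(size)]
    batch_y = [1 if rng.random() < p + shift else 0 for p in batch_p]
    return batch_p, batch_y


if __name__ == "__main__":
    rng = random.Random(0)
    # Three calibrated batches followed by three where the model under-predicts.
    stream = [simulate_batch(rng, 200) for _ in range(3)]
    stream += [simulate_batch(rng, 200, shift=0.2) for _ in range(3)]
    for index, stat, p_value, reject in online_hl(stream):
        print(f"batch {index}: HL={stat:.2f}, p={p_value:.4f}, reject={reject}")
```

In this sketch a rejection after some batch would be the signal for the decision maker to reconsider the current model; because the statistic is recomputed many times, the nominal level of each individual test is not the overall false-alarm rate.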
Keywords/Search Tags: Logistic regression model, Goodness-of-fit test, Online chi-square test, Online Hosmer-Lemeshow test, Hotelling's T^2 control chart, Multivariate score test monitoring (MSTM) method, Copula