
Research On Singular Sample Identification Method Based On MCCV Combined With T-test

Posted on: 2019-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Cai
Full Text: PDF
GTID: 2310330548952617
Subject: Control engineering field
Abstract/Summary:
Near-infrared (NIR) spectroscopy is an indirect analytical technique: it relies on the correlation between NIR spectra and chemical reference values to build a predictive model. Singular samples (outliers) weaken this correlation and can severely degrade, or even destroy, the model's prediction performance. Identifying and handling singular samples is therefore a fundamental step in building a robust model. Singular samples arise for several reasons: an abrupt change in the overall conditions or the appearance of an unknown factor, measurement error in the data itself, or a sample whose nature differs completely from that of the rest of the population. When the modeling set contains many singular samples, it becomes difficult to distinguish among normal samples, singular samples that harm the model, and good samples that strongly influence the model. A good sample with a strong influence on the model is often misjudged as singular. The identification and screening of singular samples therefore remains an active research topic.

To address this problem, this thesis analyzes the shortcomings of commonly used singular sample selection methods and improves the identification method based on Monte Carlo cross validation (MCCV) by adding a calculation strategy that incorporates a T-test. First, a large number of models are built by MCCV, and singular samples are initially identified from the cumulative frequency with which each sample appears in the models that have a small prediction residual error sum of squares (PRESS). Next, a T-test compares the root mean square error of prediction (RMSEP) of models built with the suspected singular sample excluded against the RMSEP of models built with a randomly selected sample excluded. Based on the T-test result and the RMSEP values, the suspected singular samples are screened into normal samples, strongly influential good samples, and harmful singular samples. To a certain extent, this reduces the chance of removing normal samples and makes the removal of abnormal samples more accurate.

The effectiveness of the improved algorithm was verified by theoretical analysis and by experimental comparison with the spectral residual ratio, the Mahalanobis distance, the MCCV mean-variance method, and other methods. The results show that the MCCV-based method combined with the T-test identifies singular samples more effectively and more accurately. The thesis also examines the applicability of the method under different sample sizes and different spectral pretreatments; the experimental results show that it is not affected by the number of samples or by spectral pretreatment and has good applicability. Its robustness was verified with respect to two settings: the proportion of the calibration set in MCCV and the number of RMSEP models in the T-test that are built with a randomly excluded sample.
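The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and almost everything in it is an assumption made for illustration: a one-variable least-squares line stands in for the calibration model, the data are synthetic, the function names (`validation_frequency`, `suspect_t_statistic`) and the parameters (`keep`, `cal_frac`, `n_runs`) are hypothetical, and the frequency is counted over the validation subset of the low-PRESS runs because the abstract does not say which subset is counted. Welch's t statistic is used in place of whatever T-test variant the thesis applies.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (assumed stand-in for the calibration model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def one_split(xs, ys, idx, cal_frac, rng):
    """One random calibration/validation split: returns (PRESS, RMSEP, validation indices)."""
    shuffled = rng.sample(idx, len(idx))
    n_cal = int(cal_frac * len(idx))
    cal, val = shuffled[:n_cal], shuffled[n_cal:]
    a, b = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
    press = sum((ys[i] - (a + b * xs[i])) ** 2 for i in val)
    return press, (press / len(val)) ** 0.5, val


def validation_frequency(xs, ys, n_runs=300, cal_frac=0.7, keep=0.5, seed=0):
    """Stage 1 (assumed reading): count how often each sample sits in the
    validation subset of the lowest-PRESS runs."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    runs = sorted((one_split(xs, ys, idx, cal_frac, rng) for _ in range(n_runs)),
                  key=lambda run: run[0])
    freq = dict.fromkeys(idx, 0)
    for _, _, val in runs[: int(keep * n_runs)]:
        for i in val:
            freq[i] += 1
    return freq


def welch_t(a, b):
    """Welch's t statistic for two independent groups of RMSEP values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def suspect_t_statistic(xs, ys, suspect, n_runs=50, cal_frac=0.7, seed=1):
    """Stage 2: RMSEP with the suspect removed versus RMSEP with a random sample removed."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    others = [i for i in idx if i != suspect]
    without_suspect = [one_split(xs, ys, others, cal_frac, rng)[1] for _ in range(n_runs)]
    without_random = []
    for _ in range(n_runs):
        dropped = rng.choice(others)  # a randomly chosen non-suspect sample
        kept = [i for i in idx if i != dropped]
        without_random.append(one_split(xs, ys, kept, cal_frac, rng)[1])
    return welch_t(without_suspect, without_random)


# Synthetic data: a clean linear relation with one corrupted reference value.
data_rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.2) for x in xs]
ys[7] += 30.0

freq = validation_frequency(xs, ys)
suspect = min(freq, key=freq.get)  # rarest sample among the low-PRESS validation sets
t_stat = suspect_t_statistic(xs, ys, suspect)
print(suspect, round(t_stat, 2))
```

Under these assumptions, a strongly negative statistic means that removing the suspect lowers RMSEP far more than removing an arbitrary sample, which would mark it as a harmful singular sample. A strongly positive statistic would suggest a strongly influential good sample, and a statistic near zero a normal sample. In practice the statistic would be compared against a critical value or converted to a p-value, which this sketch leaves out.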
Keywords/Search Tags:Chemometrics, Singular Sample Selection, Monte Carlo Cross Validation, T-test