Research On Multi-classification Method Of Blood Thalassemia Genotype Based On Vis-NIR Spectroscopy

Posted on: 2022-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: X W Shi
GTID: 2504306734465894
Subject: Optical Engineering
Abstract/Summary:
Thalassemia is one of the most prevalent and serious genetic diseases in the world. The provinces south of the Yangtze River in China have a high incidence of thalassemia. The carrier rates of pathogenic genes in the populations of Guangxi and Guangdong are as high as 24.13% and 11.07%, respectively.

Thalassemia is divided into α-thalassemia and β-thalassemia, depending on which globin chain is produced in insufficient quantity. α-thalassemia is further divided into two types: standard α-thalassemia (referred to as "standard α") and static (silent) α-thalassemia (referred to as "static α"). Thalassemia causes hemolytic anemia, which can lead to disability or death in severe cases. Apart from hematopoietic stem cell transplantation, there is currently no cure.

In high-incidence areas, large-scale screening for thalassemia carriers is therefore the main preventive measure. This requires multi-stage, multi-category screening that covers both thalassemia-positive screening and thalassemia genotyping. Existing genotype diagnosis of carriers usually involves a complete blood count, hemoglobin component analysis and DNA-based genotyping. Together these require a variety of instruments and reagents and are complicated and time-consuming.

In recent years, many studies have used visible and near-infrared (Vis-NIR) spectroscopy of blood for rapid, reagent-free quantitative analysis of thalassemia screening indicators. However, qualitative discriminant analysis of thalassemia remains rare, and genotyping in particular has not yet been reported. In this thesis, Vis-NIR spectroscopy is combined with a new chemometric method to establish a "four-class" spectral discriminant analysis model for human peripheral blood. The model distinguishes mild β-thalassemia, standard α, static α and normal controls. A simple, convenient, reagent-free technique that genotypes thalassemia carriers by directly measuring blood spectra would have important research value and application prospects.

The proposed "four-class" genotyping strategy builds on Vis-NIR spectroscopy and a multi-stage "two-class" spectral pattern recognition method based on partial least squares discriminant analysis (PLS-DA). Three stages of "two-class" discriminant analysis models are established:
- thalassemia-positive (standard α, static α and mild β) vs. normal control;
- α-thalassemia (standard α and static α) vs. β-thalassemia (mild β);
- standard α vs. static α.
These stages are then combined into a "four-class" discriminant analysis model of mild β, standard α, static α and normal control.

Samples are divided into calibration, prediction and validation sets. Nine recognition accuracy rates (RAR, %) and their standard deviations (RARSD) are proposed as the evaluation indicators of the models. The total accuracy rate (RARTotal) serves as the optimization index for wavelength selection, and RARSD serves as the stability index of the model. The wavelength optimization methods studied are moving window (MW), equidistant combination (EC), wavelength step-by-step phase-out (WSP) and their integrated optimization. For comparison, "four-class" models based on the K-correlation coefficient method and the K-nearest neighbor (KNN) method are also established. The main research contents and results are as follows.

(1) Thalassemia "two-class" discriminant analysis models based on PLS-DA and wavelength optimization: Standard normal variate (SNV) transformation is used for spectral preprocessing. The EC-PLS-DA method carries out a preliminary large-scale screening of wavelength models. The WSP-PLS-DA method then performs a second wavelength optimization on the top 10 models obtained by EC. The optimal EC-WSP-PLS-DA model for each stage is selected according to the evaluation indicators (N, number of wavelengths; LV, number of PLS latent variables):
- thalassemia-positive vs. normal control: N = 48, LV = 18;
- α-thalassemia vs. β-thalassemia: N = 74, LV = 18;
- standard α vs. static α: N = 22, LV = 16.
The three "two-class" models were validated on validation-set samples that were not involved in modeling. Their validation RARTotal values were 94.5%, 94.1% and 98.2%, respectively.

(2) Thalassemia "four-class" discriminant analysis model based on the K-correlation coefficient and wavelength optimization: The EC wavelength selection method is combined with the K-correlation coefficient method to screen wavelength models over a wide range. The optimal K-correlation coefficient model has the parameters K = 5, I = 846 nm (initial wavelength), N = 68 and G = 2 (number of wavelength gaps). This "four-class" model was validated on validation-set samples that were not involved in modeling. The validation accuracy rates for normal control, mild β, standard α and static α were 92.4%, 80.0%, 82.9% and 77.1%, respectively.

(3) Thalassemia "four-class" discriminant analysis model based on KNN and wavelength optimization: The EC wavelength selection method is combined with the K-correlation coefficient and KNN algorithms to screen wavelength models over a wide range. The optimal KNN model has the parameters K = 7, I = 878 nm, N = 88 and G = 6. This "four-class" model was validated on validation-set samples that were not involved in modeling. The validation accuracy rates for normal control, mild β, standard α and static α were 91.4%, 77.1%, 80.0% and 74.3%, respectively.

(4) Thalassemia "four-class" discriminant analysis model based on PLS-DA and wavelength optimization: The three stages of "two-class" models are combined through a multi-stage, multi-class classification scheme. This yields the "four-class" discriminant analysis model for thalassemia genotypes. The model was validated on validation-set samples that were not involved in modeling. The validation accuracy rates for normal control, mild β, standard α and static α were 93.3%, 82.9%, 85.7% and 80.0%, respectively.

For every class, the PLS-DA-based "four-class" model outperforms both the K-correlation coefficient and the KNN "four-class" models. The results show that screening thalassemia genotypes with Vis-NIR spectroscopy is feasible. The method is fast, simple and novel, and has application potential for disease diagnosis and health screening of large populations. This research offers a new approach to thalassemia genotype screening, and the established models can also serve as a reference for designing dedicated spectroscopic instruments.
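SNV is named as the spectral preprocessing step. SNV centers each spectrum on its own mean and divides it by its own standard deviation. This removes additive baseline offsets and multiplicative scaling differences between samples. The following is a minimal pure-Python sketch, not the thesis's code. The names `snv` and `snv_all` are hypothetical. Using the sample (n - 1) standard deviation is an assumption, since some tools divide by n.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    divide by its own sample standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1); an assumption
    return [(a - mean) / sd for a in spectrum]


def snv_all(spectra):
    """Apply SNV to every spectrum (row) independently."""
    return [snv(s) for s in spectra]


# Usage: two spectra that differ only by an offset and a scale factor
raw = [[0.52, 0.61, 0.75, 0.70, 0.58],
       [1.34, 1.52, 1.80, 1.70, 1.46]]  # second = 2 * first + 0.30
corrected = snv_all(raw)  # both rows become (numerically) identical
```

Each spectrum is normalized on its own, so SNV fits no parameters on the calibration set. It can therefore be applied to calibration, prediction and validation spectra in the same way.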
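EC wavelength selection can be pictured as an exhaustive search over evenly spaced wavelength subsets. The sketch below assumes that a model is described by three values:
- an initial wavelength index I (`start`);
- a number of wavelengths N (`count`);
- a spacing G (`gap`), treated here as the index step between neighboring wavelengths.

This convention for G is an assumption. The abstract does not give the wavelength grid, the search ranges or the exact definition. `score` is a hypothetical callback that would return the total recognition accuracy of a classifier fitted on the chosen wavelengths.

```python
from itertools import product


def ec_subset(n_points, start, count, gap):
    """Indices of one equidistant-combination (EC) wavelength model.

    start ~ initial wavelength I, count ~ number of wavelengths N,
    gap ~ index spacing G (assumed convention, gap >= 1). Returns None
    when the model would run past the end of the spectrum.
    """
    last = start + (count - 1) * gap
    if last >= n_points:
        return None
    return list(range(start, last + 1, gap))


def ec_search(n_points, counts, gaps, score):
    """Score every valid (start, count, gap) combination.

    score(indices) -> float is a hypothetical callback, e.g. the total
    recognition accuracy of a classifier fitted on those wavelengths.
    Returns (score, start, count, gap) tuples, best first.
    """
    results = []
    for count, gap in product(counts, gaps):
        for start in range(n_points):
            idx = ec_subset(n_points, start, count, gap)
            if idx is None:
                break  # any later start would overrun the spectrum too
            results.append((score(idx), start, count, gap))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

On a uniform grid, index i maps to a wavelength of λ0 + i·Δλ. Both the start wavelength λ0 and the step Δλ are instrument-dependent assumptions here. Keeping the first few entries of the ranked list corresponds to the "top 10 models" that the abstract passes on to WSP.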
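The abstract does not spell out the WSP rule, so the following sketch shows only one plausible reading. It uses backward elimination: the procedure repeatedly removes the single wavelength whose removal gives the best score. It stops once every possible removal would lower the score. `wsp` and its `score` callback are hypothetical names.

```python
def wsp(indices, score):
    """Wavelength step-by-step phase-out (one plausible reading).

    indices: starting wavelength indices (e.g. one of the best EC models).
    score(indices) -> float: hypothetical callback, higher is better.
    Returns (kept_indices, final_score).
    """
    current = list(indices)
    best = score(current)
    while len(current) > 1:
        # Score the model obtained by dropping each wavelength in turn.
        trials = [(score(current[:i] + current[i + 1:]), i)
                  for i in range(len(current))]
        trial_score, drop = max(trials)
        if trial_score < best:
            break  # every single removal would make the model worse
        best = trial_score
        del current[drop]
    return current, best
```

The sketch accepts removals that leave the score unchanged. This lets the procedure shed wavelengths that contribute nothing, so it favors smaller models on a plateau. A stricter variant would require a strict improvement before removing a wavelength.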
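Each "two-class" stage is a PLS-DA model. PLS-DA can be written as PLS1 regression, fitted with the NIPALS algorithm, on class labels coded +1/-1. A new spectrum is assigned by the sign of its predicted response. This is an illustrative pure-Python sketch, not the thesis's implementation. The class name `PLSDA`, the ±1 coding and the zero threshold are assumptions, and `n_components` plays the role of LV.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PLSDA:
    """Two-class PLS-DA as NIPALS PLS1 on labels coded +1 / -1 (sketch)."""

    def __init__(self, n_components):
        self.n_components = n_components  # number of latent variables (LV)

    def fit(self, X, labels):
        n, m = len(X), len(X[0])
        self.x_mean = [sum(row[j] for row in X) / n for j in range(m)]
        y = [1.0 if lab == 1 else -1.0 for lab in labels]
        self.y_mean = sum(y) / n
        E = [[row[j] - self.x_mean[j] for j in range(m)] for row in X]
        f = [v - self.y_mean for v in y]
        self.W, self.P, self.q = [], [], []
        for _ in range(self.n_components):
            w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
            norm = _dot(w, w) ** 0.5
            if norm < 1e-12:
                break  # residual y is (numerically) fully explained
            w = [v / norm for v in w]                    # weight vector
            t = [_dot(row, w) for row in E]              # scores
            tt = _dot(t, t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            qa = _dot(f, t) / tt                         # y loading
            # Deflate X and y before extracting the next component.
            E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
            f = [f[i] - qa * t[i] for i in range(n)]
            self.W.append(w)
            self.P.append(p)
            self.q.append(qa)
        return self

    def decision(self, x):
        """Predicted response for one spectrum; its sign gives the class."""
        e = [x[j] - self.x_mean[j] for j in range(len(x))]
        y_hat = self.y_mean
        for w, p, qa in zip(self.W, self.P, self.q):
            t = _dot(e, w)
            y_hat += qa * t
            e = [e[j] - t * p[j] for j in range(len(e))]
        return y_hat

    def predict(self, X):
        return [1 if self.decision(x) > 0 else -1 for x in X]
```

Prediction replays the same deflation steps on the new spectrum. This avoids forming and inverting a regression-coefficient matrix.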
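The models are evaluated with recognition accuracy rates (RAR) and their standard deviations (RARSD). The abstract does not list how its nine RARs are defined, so this minimal sketch rests on three assumptions:
- a per-class RAR is the percentage of that class's samples assigned correctly;
- RARTotal is the percentage of all samples assigned correctly;
- RARSD is the standard deviation of a rate over repeated calibration/prediction splits.

`rar` and `rar_summary` are hypothetical names.

```python
import statistics
from collections import Counter


def rar(true, pred):
    """Recognition accuracy rates (%) for each class plus the total rate."""
    totals = Counter(true)
    hits = Counter(t for t, p in zip(true, pred) if t == p)
    rates = {c: 100.0 * hits[c] / totals[c] for c in totals}
    rates["Total"] = 100.0 * sum(hits.values()) / len(true)
    return rates


def rar_summary(runs):
    """Mean and standard deviation of each rate over repeated splits.

    runs: list of dicts returned by rar(), one per calibration/prediction
    split (the repeated-split scheme itself is an assumption).
    """
    return {k: (statistics.fmean([r[k] for r in runs]),
                statistics.stdev([r[k] for r in runs]))
            for k in runs[0]}
```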
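The abstract does not define the K-correlation coefficient method. One plausible reading, sketched here, is a neighbor-voting rule. It ranks the calibration spectra by their Pearson correlation coefficient with the unknown spectrum. It then takes a majority vote among the K most correlated ones. The function names are hypothetical, and the thesis's exact rule may differ.

```python
from collections import Counter


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den


def k_correlation_predict(train_X, train_y, x, k):
    """Majority class among the k calibration spectra most correlated with x."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: pearson(pair[0], x), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The correlation coefficient ignores offset and scale, so this rule compares spectral shape only. If the vote is tied, `most_common(1)` returns the tied class that first appears in the ranking, which is the one holding the more highly correlated neighbor.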
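A KNN classifier assigns an unknown spectrum to the majority class among its K nearest calibration spectra. The sketch below measures "nearest" by Euclidean distance. This metric is an assumption, because the abstract does not state the one used. `knn_predict` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority class among the k calibration spectra closest to x
    (Euclidean distance; the metric is an assumption)."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Unlike a correlation-based rule, Euclidean distance is sensitive to baseline offsets and intensity scaling. For this reason, a normalization such as SNV is commonly applied before a distance-based classifier.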
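The multi-stage idea behind the PLS-DA "four-class" model can be sketched as a decision cascade over the three "two-class" stages. Those stages are positive vs. normal control, α vs. β, and standard α vs. static α. Each stage is passed in as a predicate. In practice, each predicate would wrap a fitted two-class model. The function name and the label strings are illustrative only.

```python
def four_class(x, is_positive, is_alpha, is_standard_alpha):
    """Cascade three two-class decisions into one four-class label.

    is_positive(x):       stage 1, thalassemia-positive vs normal control
    is_alpha(x):          stage 2, alpha- vs beta-thalassemia (positives only)
    is_standard_alpha(x): stage 3, standard vs static alpha (alpha only)
    Each predicate would wrap a fitted two-class model (hypothetical here).
    """
    if not is_positive(x):
        return "normal control"
    if not is_alpha(x):
        return "mild β"
    return "standard α" if is_standard_alpha(x) else "static α"


# Usage with stand-in predicates that read pre-computed stage outcomes.
stages = (lambda s: s[0], lambda s: s[1], lambda s: s[2])
labels = [four_class(s, *stages) for s in
          [(False, True, True), (True, False, True),
           (True, True, True), (True, True, False)]]
```

A sample reaches a later stage only if the earlier stage sends it there. An error at an early stage therefore cannot be corrected downstream, so the accuracy of each stage bounds the accuracy of the final four-class result.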
Keywords/Search Tags: Vis-NIR spectroscopy, Multi-class spectral discriminant model, Thalassemia genotype screening, Equidistant combination, Wavelength step-by-step phase-out, K-correlation coefficient, K-nearest neighbor