| Objective: To build a set of automated AML analysis procedures using machine learning methods that reproduces the whole process of manual AML analysis in a flow cytometry laboratory, bridges the gap between cell population identification and patient classification, automates the entire AML analysis workflow in the flow laboratory, and provides a practical artificial intelligence tool for flow cytometric screening and diagnosis. Methods: (1) Data: Benchmark AML data from repository.flowcap.org were analyzed (Data 1; 359 subjects, acquired on a BC flow cytometer), and flow cytometry data were collected from the AML flow cytometry experiments at the Testing Center of the People's Hospital (Data 2; 78 subjects, acquired on a BD Canto II flow cytometer). (2) Preprocessing: The flow data were compensated in the R computing environment using the built-in compensation matrix. Forward scatter (FSC) and side scatter (SSC) were then transformed linearly and logarithmically, respectively, and antibody fluorescence intensities were transformed biexponentially; the upper measurement limits of the BC and BD platforms were taken into account so that all variables in the two datasets were on the same order of magnitude. For Data 2, an automatic doublet-gating procedure was proposed and applied. (3) Cell clustering: The commonly used normal mixture model and the popular FlowSOM method were used to cluster the preprocessed flow data. The mixture model used the ICL criterion to select the optimal number of subgroups between 16 and 25, and the FlowSOM grid had 10 or 15 nodes along both the x- and y-axes. Cluster centers, called meta cells, were collected by tube and by method to identify the patterns of their distribution in common spaces. (4) Cell population classification and registration: For Data 1, the meta cells of the first 180 patients were used as the training set; for Data 2, 40 participants were randomly selected and their meta cells were used as the training set. In both cases the remaining meta cells formed the testing set. The distribution of meta cells in the FSC, SSC and CD45 spaces was found to be similar to that of flow cytometry data from real healthy individuals or patients. The training set was used to train a meta cell classification model with 20 categories and to build a map from the 20 classes to 10 known cell groups. The classification model was then used to classify new meta cells, and the map to label them. (5) Feature extraction and diagnosis of AML: Based on the subpopulation labels, features such as the proportion of each cell subpopulation, the positive expression rate of each cell surface or intracellular antigen, and the mean and median antigen expression intensities were selectively extracted. Diagnostic discriminant criteria were constructed according to the 2021 Chinese expert consensus on flow cytometry for acute leukemia, together with testing on the two training sets, and the automatically collected patient features were used to diagnose patients. Results: (1) Inspection of the intermediate process files, and in particular review of the outputs, showed that the preprocessing, cell clustering, and cell population classification and registration steps for both datasets were free of abnormalities and highly reliable. (2) Automatic diagnosis results for Data 1: With the normal mixture model for cell clustering, sensitivity was 100%, specificity 99%, accuracy 99%, Youden's index 0.99, F-measure 0.99, Kappa 0.97, positive predictive value 95%, and negative predictive value 100%. With FlowSOM, sensitivity was 100%, specificity 98%, accuracy 98%, Youden's index 0.98, F-measure 0.98, Kappa 0.92, positive predictive value 87%, and negative predictive value 100%. (3) Automatic diagnosis results for Data 2: With the normal mixture model, sensitivity was 89%, specificity 94%, accuracy 93%, Youden's index 0.83, F-measure 0.93, Kappa 0.84, positive predictive value 91%, and negative predictive value 94%. With FlowSOM, sensitivity was 88%, specificity 94%, accuracy 92%, Youden's index 0.82, F-measure 0.92, Kappa 0.83, positive predictive value 91%, and negative predictive value 94%. (4) Series system of the two datasets: With the normal mixture model, sensitivity was 89% and specificity was 99%; with FlowSOM, sensitivity was 88% and specificity was 99%. Conclusion: The automated AML flow cytometric data analysis method (Arc Dia) was successfully designed. Arc Dia shows good validity and high reliability in the flow cytometric diagnosis of AML, and offers some clinical guidance for the automated application of flow diagnosis. |
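
The Results report eight diagnostic measures per clustering method. As a reading aid, the following minimal sketch shows how such measures are conventionally derived from a 2×2 confusion matrix. It is not the authors' code: the names `ConfusionCounts` and `diagnostic_metrics` and the example counts are hypothetical, and the abstract does not state which F-measure variant was used, so the F1 score computed here may differ from the reported values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for a 2x2 diagnostic confusion matrix."""
    tp: int  # diseased subjects called positive
    fn: int  # diseased subjects called negative
    fp: int  # healthy subjects called positive
    tn: int  # healthy subjects called negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard textbook definitions; assumes every row and column total is non-zero."""
    n = c.tp + c.fn + c.fp + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    ppv = c.tp / (c.tp + c.fp)  # positive predictive value
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    # Youden's index: sensitivity + specificity - 1
    youden = sensitivity + specificity - 1.0
    # F1 score (assumption: the source may use a different F-measure variant)
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    # Cohen's kappa: observed agreement corrected for chance agreement
    expected = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (accuracy - expected) / (1.0 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden": youden,
        "f1": f1,
        "kappa": kappa,
        "ppv": ppv,
        "npv": npv,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from Data 1 or Data 2.
    example = ConfusionCounts(tp=40, fn=10, fp=5, tn=45)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Kappa is included alongside accuracy because it corrects for agreement expected by chance, which matters when positive and negative subjects are unevenly represented.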