| The diagnosis of lumbar disc degeneration (LDD) is of great significance for the prevention of lumbar disease. At present, LDD is diagnosed mainly through the subjective evaluation of an imaging physician, which is prone to misjudgment when the physician lacks experience. In this paper, based on intervertebral disc metabolomic data detected by nuclear magnetic resonance (NMR), a computer-aided diagnosis method that automatically identifies the grade of lumbar disc degeneration is developed using machine learning, providing a reference for imaging physicians. First, Spearman correlation analysis, vector mean similarity-based sample segmentation (MSSS), and data standardization methods are discussed. Several classic machine learning classifiers are then introduced, including logistic regression, softmax regression, neural networks, support vector machines (SVM), naive Bayes, k-nearest neighbors, and decision trees. Considering the applicability of these classifiers to lumbar disc metabolomic data, three algorithms are selected, namely softmax regression, a neural network, and an SVM, and their underlying principles and optimization procedures are described in detail. Next, Spearman's correlation analysis is used to examine, separately, the correlation between the Pfirrmann grade of lumbar disc degeneration and each of three NMR metabolic indices: the T2* value of the lumbar intervertebral disc, and the fat fraction (FF) of the upper and of the lower vertebral body adjacent to the degenerative disc. The results show that all three metabolic indices are significantly correlated with lumbar disc degeneration. These three indices are therefore used as features: the full dataset contains 390 samples, each described by the three metabolic indices. During preprocessing, the MSSS-based algorithm is applied to divide the dataset into a training set and a test set: 260 of the 390 samples are selected for training and the remaining 130 for testing, and then
the training and test sets are normalized separately. The parameters of the softmax regression, neural network, and SVM classifiers are learned on the training set by solving their respective optimization problems, and the performance of the three resulting LDD diagnostic classifiers is compared on the test set. Class imbalance, which is common in medical data, can degrade the performance of computer-aided diagnosis classifiers. To reduce its impact and improve the identification of the LDD grade, this paper uses the Mahalanobis distance-based oversampling (MDO) algorithm to oversample the minority classes in the training set; MDO generates synthetic minority-class samples that preserve the variability and correlation structure of the corresponding class. After oversampling, every class contains as many samples as the majority class, so a class-balanced training set is obtained. Because the newly synthesized samples may increase the overlap between different classes, this paper improves the MDO algorithm and uses the improved version to oversample the minority classes of the training set. Finally, the LDD diagnostic classifiers are compared on the test set in three cases: without class balancing, with class balancing by the original MDO algorithm, and with class balancing by the improved MDO algorithm. In terms of five evaluation metrics, namely accuracy, precision (P), recall (R), F1 score, and Kappa coefficient, the classifiers built with the improved MDO algorithm outperform both the classifiers trained without oversampling and those trained with the original MDO algorithm. |
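The abstract relies on Spearman's rank correlation to relate each metabolic index to the Pfirrmann grade. Because Pfirrmann grades are ordinal and heavily tied, tie handling matters. The sketch below is a minimal NumPy illustration, assuming the standard definition (Pearson correlation of tie-averaged ranks); the function names `average_ranks` and `spearman_rho` are hypothetical, and `scipy.stats.spearmanr` computes the same coefficient along with a p-value.

```python
import numpy as np

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort keeps ties grouped
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values equal to sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # 0-based positions i..j share the average 1-based rank.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical toy data (not from the study): grade vs. a decreasing index.
grades = [1, 2, 2, 3, 4, 5]
index_values = [62.0, 55.0, 51.0, 40.0, 33.0, 21.0]
rho = spearman_rho(grades, index_values)  # strongly negative for this toy data
```

Significance testing (the "significantly correlated" claim) would additionally need a p-value, which this sketch does not compute.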
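The abstract states that the training and test sets are normalized separately after the 260/130 split. The abstract does not specify which standardization is used, so the sketch below assumes z-score standardization (zero mean, unit variance per feature), with each set using its own statistics as described. The helper name `zscore` and the random toy matrices are hypothetical stand-ins for the three-index feature matrices.

```python
import numpy as np

def zscore(X, eps=1e-12):
    """Standardize each column of X to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # eps guards against division by zero for a constant column.
    return (X - mu) / np.maximum(sigma, eps)

# Hypothetical toy matrices: rows = samples, columns = (T2*, FF upper, FF lower).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[30.0, 0.4, 0.4], scale=[8.0, 0.1, 0.1], size=(260, 3))
X_test = rng.normal(loc=[30.0, 0.4, 0.4], scale=[8.0, 0.1, 0.1], size=(130, 3))

Z_train = zscore(X_train)  # uses training-set statistics only
Z_test = zscore(X_test)    # uses test-set statistics only, as the abstract describes
```

A common alternative is to standardize the test set with the training set's mean and standard deviation, which keeps test data from influencing the preprocessing; which convention the paper follows is not stated here.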
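Of the three selected classifiers, softmax regression has the simplest optimization problem: minimize the mean cross-entropy of a linear model's softmax outputs. The sketch below is a minimal NumPy version trained by batch gradient descent with L2 regularization on the weights. The learning rate, regularization strength, epoch count, and function names are hypothetical choices, not the paper's settings; Pfirrmann grades would need to be mapped to labels `0..K-1` first.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.1, reg=1e-3, epochs=500):
    """Minimize mean cross-entropy + (reg/2)*||W||^2 by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]          # one-hot targets, shape (n, K)
    for _ in range(epochs):
        P = softmax(X @ W + b)        # predicted class probabilities
        G = (P - Y) / n               # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + reg * W) # bias is left unregularized
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Return the most probable class index for each row of X."""
    return np.argmax(X @ W + b, axis=1)
```

Typical use would be `W, b = train_softmax(Z_train, y_train, n_classes=K)` followed by `predict(Z_test, W, b)`, where `Z_train`, `y_train`, `Z_test`, and `K` are placeholders for the normalized features, integer labels, and number of grades.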
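The core idea of MDO, as summarized in the abstract, is to create synthetic minority samples that follow the class's covariance structure. One way to achieve this is to place each new point at the same Mahalanobis distance from the class mean as a seed sample. The sketch below is a simplified illustration of that idea under stated assumptions. It draws a uniformly random direction on the class's Mahalanobis ellipsoid and cycles through all samples as seeds, which may differ from the original MDO procedure (for example, in how seeds are selected and how the point on the ellipsoid is drawn). It does not implement the paper's improved MDO. The function names are hypothetical, and each minority class is assumed to have at least two samples.

```python
import numpy as np

def mdo_oversample(X_min, n_new, rng=None):
    """Simplified Mahalanobis-distance-based oversampling for one class.

    Each synthetic point lies at the same Mahalanobis distance from the
    class mean as one seed sample, so new points follow the class covariance.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    mu = X_min.mean(axis=0)
    C = np.cov(X_min, rowvar=False)            # needs >= 2 samples
    eigval, eigvec = np.linalg.eigh(C)         # principal axes of the class
    keep = eigval > 1e-10                      # drop degenerate directions
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    Z = (X_min - mu) @ eigvec                  # seeds in principal-axis coordinates
    alpha = (Z**2 / eigval).sum(axis=1)        # squared Mahalanobis distances
    new = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = alpha[i % len(X_min)]              # cycle through the seed samples
        u = rng.normal(size=len(eigval))
        u /= np.linalg.norm(u)                 # uniformly random unit direction
        r = u * np.sqrt(a * eigval)            # so that sum(r**2 / eigval) == a
        new[i] = mu + r @ eigvec.T             # map back to feature space
    return new

def balance_classes(X, y, rng=None):
    """Oversample every minority class up to the majority-class count."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            Xs.append(mdo_oversample(X[y == c], target - n, rng))
            ys.append(np.full(target - n, c))
    return np.vstack(Xs), np.concatenate(ys)
```

Only the training set should be passed to `balance_classes`, matching the abstract's description of oversampling the minority classes of the training set while the test set stays untouched.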
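The final comparison uses accuracy, precision, recall, F1, and the Kappa coefficient. For a multi-class grading problem these need an averaging convention, which the abstract does not state. The sketch below assumes macro averaging (the unweighted mean over classes) and Cohen's kappa computed from the confusion matrix. Here F1 is the mean of per-class F1 scores, which can differ from an F1 computed from macro P and R. The function names are hypothetical; scikit-learn's `precision_recall_fscore_support` and `cohen_kappa_score` offer equivalent library routines.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(M, (y_true, y_pred), 1)
    return M

def evaluate(y_true, y_pred, n_classes):
    """Accuracy, macro precision/recall/F1, and Cohen's kappa."""
    M = confusion_matrix(np.asarray(y_true), np.asarray(y_pred), n_classes)
    n = M.sum()
    tp = np.diag(M).astype(float)
    pred_tot = M.sum(axis=0)   # samples predicted as each class
    true_tot = M.sum(axis=1)   # samples truly in each class
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, pred_tot, out=zeros.copy(), where=pred_tot > 0)
    recall = np.divide(tp, true_tot, out=zeros.copy(), where=true_tot > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    p_o = tp.sum() / n                          # observed agreement (= accuracy)
    p_e = (pred_tot * true_tot).sum() / n**2    # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"accuracy": p_o, "precision": precision.mean(),
            "recall": recall.mean(), "f1": f1.mean(), "kappa": kappa}
```

For example, `evaluate([0, 0, 1, 1], [0, 1, 1, 1], 2)` gives accuracy 0.75 and kappa 0.5, since chance agreement for those marginals is 0.5.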