The Research On Data Processing Of Mass Metabolomics

Posted on:2020-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Liu

GTID:2381330578981706

Subject:Biomedical engineering

Abstract/Summary:

Objective: To establish a workflow for metabolomics data analysis by studying methods for data preprocessing, pattern recognition and differential variable screening.

Methods:
1. Data preprocessing: Euclidean distance, Mahalanobis distance and a clustering analysis algorithm were each used to detect outliers; the outliers flagged by all three methods were deleted, and the remaining samples were used as the original data. The k-nearest neighbor method, the continuous k-nearest neighbor method and the multiple interpolation method were used to fill missing values, and normal standardization, range standardization and mapstd standardization were compared as standardization methods.
2. Pattern recognition: principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), support vector machine (SVM) and artificial neural network (ANN) models were evaluated, and the usability of the experimental data was comprehensively assessed.
3. Differential variable screening: the volcano plot obtained from the p-value and the fold change (FC) was combined with the VIP value from PLS-DA. Variables selected by both approaches were taken as the differential variables, from which potential biomarkers can be identified.

Results:
1. Data preprocessing: Outliers were detected by Euclidean distance, Mahalanobis distance and the clustering analysis algorithm, and the one outlier sample common to all three was deleted before subsequent processing. Model fitting ability, prediction ability and classification effect were tested with the partial least squares method. The k-nearest neighbor method, the continuous k-nearest neighbor method and the multiple interpolation method differed little, and the multiple interpolation method was chosen by intuitive analysis. Among normal standardization, range standardization and mapstd standardization, the range-standardized model showed the best fitting ability, prediction ability and classification effect. The normality test showed that the unstandardized data did not follow a normal distribution, whereas the range-standardized data did and could therefore be analyzed further.
2. Pattern recognition: Both the non-machine-learning methods (PCA and PLS-DA) and the machine learning methods (SVM and ANN) showed good fitting ability. The machine learning methods do not require the original data to be normally distributed, while the non-machine-learning methods do. PLS-DA was superior to PCA in both classification and prediction.
3. Variable screening: The volcano plot analysis based on the p-value and fold change (FC) selected 165 differential variables, the PLS-DA VIP value selected 268, and the 96 variables common to both methods were taken as potential biomarkers.

Conclusion: A metabolomics data processing workflow of "preprocessing - pattern recognition - variable screening" was established in two forms: multiple interpolation, range standardization, PLS-DA and comprehensive variable screening; and support vector machine analysis (or artificial neural network analysis) followed by comprehensive variable screening. Non-machine-learning algorithms place high demands on data preprocessing; among them, PLS-DA is strong in fitting, classification, prediction and finding differential variables. Machine learning algorithms place low demands on data preprocessing and are strong in classification and prediction. In pattern recognition, therefore, non-machine-learning and machine learning algorithms can be applied together to verify each other.
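
The variable screening step in the abstract, which keeps only the variables selected by both the volcano plot (p-value and fold change) and the PLS-DA VIP value, can be illustrated with a minimal Python sketch. The abstract does not state the cutoffs used, so the thresholds here (p < 0.05, |log2 FC| >= 1, VIP > 1) are assumed conventional values, and the function names and toy metabolite data are hypothetical. The p-values, fold changes and VIP scores are taken as already computed.

```python
import math


def volcano_hits(p_values, fold_changes, p_cut=0.05, log2fc_cut=1.0):
    """Variables passing both volcano-plot criteria.

    The cutoffs are assumed defaults, not values taken from the thesis.
    """
    return {
        name
        for name, p in p_values.items()
        if p < p_cut and abs(math.log2(fold_changes[name])) >= log2fc_cut
    }


def vip_hits(vip_scores, vip_cut=1.0):
    """Variables whose PLS-DA VIP score exceeds the (assumed) cutoff."""
    return {name for name, vip in vip_scores.items() if vip > vip_cut}


def common_variables(p_values, fold_changes, vip_scores):
    """Intersection of the two screens: the candidate differential variables."""
    return volcano_hits(p_values, fold_changes) & vip_hits(vip_scores)


# Hypothetical toy data for four metabolites (not from the thesis).
p_values = {"m1": 0.001, "m2": 0.20, "m3": 0.01, "m4": 0.03}
fold_changes = {"m1": 4.0, "m2": 3.0, "m3": 1.2, "m4": 0.25}
vip_scores = {"m1": 1.8, "m2": 1.5, "m3": 1.1, "m4": 0.6}

selected = common_variables(p_values, fold_changes, vip_scores)
print(sorted(selected))  # prints ['m1']
```

In this toy example m4 passes the volcano screen but not the VIP screen, and m2 and m3 pass the VIP screen but not the volcano screen, so only m1 survives the intersection. This mirrors how the 165 and 268 variables reported above reduce to 96 common ones.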

Keywords/Search Tags:

Metabolomics, Data Preprocessing, Pattern Recognition, Machine Learning, Variable Screening

Related items

1	Application Research Of Pattern Recognition Technology With Big Data In Biological And Chemical Industry Typical Cases
2	Study On Some Problems Of NC Machining In Machine Testing Data Preprocessing
3	Wildfire Tactic Pattern Recognition And Solution Generation Method Research
4	Research On Flame Recognition Algorithm Based On Machine Learning
5	Water Quality TOC Indicator Chemical Chip Construction And Pattern Recognition Based On Machine Learning
6	Application Research Of Broad Learning System In Dock Security
7	Aquatic Producs Preservation And Transportation Investigation Based On Metabolomics
8	Research On Virtual Screening Method Of Drug Proteins Based On Imbalanced Data Mining
9	A Kind Of Data Process Preprocessing And Fault Diagnosis Method In Complex Industry
10	Lens Defect Detection And Recognition Based On Machine Learning