Multivariate Data Analysis And Feature Variables Selection
Posted on: 2023-11-21 | Degree: Master | Type: Thesis | Country: China | Candidate: R N Dong | GTID: 2530307115488244 | Subject: Engineering

Abstract/Summary:
High-dimensional, redundant data pose great challenges to current data analysis. Eliminating irrelevant variables from diverse data while extracting as much useful information as possible is an urgent problem for analytical chemists today. The thesis first reviews how variable selection methods for spectral data such as near-infrared spectra were proposed, their characteristics, their development, and how they compare, and surveys their recent applications in different fields. The keys to variable selection are the parameters used to evaluate variable importance, together with their criteria or thresholds, and the strategies and approaches used to search for variables. The problems of over-fitting and instability that variable selection methods face in real complex systems, and the corresponding solutions, are also discussed. The research trends, development prospects, and application directions of variable selection methods are then outlined.

Variable selection was then studied on the following real systems.

Selecting significant genes that are biologically relevant to the occurrence, progression, and classification of cancer (malignant tumors) is of great significance. Using gene expression data (endometrial cancer and lung cancer data sets) together with chemometric methods, a consensus method for cancer recognition and classification based on wavelet transform (WT) and uncorrelated linear discriminant analysis (ULDA) is proposed. The results showed that the consensus WTULDA method achieved accurate identification and classification of cancer and non-cancer groups while simultaneously selecting, from a large number of genes, informative genes as potential cancer biomarkers. Principal component analysis (PCA) was used to further reveal the relationships among the retained genes. The functions of the selected informative genes, which have clinical value and application prospects, were analyzed to provide theoretical support for the identification and diagnosis of tumors.

Variable selection methods were combined with infrared spectroscopy, mass spectrometry, and chromatographic measurement data of red wine to retain characteristic variables and reduce the interference of irrelevant ones. The source regions of the red wine raw materials were then classified and identified. On this basis, red wine quality parameters were predicted from the infrared spectral data. The results showed that least squares support vector regression (LSSVR) built on fewer retained variables achieved prediction results equal to or better than those of the full-spectrum model, enabling rapid, simultaneous analysis of multiple wine quality parameters. Variable selection combined with an LSSVR quantitative regression model can therefore be applied to practical wine quality analysis.

Keywords/Search Tags: chemometrics, variable selection, consensus strategy, regression model, quantitative analysis
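The abstract reports that LSSVR built on a reduced set of retained variables can match or beat a full-spectrum model, but it gives no implementation details. The following is therefore only a minimal sketch under stated assumptions: it uses the standard LS-SVM regression formulation (a bordered linear system in the bias `b` and dual weights `alpha`) with an RBF kernel, random synthetic data in place of wine spectra, and a simple correlation filter as a hypothetical stand-in for the thesis's variable selection methods. The names `LSSVR`, `select_by_correlation`, and all parameter values (`gamma`, `sigma`, `k`) are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix with entries exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip small negative values produced by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


class LSSVR:
    """Minimal LS-SVM regressor with an RBF kernel (illustrative sketch).

    gamma: regularization weight; sigma: kernel width. The defaults are
    arbitrary placeholders, not settings reported in the thesis.
    """

    def __init__(self, gamma=100.0, sigma=3.0):
        self.gamma = gamma
        self.sigma = sigma

    def fit(self, X, y):
        n = X.shape[0]
        K = rbf_kernel(X, X, self.sigma)
        # LS-SVM dual system:
        # [ 0   1^T         ] [ b     ]   [ 0 ]
        # [ 1   K + I/gamma ] [ alpha ] = [ y ]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / self.gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b_, self.alpha_ = sol[0], sol[1:]
        self.X_train_ = X
        return self

    def predict(self, X):
        # f(x) = sum_i alpha_i * k(x, x_i) + b
        return rbf_kernel(X, self.X_train_, self.sigma) @ self.alpha_ + self.b_


def select_by_correlation(X, y, k):
    """Hypothetical stand-in filter: indices of the k variables with largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic "spectra": 200 variables, only columns 5 and 42 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))
y = 2.0 * X[:, 5] - 1.5 * X[:, 42] + 0.1 * rng.normal(size=80)
X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]

selected = select_by_correlation(X_train, y_train, k=10)
full_model = LSSVR(sigma=10.0).fit(X_train, y_train)
sel_model = LSSVR(sigma=3.0).fit(X_train[:, selected], y_train)

print("selected variables:", sorted(selected.tolist()))
print("RMSE, all 200 variables:", rmse(y_test, full_model.predict(X_test)))
print("RMSE, 10 selected variables:", rmse(y_test, sel_model.predict(X_test[:, selected])))
```

In a real study `gamma` and `sigma` would be tuned, for example by cross-validation, and the correlation filter would be replaced by the variable selection method under investigation. The printed RMSE values depend on those choices and on the synthetic data, so they do not reproduce the thesis results.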