In the 21st century, the rapid growth of chemical and biological data, together with fast-developing instruments and analytical techniques, has given us far more information about structures and functions. Extracting valuable knowledge from these data is a major challenge for life science research, and meeting it requires improving existing algorithms or proposing new ones. The curse of dimensionality is one of the most difficult problems in large-scale data analysis, and variable selection and variable transformation are the two main approaches used to address it. This dissertation focuses on new variable selection and variable transformation methods.

First, the research background, basic concepts and previous achievements are briefly introduced. The principle of QSAR, its implementation process and its current research status are outlined. Dimension reduction methods for large data sets, including variable selection and variable transformation, are introduced, and the kernel method is described in detail.

Then, methods of variable selection and variable transformation are proposed, including the kernel method, the statistical moment transformation method and the pattern variable method. Kernel functions have been used successfully in machine learning and related fields. In previous studies, different variable selection methods gave different results. To avoid this inconsistency, kernel partial least squares (KPLS) is used in this study: the relationships among the original variables are replaced by the relationships among the samples, and satisfactory results are obtained. Statistical moments are also used to transform variables. The data are divided into several intervals, and the statistical moments of each interval are used as new variables. The number of variables is reduced and the classification results are improved. The two methods above use the full and the local information in the data, but they do not consider the contributions of individual variables. The pattern variable method is therefore proposed, in which continuous variables are transformed into pattern variables.
The number of variables is further reduced, and the specific patterns of cancer samples and normal samples are extracted separately.

These methods are applied to several real cases. In the diagnosis of ovarian cancer and leukemia, good results are obtained. The retention times of peptides are predicted from three variables (the sum of the amino acid retention times, the van der Waals volume and the n-octanol-water partition coefficient), and the results of KPLS are superior to those of the linear method. KPLS is also used to predict the retention times of dioxins. Two molecular modeling methods are used to predict the behavior of dioxins, and KPLS outperforms PLS in both modeling and prediction. Finally, QSAR models are constructed from the results of molecular docking, with the distances between the inhibitor and the active sites of NA used as the variables.
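
The statistical moment transformation summarized above (divide the data into intervals, then use the moments of each interval as new variables) can be illustrated with a minimal sketch. The abstract does not say which moments are used or how the intervals are chosen, so the equal-width intervals, the four moments (mean, variance, skewness, kurtosis) and the function name `interval_moments` are assumptions made only for illustration.

```python
from typing import List, Sequence


def interval_moments(values: Sequence[float], n_intervals: int) -> List[float]:
    """Replace a long variable vector by per-interval statistical moments.

    Assumptions (not stated in the source): equal-width intervals and the
    first four moments (mean, population variance, skewness, kurtosis).
    """
    n = len(values)
    if n_intervals < 1 or n_intervals > n:
        raise ValueError("n_intervals must be between 1 and len(values)")

    features: List[float] = []
    for k in range(n_intervals):
        # Integer arithmetic keeps every interval non-empty.
        start = k * n // n_intervals
        stop = (k + 1) * n // n_intervals
        chunk = values[start:stop]
        m = len(chunk)

        mean = sum(chunk) / m
        var = sum((x - mean) ** 2 for x in chunk) / m
        if var > 0.0:
            std = var ** 0.5
            skew = sum(((x - mean) / std) ** 3 for x in chunk) / m
            kurt = sum(((x - mean) / std) ** 4 for x in chunk) / m
        else:
            # A constant interval has no spread; report zero shape moments.
            skew = 0.0
            kurt = 0.0
        features.extend([mean, var, skew, kurt])
    return features


# Hypothetical example: 1000 original variables become 10 * 4 = 40 new ones.
spectrum = [float(i % 7) for i in range(1000)]
new_variables = interval_moments(spectrum, n_intervals=10)
print(len(spectrum), "->", len(new_variables))  # 1000 -> 40
```

In this sketch the number of new variables is four times the number of intervals, regardless of how many original variables there are, which is the sense in which the transformation reduces dimensionality.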