| Recent developments in analytical chemistry, instrument science, and information science have spurred the accumulation of chemical data. Such data contain a wealth of chemical knowledge and information. Data mining in chemical data sets aims to discover the hidden relationships between the data and the chemical knowledge they encode. Extracting this chemical information efficiently is both a major challenge and an opportunity for analytical chemists. The aim of this thesis is therefore to develop new methods for mass spectral data mining and for the evaluation of chromatographic fingerprints of herbal medicines. The thesis has two main parts: data mining in a mass spectral database and evaluation of chromatographic fingerprints of herbal medicines. 1. Data mining in a mass spectral database (Chapters 2 and 3): The purpose of this study is to classify and predict the structures of unknown compounds by machine learning based on a mass spectral database. In Chapter 2, the influence of different data modes on the data mining of mass spectra was studied. The original mass spectral data mode, the [0,1] single code data mode, the prior logarithm normal distribution data mode, the spectral features data mode, and the peak combination data mode were each classified with K-nearest neighbor (KNN), support vector machine (SVM), and Boosting classifiers. In general, the original mass spectral data mode and the spectral features data mode gave the best classification performance. For the classification of mass spectra, the more complex the basic structure of a class of compounds, the better the classification performance. In Chapter 3, the classification of multi-class mass spectral data was studied for the first time. A novel sequential procedure combining K-nearest neighbor (KNN), support vector machine (SVM), and Boosting techniques was developed for the classification of mass spectral data.
The combined outputs of two sequential classifiers yield a satisfactory classification of mass spectral data, better than that of any single classifier. In particular, the average correct classification rate of the SVM-KNN sequential procedure reaches 80.1%. The proposed procedure for the classification of mass spectral data is very encouraging. 2. Evaluation of chromatographic fingerprints of herbal medicines (Chapter 4): The modernization of traditional Chinese medicine relies largely on quality control, and fingerprint technology is a powerful tool for the quality control of natural products. Correct evaluation of the chromatographic fingerprints of traditional Chinese medicines is a problem that analysts currently face. In this study, robust principal component regression based on principal sensitivity vectors (RPPSV), combined with Monte Carlo cross validation (MCCV), was developed to evaluate the quality of chromatographic fingerprints. Compared with the correlation coefficient, RPPSV, which has a sound statistical basis, appears better able to detect "outliers". The common patterns of different Chinese medicines can thus be established more reliably after the outliers detected by the proposed method are deleted. On the whole, however, the evaluation results obtained for the fingerprints with the correlation coefficient and with RPPSV are consistent with each other. |
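
The correlation-coefficient evaluation that serves as the point of comparison above can be illustrated with a minimal sketch. This is not the RPPSV method and not the thesis's own implementation. It assumes that the common pattern is taken as the point-wise median of the fingerprints and that a fingerprint is flagged when its Pearson correlation with that pattern falls below a cutoff. The function names, the 0.9 threshold, and the toy data are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def common_pattern(fingerprints):
    """Point-wise median over all fingerprints (an assumed reference choice)."""
    return [median(column) for column in zip(*fingerprints)]


def flag_outliers(fingerprints, threshold=0.9):
    """Return each fingerprint's similarity to the common pattern and the
    indices whose similarity falls below the (hypothetical) threshold."""
    reference = common_pattern(fingerprints)
    similarities = [pearson(fp, reference) for fp in fingerprints]
    outliers = [i for i, r in enumerate(similarities) if r < threshold]
    return similarities, outliers


# Toy fingerprints: rows are samples, columns are intensities at fixed
# retention times. The last row is deliberately made dissimilar.
fingerprints = [
    [1.0, 5.0, 2.0, 8.0, 1.0],
    [1.1, 5.2, 2.1, 7.9, 0.9],
    [0.9, 4.8, 1.9, 8.2, 1.1],
    [6.0, 1.0, 7.0, 1.0, 5.0],
]
similarities, outliers = flag_outliers(fingerprints)
print(outliers)
```

With the toy data, only the last sample is flagged. The median is used for the reference here because it is less affected by a single atypical sample than the mean, but that choice is also an assumption of this sketch.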