An artificial olfaction system, also called an electronic nose (E-nose), is a new type of bionic testing instrument that mimics the human olfactory system and is rapid, stable, inexpensive and nondestructive. With the rapid development of computing science, methods based on statistics and machine learning have been used to mine relevant information from the signals of artificial olfaction systems.

Classification is one of the major types of data-mining technique for artificial olfaction systems. Regular classification methods provide only a prediction, without a measure of its reliability, i.e. a measure of how far the prediction can be trusted together with a guarantee that the measure itself is valid. Probabilistic prediction methods, such as Naive Bayes and Logistic Regression, can provide the probability that a prediction is correct. However, these methods depend heavily on the assumed distribution of the examples: once the assumed model is incorrect, the predicted probability is incorrect and the validity of the reliability measure is not guaranteed. Classification rate (for classification) and precision (for regression) are the usual assessment criteria for the reliability of a system's predictions. Because of interference from multiple factors, for example sensor drift, the prediction performance of a model gradually degrades after it is built. Therefore the validity of a reliability measure based on these criteria is not guaranteed either.

Conformal prediction and the Venn machine are recently developed machine-learning algorithms that provide a reliability measure for prediction results. Both are flexible, in that any classification method can serve as their underlying algorithm with appropriate modification. They provide a reliability measure both for individual predictions and for the predictions overall. The only requirement is that the examples satisfy the simple independent and identically distributed (i.i.d.) assumption.
Under the i.i.d. assumption, the validity of the reliability measure is guaranteed in theory.

Ginseng has great value in traditional Chinese medicine. Increasing demand for and consumption of ginseng have led to the substitution or adulteration of some species with others, because of their different market prices. Traditional identification of ginseng relies on sensory analysis by a panel of experts, which is costly, and the validity of the identification depends solely on the experts' levels of experience. Essential oil is a volatile aromatic essence extracted from natural plants; it comes in great variety and is widely used in daily life. Measuring techniques based on analytical chemistry are sophisticated and expensive. Lung cancer is the main cause of cancer death, and the number of deaths is increasing year by year. Biopsy is the most authoritative method for lung cancer diagnosis. However, this technique is very harmful and cannot be used repeatedly within a short period. Research indicates that early screening could significantly decrease the death rate of lung cancer. The existing diagnostic techniques are sophisticated and expensive, and cannot be applied widely. Therefore, the development of rapid, stable, inexpensive and nondestructive measurement techniques is urgent for ginseng discrimination, ginseng-oil discrimination and early lung cancer diagnosis.

In this paper, ginseng discrimination, ginseng-oil discrimination and early lung cancer diagnosis were studied as instances to investigate methods for measuring the reliability of the predictions of an artificial olfaction system. A homemade artificial olfaction system was used, and conformal prediction and the Venn machine were introduced for the first time for the reliable prediction of ginseng samples, essential-oil samples and cancer-diagnosis samples in both offline and online modes. This study is meaningful for measuring the reliability of predictions for sophisticated samples using an artificial olfaction system.
In addition, to improve the prediction performance for ginseng samples, a hybrid system consisting of the artificial olfaction system and a near-infrared spectrum system was used, and a feature-level and a decision-level data-fusion method were proposed for data mining. The main content and conclusions of this paper are as follows:

A homemade artificial olfaction system, whose core element is a metal-oxide semiconductor gas-sensor array, was designed, including both software and hardware. Automatic control of the sample measurement process was accomplished. Ginseng samples, essential-oil samples and lung-cancer-diagnosis samples were prepared and measured.

To investigate the reliability measure of the predictions of the artificial olfaction system, conformal prediction was introduced for the first time. Three conformal predictors (CP1NN, CP3NN and CPSVM), based on k-nearest neighbors (KNN) and the support vector machine (SVM), were constructed and applied to the prediction of ginseng and essential-oil samples in both offline and online modes. Their performance was also compared with that of the simple classifiers 1NN, 3NN and SVM. In offline mode, when the conformal predictors were forced to output a single predicted value, the optimal classification rates for ginseng and essential-oil samples were 85.71% (CP1NN) and 96.17% (CP1NN) respectively. In addition, the conformal predictors provided a reliability measure for every prediction. In online mode, the rate of erroneous region predictions by the conformal predictors never exceeded the preset significance level (defined by the user), which indicated the validity of the reliability measure for the overall predictions. The precision of the predictions by the three conformal predictors under different preset significance levels, and the characteristics of empty, single and multiple predictions, were analyzed and discussed.

To investigate the reliability measure of the predictions of the artificial olfaction system, the Venn machine was also introduced for the first time.
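As a rough illustration of the conformal predictors described above, the following sketch builds a conformal predictor on a 1-nearest-neighbor nonconformity score (distance to the nearest example of the same class divided by distance to the nearest example of a different class). This is a common textbook construction and is only assumed to resemble CP1NN; the function names, the Euclidean distance and the toy data are illustrative choices, not taken from this work.

```python
import numpy as np

def nn_nonconformity(X, y):
    """1-NN nonconformity score of every example in the bag (X, y).

    Assumes at least two classes are present and that no two examples
    of different classes coincide (otherwise a division by zero occurs).
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                       # an example is not its own neighbor
    same = y[:, None] == y[None, :]
    d_same = np.where(same, d, np.inf).min(axis=1)    # nearest same-class example
    d_diff = np.where(~same, d, np.inf).min(axis=1)   # nearest other-class example
    return d_same / d_diff

def conformal_p_values(X_train, y_train, x_new, labels):
    """p-value of each candidate label for the new example."""
    p = {}
    for label in labels:
        # Tentatively add the new example with the candidate label.
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, label)
        alpha = nn_nonconformity(X, y)
        # Fraction of examples at least as nonconforming as the new one.
        p[label] = float(np.mean(alpha >= alpha[-1]))
    return p

def prediction_region(p, epsilon):
    """Region prediction: all labels whose p-value exceeds the significance level."""
    return {label for label, pv in p.items() if pv > epsilon}

def forced_prediction(p):
    """Single prediction with confidence (1 - second-largest p) and credibility (largest p)."""
    ranked = sorted(p, key=p.get, reverse=True)
    return ranked[0], 1.0 - p[ranked[1]], p[ranked[0]]

# Toy two-class data (illustrative only, not sensor measurements).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
p = conformal_p_values(X_train, y_train, np.array([0.2, 0.2]), labels=[0, 1])
print(prediction_region(p, epsilon=0.2), forced_prediction(p))
```

On this toy data the region prediction at significance level 0.2 contains only class 0. Lowering the significance level enlarges the region, which is the trade-off between empty, single and multiple predictions discussed above.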
Three Venn predictors (VMNB, VMSR and VMSVM), based on three regular probabilistic prediction methods (Naive Bayes, Softmax Regression and Platt's method), were constructed and applied to the multi-probabilistic prediction of ginseng and lung-cancer-diagnosis samples. Their performance was also compared with that of the three regular probabilistic prediction methods in terms of classification rate and validity of probabilistic prediction. In offline mode, the Venn predictors achieved optimal classification rates of 86.38% (VMSVM) and 97.22% (VMSR) for ginseng and lung-cancer-diagnosis samples respectively. In addition, the Venn predictors provided a narrow probability interval for every prediction. The assessment criteria for the validity of the predictions by the Venn predictors were better than those of the corresponding regular probabilistic prediction methods, and the probability intervals estimated by the Venn predictors were consistent with the probability that the predictions were right, which indicated the validity of the probabilistic predictions of the Venn predictors. The probability intervals output by the Venn predictors were very narrow and close to the single probability values given by the regular probabilistic prediction methods. In online mode, the Venn predictors still output valid probability intervals. The probability intervals moved upward gradually and narrowed gradually as the number of samples in the training set increased.

A hybrid system consisting of the artificial olfaction system and a near-infrared spectrum system was used for the discrimination of ginseng samples. A weighted feature-level data-fusion method was proposed to solve the problem caused by the imbalanced numbers of features from the two systems, and a classification rate of 99.58% was achieved. Probabilistic prediction and Dempster-Shafer evidence theory were combined to fuse the data from the two systems at the decision level, and a classification rate of 99.24% was achieved.
The classification rates of the two data-fusion methods were better than that of either single system (90.18% for the artificial olfaction system, 97.98% for the near-infrared spectrum system), and the differences were statistically significant.
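To make the decision-level fusion more concrete, here is a minimal sketch of Dempster's rule of combination applied to the class probabilities output by two classifiers. How this work converts probabilistic predictions into mass functions is not stated in the abstract, so the discounting step, the reliability values and the class names below are assumptions made purely for illustration.

```python
from itertools import product

def masses_from_probabilities(probs, reliability):
    """Turn class probabilities into a mass function (hypothetical discounting step).

    Each class keeps reliability * probability; the remaining mass goes to
    the whole frame of discernment, i.e. "unknown".
    """
    frame = frozenset(probs)
    m = {frozenset([c]): reliability * p for c, p in probs.items()}
    m[frame] = m.get(frame, 0.0) + (1.0 - reliability)
    return m

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass assigned to contradictory pairs
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Illustrative probabilities for three hypothetical ginseng classes.
enose = masses_from_probabilities({"A": 0.6, "B": 0.3, "C": 0.1}, reliability=0.8)
nir = masses_from_probabilities({"A": 0.7, "B": 0.1, "C": 0.2}, reliability=0.9)
fused = dempster_combine(enose, nir)
singletons = {next(iter(k)): v for k, v in fused.items() if len(k) == 1}
print(max(singletons, key=singletons.get))
```

The fused decision is the class with the largest combined mass. When both sources favor the same class, its combined mass exceeds the mass either source assigned alone, which is the intended benefit of fusing the two systems.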
