Lung cancer (LC) is one of the malignant tumors threatening human life. However, existing diagnostic methods can hardly meet the requirements of high reliability, ease of use, low cost, speed, and noninvasiveness. In recent years, electronic nose (E-nose) technology, which diagnoses LC by detecting volatile organic compounds (VOCs) in exhaled breath, has gradually become a promising method. To this end, our project team has carried out extensive research and developed an E-nose prototype for diagnosing LC via breath. This thesis studies three problems: effective feature extraction, the inconsistency of test results across prototypes caused by the dispersion of component parameters, and the detection deviation caused by sensor drift within a single prototype. The main work and achievements are as follows.

(1) To improve the performance of a single E-nose prototype for lung cancer diagnosis, a multi-type sensor array is constructed according to the application requirements of breath-based LC diagnosis, and methods for feature extraction and selection from the sensor signals are then studied. The results show that the E-nose system constructed in this project can effectively identify lung cancer patients from exhaled breath. Features extracted from the exponential moving average (EMA) curve of the original sensor response improve the performance of the diagnostic model. The sparse group Lasso (SGL) feature selection algorithm further improves the performance of the diagnostic models. Analysis of the SGL optimization results supports, to some extent, the rationality of the sensor array design, and the effectiveness of combining multiple sensor types is also demonstrated.

(2) To deal with the data shift caused by instrument variation, a data correction algorithm named sparse unidirectional domain adaptation (SUDA), which is assisted by a small number of transfer samples from the target domain, is presented. Unlike previous data correction methods, the SUDA
algorithm consists of two steps. The first step addresses the insufficient separability of data within a single domain, which can arise for a variety of reasons. The second step performs unidirectional domain adaptation (UDA): it improves on the existing unidirectional domain adaptation method and carries out explicit data discrepancy correction, covering both mean distribution discrepancy and conditional mean distribution discrepancy. Experiments on data from two LC diagnosis E-noses and one public E-nose dataset show the effectiveness of the proposed algorithm in correcting instrument variation.

(3) To solve the data shift caused by gas sensor drift under the assumption that no transfer samples are available in the target domain, a novel broad network model called the domain transfer broad learning system (DTBLS) is proposed. Built on the network structure of the broad learning system (BLS), it can adapt to changes in the marginal and conditional distributions simultaneously. Compared with the SUDA algorithm, DTBLS combines data correction and classifier training into one step, which simplifies implementation. By imposing corresponding constraints on the weight matrix between the BLS features of the original data and the output layer, DTBLS inherits the fast computation of the BLS network and gains the ability to discriminate across domains. Experiments on the LC diagnosis E-nose dataset and the long-term gas sensor drift dataset from the University of California at Irvine show that DTBLS has clear advantages in suppressing sensor drift.

(4) Time drift and instrument variation often exist simultaneously between different prototypes. To solve this problem of mixed data shift across multiple physical domains, a mixed data shift correction algorithm based on the serial geodesic flow kernel (SGFK), which is applicable to scenarios with multiple source domains and multiple target domains, is proposed. The SGFK algorithm is an unsupervised multi-domain adaptation method with a wide range
of application. The algorithm uses the low-dimensional structure of the data to track the path of the data distribution change piecewise, and then obtains a multi-domain invariant super-feature-vector. On this basis, a kernel function with multi-domain transfer ability is constructed to achieve multi-domain adaptation. In addition, fine-grained reference domains refine the description of the distribution change between two domains that differ greatly, which improves the multi-domain adaptation ability of SGFK. Experiments on two LC E-nose datasets show that SGFK can effectively correct mixed data shift, and that fine-grained reference domains significantly improve its correction ability.
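
As a small illustration of the EMA-based feature extraction mentioned in contribution (1), the following Python sketch smooths a single sensor response with an exponential moving average and reads a few summary features off the smoothed curve. It is only a sketch under stated assumptions: the abstract does not specify the EMA formulation, the smoothing factor, or which features are taken from the curve, so the function names `ema_curve` and `ema_features`, the default `alpha=0.1`, and the chosen features (peak value, peak position, extreme slopes) are all hypothetical.

```python
import numpy as np


def ema_curve(signal, alpha=0.1):
    """Exponential moving average of a 1-D sensor response.

    Assumed recursion (not taken from the thesis):
        y[0] = x[0]
        y[k] = alpha * x[k] + (1 - alpha) * y[k - 1]
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim != 1 or x.size < 2:
        raise ValueError("signal must be 1-D with at least two samples")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, x.size):
        y[k] = alpha * x[k] + (1.0 - alpha) * y[k - 1]
    return y


def ema_features(signal, alpha=0.1):
    """Illustrative features read from the EMA curve (hypothetical choice)."""
    y = ema_curve(signal, alpha)
    slope = np.diff(y)  # sample-to-sample change of the smoothed curve
    return {
        "peak": float(y.max()),
        "peak_index": int(y.argmax()),
        "max_slope": float(slope.max()),
        "min_slope": float(slope.min()),
    }


if __name__ == "__main__":
    # Synthetic response: baseline, step rise during exposure, then recovery.
    response = np.concatenate([np.zeros(20), np.ones(60), np.zeros(40)])
    print(ema_features(response, alpha=0.1))
```

In a multi-sensor array, one would presumably apply such a function to each sensor channel and concatenate the results into one feature vector per breath sample, possibly with several values of `alpha`, before any feature selection step such as SGL.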