With the advance of industrialization, process industries are growing in production scale and complexity. At the same time, in the early stages of many novel technology investments, limited sample size, uneven sample distribution, and poor sample representativeness make it difficult to establish accurate and efficient data-driven soft sensing models for variables that are difficult to measure in industrial engineering. Generating new samples that follow the original sample distribution to expand the sample set is an effective way to address this problem and to broaden the application of soft sensing models. This paper proposes a novel virtual sample generation technique, CTVSG, based on the idea of semi-supervised co-training. Traditional virtual sample generation methods suffer from poorly distributed virtual samples and large variations in sample quality. To address this, this paper uses co-training to generate and select high-quality samples. First, sparse regions are identified along each dimension of the feature space of the small sample set, and interpolation is performed to generate the input features of virtual samples. The outputs of the virtual samples are then predicted by a double KNN regressor trained on the original samples, while qualified virtual samples are screened and used to update the model, which improves the double KNN model's prediction accuracy for virtual sample outputs. To verify the effectiveness and superiority of the proposed method, experiments were conducted on two standard functions and one industrial dataset. The results show that the method increases the quantity and quality of samples and significantly improves the prediction accuracy of the soft sensing model. Compared with other common virtual sample generation methods, the proposed CTVSG performs better.

In addition, to address the poor sample coverage caused by the small sample size in most small-sample problems, this paper proposes MTD-CTVSG, which builds on CTVSG by combining MTD with co-training. First, MTD is used to expand the sample distribution space starting from the original sample space; then, sparse regions are selected in the expanded sample space to generate unlabeled virtual samples; finally, the co-training model is used to obtain the target virtual samples. The effectiveness of the method is verified on three-dimensional standard functions and industrial datasets. Comparison with several advanced virtual sample generation methods shows that the virtual samples generated by MTD-CTVSG fit the distribution of real samples more closely and provide more distribution information than the original samples, demonstrating significant performance advantages.
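
The generate-then-screen loop described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names, the gap-based definition of a sparse region, midpoint interpolation, the two KNN settings (k = 3 with Euclidean distance, k = 5 with Manhattan distance), and the agreement threshold `tol` used as the screening rule are all assumptions made for illustration.

```python
def knn_predict(X, y, query, k, p):
    """Mean target of the k nearest training points under the Minkowski-p distance."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p), t)
        for x, t in zip(X, y)
    )
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)


def sparse_candidates(X, dim, n_gaps):
    """Midpoints of the sample pairs that bound the n_gaps widest gaps along `dim`.

    Assumption: a "sparse region" is a wide gap between neighbouring
    samples once they are sorted along a single feature dimension.
    """
    order = sorted(X, key=lambda x: x[dim])
    pairs = sorted(
        zip(order, order[1:]),
        key=lambda ab: ab[1][dim] - ab[0][dim],
        reverse=True,
    )
    # Linear interpolation at the midpoint of the two bounding samples.
    return [[(a + b) / 2.0 for a, b in zip(lo, hi)] for lo, hi in pairs[:n_gaps]]


def generate_virtual_samples(X, y, n_gaps=3, tol=0.1):
    """Return accepted (virtual_input, virtual_output) pairs."""
    X_aug, y_aug = [list(x) for x in X], list(y)
    virtual = []
    for dim in range(len(X[0])):
        for cand in sparse_candidates(X, dim, n_gaps):
            # Two KNN regressors that differ in k and in distance metric.
            pred_a = knn_predict(X_aug, y_aug, cand, k=3, p=2)
            pred_b = knn_predict(X_aug, y_aug, cand, k=5, p=1)
            # Assumed screening rule: keep the candidate only if both
            # regressors agree to within `tol`.
            if abs(pred_a - pred_b) <= tol:
                label = (pred_a + pred_b) / 2.0
                # KNN is a lazy learner, so "updating the model" amounts
                # to appending the accepted sample to the training set.
                X_aug.append(cand)
                y_aug.append(label)
                virtual.append((cand, label))
    return virtual
```

In this sketch a smaller `tol` admits fewer but more consistent virtual samples, while a larger `tol` trades label reliability for coverage of the sparse regions.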