Research On Virtual Sample Generation Technologies And Their Modeling Application

Posted on: 2018-10-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Zhu
Full Text: PDF
GTID: 1318330518493667
Subject: Control Science and Engineering
Abstract/Summary:
In the era of big data, massive amounts of data are available in many areas, yet little knowledge can be extracted from them directly. Knowledge has to be found through data mining, and data-driven modeling has therefore become a research hotspot. The quality of data-driven modeling is limited by insufficient sample size, unrepresentative samples, or uneven sample distribution. In the context of big data, an important issue that cannot be ignored is the small sample problem. It stems mainly from the high cost of data acquisition, or from the low probability that the same conditions recur, either of which limits the data available. Building effective models from small samples is an important research direction in computational intelligence, with both theoretical significance and application value.
At present, there are two approaches to the small sample problem. One is based on grey theory and machine learning, and the other generates virtual samples. Generating new, valid data from the small sample data is an effective remedy, and virtual sample generation is an important research direction for the small dataset problem. Based on a review of the literature, this dissertation addresses the small sample problem for supervised and unsupervised machine learning algorithms and the corresponding labeled and unlabeled data. It studies the generation, optimization, and application of virtual samples derived from small samples in order to produce sufficient valid datasets. Neural network structures and algorithms are studied, and novel approaches to data-driven intelligent modeling are put forward. Research on and application of risk analysis for engineering construction cost are also carried out.
The main contents of this dissertation are as follows:
(1) A new virtual sample generation method based on mega-trend-diffusion (MTD): The MTD technique is an effective distribution-based virtual sample generation technique. However, the current MTD uses the same data distribution in the original sample region and in the diffusion region to generate virtual samples, and it adds virtual input attributes, which enlarges the input space. To address these drawbacks, a non-uniform distribution in the known small sample region is combined with a uniform distribution in the extended region to estimate the acceptable range of each small sample attribute through multi-distribution mega-trend-diffusion. So as not to increase the number of input attributes, the probability of occurrence given by the membership value is not taken as a virtual input attribute. On this basis, a novel multi-distributed mega-trend-diffusion (MD-MTD) method is proposed. The validity of the proposed method is verified on a standard function and industrial data sets.
(2) A new virtual sample generation method based on optimization technology: To solve the optimization problem of virtual samples, this dissertation builds on MD-MTD to propose a triangular membership information diffusion (TMIE) method, and then presents a new method for determining the upper and lower boundaries of the expansion region. Virtual samples are generated with the improved MD-MTD, and Particle Swarm Optimization (PSO) is then used to optimize the input attributes of the virtual samples so that reasonable virtual samples are obtained. The result is the PSO-MD-MTD method. The validity of the proposed method is verified on a standard function and industrial data sets.
(3) A new virtual sample generation method based on interpolation: Distribution-based virtual sample generation relies on a model established from small samples. This dissertation therefore studies how to establish a reasonable and effective neural network model from small samples, and then generates virtual samples according to the linear and nonlinear structural characteristics of that model. An interpolation-based virtual sample generation (IVSG) method is proposed, based on interpolation in the hidden layer of the extreme learning machine. Median interpolation is applied to the hidden-layer outputs of the extreme learning machine to produce virtual hidden-layer samples, which are then used to calculate the virtual data in the output layer space and the input layer space. The validity of the proposed method is verified on a standard function and industrial data sets. The applicability of the different methods is analyzed by comparing the IVSG, PSO-MD-MTD, and MD-MTD methods.
(4) A new modeling method, the partial least squares based functional link neural network: Once the problem of data sample validity has been addressed, an important task is to use data-driven modeling methods to uncover the knowledge hidden in the data. To deal effectively with collinear data in the functional link neural network and to extract the knowledge contained in limited data, this dissertation proposes a partial least squares regression based functional link neural network (PLSR-FLNN). In PLSR-FLNN, the partial least squares algorithm is used instead of the error back propagation algorithm to determine the learning parameters. The validity of the proposed method is verified on two industrial data sets. A comparison of five modeling methods shows that the proposed method performs best.
(5) Risk analysis and evaluation of project construction cost based on a Monte Carlo expanded samples method: Having addressed the small data and modeling problems in supervised learning, this dissertation turns to the data problem in unsupervised learning. The Monte Carlo method is selected to handle the small sample and uncertainty problems in construction cost risk analysis. A sample replenishment method based on Monte Carlo simulation is proposed. The probability distribution and probability density function of the cost items are then estimated from the data samples. Monte Carlo simulation, market factors, and the Likert scale analysis method are combined to analyze and evaluate the impact factors comprehensively. As a result, a practical method of project cost risk analysis is put forward. The validity of the proposed method is verified through an actual project case.
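As a rough illustration of the idea in item (1), the following Python sketch estimates an acceptable range for a single attribute and then draws virtual values from it. It is not taken from the dissertation. The bound formulas follow a commonly cited formulation of basic MTD, and the mixing rule is an assumption made only for illustration: a triangular (non-uniform) draw inside the observed range and a uniform draw in the extended range. The function names `mtd_bounds` and `sample_attribute` and the parameter `p_extended` are hypothetical.

```python
import math
import random


def mtd_bounds(values):
    """Return (lower, center, upper) for one attribute.

    Assumption: a commonly cited basic MTD formulation, not the thesis text.
    """
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n_low = sum(1 for v in values if v < center)
    n_up = sum(1 for v in values if v > center)
    if n_low == 0 or n_up == 0:
        # Degenerate case (e.g. all observations identical): nothing to diffuse.
        return lo, center, hi
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    diffusion = -2.0 * math.log(1e-20)
    # Skewness weights: the side holding more observations is extended further.
    skew_low = n_low / (n_low + n_up)
    skew_up = n_up / (n_low + n_up)
    lower = center - skew_low * math.sqrt(diffusion * var / n_low)
    upper = center + skew_up * math.sqrt(diffusion * var / n_up)
    # The estimated range must never be narrower than the observed range.
    return min(lower, lo), center, max(upper, hi)


def sample_attribute(values, n, p_extended=0.2, rng=random):
    """Draw n virtual values for one attribute.

    Assumption: p_extended is a hypothetical mixing weight. With that
    probability a value is drawn uniformly from the extended regions,
    otherwise from a triangular distribution over the observed range.
    """
    lower, center, upper = mtd_bounds(values)
    lo, hi = min(values), max(values)
    low_width, up_width = lo - lower, upper - hi
    ext_width = low_width + up_width
    samples = []
    for _ in range(n):
        if ext_width > 0 and rng.random() < p_extended:
            # Uniform over the union of [lower, lo] and [hi, upper].
            u = rng.uniform(0.0, ext_width)
            if u < low_width:
                samples.append(lower + u)
            else:
                samples.append(min(upper, hi + (u - low_width)))
        elif hi > lo:
            samples.append(rng.triangular(lo, hi, center))
        else:
            samples.append(lo)
    return samples
```

For example, `sample_attribute([1.0, 2.0, 2.5, 4.0, 7.0], 200)` returns 200 values that all lie inside the range given by `mtd_bounds`. The sketch covers only the input attributes. Producing the matching outputs requires a model trained on the small sample, which is outside its scope.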
Keywords/Search Tags: Small sample, Virtual sample generation, Neural network modeling, Monte Carlo simulation, Industrial applications