
The Research For Method Of Missing Data Interpolation Based On GMDH

Posted on: 2008-04-13   Degree: Master   Type: Thesis
Country: China   Candidate: Z Y Zhang   Full Text: PDF
GTID: 2178360242957851   Subject: Management Science and Engineering
Abstract/Summary:
With the development of information technology, the steady improvement in people's ability to collect data, and the wider use of databases, data warehouses and the Internet, ever larger amounts of data are being accumulated, and data mining technology has emerged and developed alongside them. However, most data mining algorithms assume an ideal, complete data set, whereas in practice the collected data are often incomplete for various reasons and contain some missing values. In this situation the usual approach is to estimate the missing values first and then carry out data mining on the completed data. The most widely used methods of missing data interpolation (imputation) are regression interpolation, neural network interpolation and K-nearest neighbour interpolation. These methods have shortcomings when the data are noisy: regression and neural network interpolation tend to overfit the noise, and K-nearest neighbour interpolation is sensitive to noise when K is very small. The GMDH (Group Method of Data Handling) method is well suited to small samples and noisy data. Building on missing data theory, this thesis introduces a GMDH method oriented to noisy data and establishes missing data interpolation methods for data containing system noise. For different missing data models, each under its own assumed missing data mechanism, it combines the GMDH algorithm with a different algorithm to estimate the missing values.
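The abstract only names GMDH, so the following is a minimal sketch of the classical multilayer GMDH, not the thesis's own program. It assumes Ivakhnenko's quadratic partial descriptions for every pair of inputs, least-squares fitting on a training split (the internal criterion), and ranking by mean squared error on a separate validation split (an external, "regularity"-style criterion). The names `gmdh_fit`, `gmdh_predict`, `keep` and `max_layers` are hypothetical.

```python
# Minimal sketch of classical multilayer GMDH (assumed variant, not the thesis code).
import itertools

import numpy as np


def _design(a, b):
    # Ivakhnenko quadratic partial description: 1, a, b, ab, a^2, b^2
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])


def _apply_layer(layer, Z):
    # Each kept neuron (i, j, coef) turns inputs i and j into one new feature
    return np.column_stack([_design(Z[:, i], Z[:, j]) @ coef for i, j, coef in layer])


def gmdh_fit(X_train, y_train, X_val, y_val, keep=4, max_layers=5):
    """Grow GMDH layers: coefficients come from the training split (internal
    criterion, least squares), candidates are ranked by validation MSE
    (external criterion), and growth stops when that error stops improving."""
    layers, best_err = [], np.inf
    Z_tr, Z_va = np.asarray(X_train, float), np.asarray(X_val, float)
    for _ in range(max_layers):
        if Z_tr.shape[1] < 2:
            break  # a pairwise neuron needs at least two inputs
        scored = []
        for i, j in itertools.combinations(range(Z_tr.shape[1]), 2):
            A = _design(Z_tr[:, i], Z_tr[:, j])
            coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
            err = np.mean((y_val - _design(Z_va[:, i], Z_va[:, j]) @ coef) ** 2)
            scored.append((err, i, j, coef))
        scored.sort(key=lambda s: s[0])
        if scored[0][0] >= best_err:
            break  # external criterion no longer improves
        best_err = scored[0][0]
        layer = [(i, j, coef) for _, i, j, coef in scored[:keep]]
        layers.append(layer)
        Z_tr, Z_va = _apply_layer(layer, Z_tr), _apply_layer(layer, Z_va)
    return layers


def gmdh_predict(layers, X):
    Z = np.asarray(X, float)
    for layer in layers:
        Z = _apply_layer(layer, Z)
    return Z[:, 0]  # the first neuron of the last layer is the best-ranked one
```

Because each layer is selected on data it was not fitted to, growth tends to stop before the model starts fitting noise, which is the property usually credited for GMDH's robustness on small, noisy samples.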
Under a single-variable missing data model with a MAR (missing at random) mechanism, the thesis combines the GMDH algorithm with the EM algorithm: GMDH models are built from the relationships between the variables and used to estimate the missing data. Under a multi-variable missing data model, where the missing data mechanism is ignored, it combines the GMDH algorithm with the K-nearest neighbour algorithm: GMDH models are built between samples from the relationships among them, and the missing data are estimated from the models of similar samples. The main tasks of this thesis are therefore:
1. First, for single-variable missing data under a MAR mechanism:
(1) It proposes a new method based on GMDH and EM, states the basic assumptions under which the method estimates missing data, designs the steps of the interpolation algorithm, and implements the corresponding programs.
(2) Through theoretical analysis, a numerical study and an empirical study of Chinese economic data, it compares GMDH-based interpolation with regression-based interpolation; the comparison shows the effectiveness and superiority of the GMDH-based algorithm for estimating missing values in noisy data.
2. Second, for multi-variable missing data, where the missing data mechanism can be ignored:
(1) It proposes a new method based on GMDH and the K-nearest neighbour algorithm, states its basic assumptions, designs the steps of the interpolation algorithm, and implements the corresponding programs.
(2) Through theoretical analysis and an empirical study of Chinese economic data, it compares GMDH-based interpolation with regression-based interpolation; the comparison again shows the effectiveness and superiority of the GMDH-based algorithm for estimating missing values in noisy data.
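The EM side of the single-variable scheme can be illustrated with classical EM for a multivariate normal model in which only one column has missing values. This is a stand-in under stated assumptions: the conditional-mean step below is a linear regression on the other columns, where the thesis would use a GMDH model, and the remaining columns are assumed fully observed. `em_impute_column` and its parameters are hypothetical.

```python
import numpy as np


def em_impute_column(X, col, n_iter=200, tol=1e-10):
    """EM under a multivariate normal model when only column `col` has NaNs.

    Sketch: the linear conditional mean stands in for the thesis's GMDH model.
    E-step: replace each missing value by its conditional mean given the
    other columns and record the conditional variance.
    M-step: re-estimate the mean vector and covariance matrix from the
    completed data, adding back the conditional variance for missing cells.
    """
    X = np.array(X, dtype=float)
    miss = np.isnan(X[:, col])
    obs_cols = [j for j in range(X.shape[1]) if j != col]
    n = X.shape[0]
    X[miss, col] = np.mean(X[~miss, col])  # start from mean imputation
    cond_var = 0.0
    for _ in range(n_iter):
        # M-step: maximum-likelihood mean and covariance of the completed data
        mu = X.mean(axis=0)
        D = X - mu
        S = D.T @ D / n
        S[col, col] += miss.sum() * cond_var / n
        # E-step: regression of column `col` on the other columns
        S_xx = S[np.ix_(obs_cols, obs_cols)]
        S_xy = S[obs_cols, col]
        beta = np.linalg.solve(S_xx, S_xy)
        cond_var = S[col, col] - S_xy @ beta
        new = mu[col] + (X[np.ix_(miss, obs_cols)] - mu[obs_cols]) @ beta
        delta = np.max(np.abs(new - X[miss, col])) if miss.any() else 0.0
        X[miss, col] = new
        if delta < tol:
            break
    return X
```

For this one-column (monotone) pattern the EM fixed point coincides with the regression fitted on the complete cases, so the sketch mainly shows the iteration structure; swapping the linear conditional mean for a GMDH model is where the thesis's method would differ.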
Within the missing data interpolation process, the main innovations of the thesis are:
1. It studies missing data interpolation for noisy data.
(1) For single-variable missing data under a MAR mechanism, the GMDH algorithm is combined with the EM algorithm to estimate the missing values. The iterative algorithm reduces the influence of noise on the estimates, and adding restrictions derived from actual conditions speeds up the iteration and overcomes the difficulty of building a model when much of the data is missing and only relatively few observations are available.
(2) For multi-variable missing data, where the missing data mechanism can be ignored, the GMDH algorithm is combined with the K-nearest neighbour algorithm to estimate the missing data. This reduces both the influence of noise on the estimates and the dependence of the interpolation on the choice of K, and the internal and external criteria of the GMDH algorithm improve the accuracy of the estimates.
2. It links the models and mechanisms of missing data to the choice of interpolation method, providing a theoretical basis for choosing different interpolation algorithms to estimate missing values under different missing data models and mechanisms.
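The role of K in the multi-variable scheme can be illustrated with a plain K-nearest neighbour sketch. Assumptions: each missing cell is filled with the unweighted mean of the K nearest complete rows (Euclidean distance on the row's observed columns, no scaling, at least some complete rows available), which stands in for the sample-to-sample GMDH models the thesis builds; K is then chosen by a hold-out error, in the spirit of a GMDH external criterion. `knn_impute` and `choose_k` are hypothetical names.

```python
import numpy as np


def knn_impute(X, k):
    """Hypothetical helper, not the thesis's GMDH-KNN method: fill NaNs in each
    incomplete row with the mean of its k nearest complete rows, using
    Euclidean distance over the columns observed in that row."""
    X = np.array(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    out = X.copy()
    for r in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[r])
        d = np.sqrt(((complete[:, obs] - X[r, obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        out[r, ~obs] = nearest[:, ~obs].mean(axis=0)
    return out


def choose_k(X, candidates, frac=0.2, seed=0):
    """Pick k by an external (hold-out) criterion: hide one cell in a sample
    of complete rows, re-impute, and keep the k with the lowest squared error."""
    X = np.array(X, dtype=float)
    rng = np.random.default_rng(seed)
    rows = np.where(~np.isnan(X).any(axis=1))[0]
    test_rows = rng.choice(rows, size=max(1, int(frac * len(rows))), replace=False)
    cols = rng.integers(0, X.shape[1], size=len(test_rows))
    masked = X.copy()
    masked[test_rows, cols] = np.nan  # hidden cells with known true values
    errors = {}
    for k in candidates:
        filled = knn_impute(masked, k)
        errors[k] = np.mean((filled[test_rows, cols] - X[test_rows, cols]) ** 2)
    return min(errors, key=errors.get), errors
```

Because the hidden cells were actually observed, `choose_k` compares candidate K values on real errors; with noisy data a very small K often scores worse, which is the sensitivity the abstract attributes to K-nearest interpolation.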
Keywords/Search Tags: GMDH algorithm, EM algorithm, K-nearest algorithm, Missing data