A Study on Missing Value Imputation Based on Cost-Sensitive Learning

Posted on: 2011-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: K M Liu  Full Text: PDF
GTID: 2178360305977846  Subject: Computer software and theory
Abstract/Summary:
Missing data is an unavoidable and challenging problem throughout data mining, machine learning, pattern recognition, information retrieval, and data analysis, in both theory and applications. Driven by the needs of theoretical development and practical application, many researchers in China and abroad have studied in depth the causes of missing data, its types, and imputation algorithms. Various imputation algorithms have been proposed; typical examples are the EM algorithm, single imputation, multiple imputation, and the C4.5 algorithm. However, these methods are application-independent: they fill missing values without considering the specific application domain. Recent studies have concluded that such application-independent imputation algorithms are not suitable for some machine learning tasks, such as cost-sensitive learning. When a cost-sensitive decision tree is trained on a data set containing missing data, the constraint on total cost means that the missing values of some attributes do not need to be filled. We therefore need to bridge the gap between cost-sensitive learning and missing data imputation. Cost-sensitive learning has been an active research topic in data mining and machine learning; many researchers and institutions in China and abroad have studied it extensively and proposed many new theories and methods. The two types of cost that receive the most attention are test costs and misclassification costs. Abroad, research on missing data processing began in the United States, where the initial aim was to correct errors in U.S. social security data.
Since then, scholars have proposed further methods for filling missing values, such as k-nearest neighbor classification, rough set theory, Bayesian networks, and neural networks (NN). In China, research on missing value imputation is still at an early stage: some theoretical studies of missing values can be found in conference and journal literature, but research results that deal directly with missing data processing are uncommon. In addition, cost-sensitive missing value imputation is still rarely studied, both in China and abroad. This earlier work provides a solid theoretical and methodological basis for this thesis. Building on it, the thesis improves the existing cost-sensitive CII algorithm and studies the following questions in cost-sensitive missing value imputation: (1) it analyzes the advantages and disadvantages of current cost-sensitive missing value imputation algorithms, proposes improvement strategies for the problems identified, and evaluates the performance of the improved algorithm on a test platform; (2) for attribute selection in cost-sensitive missing value imputation, it proposes an effective method for discovering absent cases, thereby reducing system cost and improving imputation accuracy.
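The trade-off between test costs and misclassification costs described in the abstract can be illustrated with a minimal Python sketch. It is not the CII algorithm or the thesis's improved method, neither of which is specified here. It assumes a simple rule: fill a missing attribute only when the expected saving in misclassification cost exceeds the test cost of obtaining the value. The class `Attribute`, the functions `attributes_worth_filling` and `total_cost`, the additive error model, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    test_cost: float        # cost of obtaining (filling) the missing value
    error_reduction: float  # assumed drop in misclassification probability if filled


def attributes_worth_filling(attributes, misclassification_cost):
    """Return names of attributes whose expected saving exceeds their test cost."""
    chosen = []
    for attr in attributes:
        expected_saving = attr.error_reduction * misclassification_cost
        if expected_saving > attr.test_cost:
            chosen.append(attr.name)
    return chosen


def total_cost(attributes, chosen, base_error, misclassification_cost):
    """Total cost = test costs paid + expected misclassification cost.

    Assumption: error reductions of the filled attributes simply add up.
    """
    filled = [a for a in attributes if a.name in chosen]
    paid = sum(a.test_cost for a in filled)
    error = max(base_error - sum(a.error_reduction for a in filled), 0.0)
    return paid + error * misclassification_cost


# Hypothetical attributes and costs, for illustration only.
attrs = [
    Attribute("blood_test", test_cost=20.0, error_reduction=0.10),
    Attribute("mri", test_cost=300.0, error_reduction=0.05),
]
chosen = attributes_worth_filling(attrs, misclassification_cost=1000.0)
print(chosen)  # the cheap, informative attribute is filled; the expensive one is left absent
print(total_cost(attrs, chosen, base_error=0.30, misclassification_cost=1000.0))
```

Under these assumed numbers, filling only the cheap attribute gives a lower total cost than filling both or neither, which is the sense in which some missing values are better left unfilled.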
Keywords/Search Tags: cost-sensitive learning, missing value filling, absent cases