Decremental data mining

Posted on: 2004-03-01
Degree: M.Sc
Type: Thesis
University: Queen's University at Kingston (Canada)
Candidate: He, YanLing
Full Text: PDF
GTID: 2468390011977237
Subject: Computer Science
Abstract/Summary:
Ensemble techniques, such as bagging, generate predictors from training datasets. By constructing a family of predictors from subsets of a training dataset and combining them, ensemble techniques can achieve higher accuracy than traditional data mining techniques.

The born-again tree algorithm is another way to improve a predictor's performance. It can significantly increase predictor accuracy by manufacturing data and combining models. However, the cost of generating born-again trees is very high.

Both bagging and born-again trees share a drawback: they have no way to remove the influence of old data on the final predictors. In many real-world applications, new data arrives continually. The effect of this new data should be included when the underlying processes are changing over time. Conversely, when the underlying processes are changing, some data becomes outdated and misleading, so we would like to remove the old data's negative influence on predictors. However, the data used in data mining is too large to keep around, so it is not possible to subtract the old data directly from the training datasets. Furthermore, constructing predictors directly from training datasets is very expensive. Decremental data mining techniques are proposed in this thesis to solve this problem.

In this work, a new decremental data mining technique, called the sliding-window algorithm, is introduced. The core of this algorithm is to construct models from the relatively new data and then combine these models to build predictors. The sliding-window algorithm can generate predictors with high accuracy and low cost. The experiments we conducted demonstrate that the sliding-window algorithm outperforms both bagging and born-again trees. A performance comparison is also presented in the thesis.
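The abstract gives only the core idea of the sliding-window algorithm (build models from relatively new data, then combine them), so the following Python sketch is an illustration under stated assumptions rather than the thesis's actual method. It assumes data arrives in batches, one model is trained per batch, only the most recent `window_size` models are kept, and predictions are combined by majority vote. The class names `ThresholdStump` and `SlidingWindowEnsemble`, the one-feature threshold learner standing in for a decision tree, and the voting rule are all hypothetical choices made for this example.

```python
from collections import Counter, deque


class ThresholdStump:
    """Hypothetical base learner: a one-feature threshold classifier
    for labels 0/1. It stands in for whatever tree learner is used."""

    def fit(self, xs, ys):
        if not xs:
            raise ValueError("a batch must contain at least one example")
        best = None
        # Try every observed value as a threshold, in both orientations,
        # and keep the split with the fewest training errors.
        for t in sorted(set(xs)):
            for lo, hi in ((0, 1), (1, 0)):
                errors = sum((lo if x < t else hi) != y
                             for x, y in zip(xs, ys))
                if best is None or errors < best[0]:
                    best = (errors, t, lo, hi)
        _, self.t, self.lo, self.hi = best
        return self

    def predict(self, x):
        return self.lo if x < self.t else self.hi


class SlidingWindowEnsemble:
    """Keeps one model per recent batch (assumed design, not from the thesis)."""

    def __init__(self, window_size):
        # deque(maxlen=...) drops the oldest model automatically, so old
        # data stops influencing predictions without being stored.
        self.models = deque(maxlen=window_size)

    def add_batch(self, xs, ys):
        self.models.append(ThresholdStump().fit(xs, ys))

    def predict(self, x):
        # Majority vote over the models currently in the window.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]


xs = [1, 2, 3, 6, 7, 8]
old_labels = [0, 0, 0, 1, 1, 1]   # old concept: large x means class 1
new_labels = [1, 1, 1, 0, 0, 0]   # drifted concept: large x means class 0

ensemble = SlidingWindowEnsemble(window_size=3)
for _ in range(3):
    ensemble.add_batch(xs, old_labels)
print(ensemble.predict(8))  # 1, all models reflect the old concept

for _ in range(3):
    ensemble.add_batch(xs, new_labels)
print(ensemble.predict(8))  # 0, the old models have left the window
```

Because only fitted models are retained, removing old data's influence amounts to discarding a model instead of retraining on a reduced dataset. An odd `window_size` avoids tied votes in this two-class sketch.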
Keywords/Search Tags:Data, Predictors, Born-again trees, Sliding-window algorithm, Bagging, Training, Techniques