Decremental data mining

Posted on:2004-03-01

Degree:M.Sc

Type:Thesis

University:Queen's University at Kingston (Canada)

Candidate:He, YanLing

Subject:Computer Science

Abstract/Summary:

Ensemble techniques, such as bagging, generate predictors from training datasets. By constructing a family of predictors from subsets of a training dataset and combining them, ensemble techniques can achieve higher accuracy than traditional data mining techniques.

The born-again tree algorithm is another way to improve a predictor's performance. It can significantly increase predictor accuracy by manufacturing data and combining models. However, generating born-again trees is very expensive.

Both bagging and born-again trees share a drawback: they have no way to remove the influence of old data on the final predictors. In many real-world applications, new data keeps arriving, and its effect should be included when the underlying processes change over time. Conversely, when the underlying processes change, some data becomes outdated and misleading, so we would like to remove its negative influence on the predictors. However, the data used in data mining is too large to retain, so the old data cannot simply be subtracted from the training datasets. Furthermore, constructing predictors directly from the training datasets is very expensive. This thesis proposes decremental data mining techniques to solve this problem.

In this work, a new decremental data mining technique, called the sliding-window algorithm, is introduced. The core of the algorithm is to construct models from relatively recent data and then combine these models to build predictors. The sliding-window algorithm can generate predictors with high accuracy at low cost. Our experiments demonstrate that the sliding-window algorithm performs better than both bagging and born-again trees, and the thesis presents the performance comparison.
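The following is a minimal sketch of the sliding-window idea, not the thesis's implementation. The abstract does not specify the base learner, the window size, how data is split into chunks, or how models are combined, so the decision-stump learner (`Stump`), the `SlidingWindowEnsemble` class, the one-model-per-chunk scheme, and majority voting are all assumptions made for illustration. It shows how discarding the oldest model removes the influence of old data without the old data itself having to be stored.

```python
from collections import Counter, deque
from typing import Deque, Sequence, Tuple

# Hypothetical data format: (single numeric feature, binary class label 0/1).
Example = Tuple[float, int]


class Stump:
    """Hypothetical base learner: a one-feature threshold classifier."""

    def __init__(self, data: Sequence[Example]) -> None:
        if not data:
            raise ValueError("cannot train on an empty chunk")
        # Try every observed value as a threshold, in both orientations,
        # and keep the split with the fewest training errors.
        best = None
        for t in sorted({x for x, _ in data}):
            for below, above in ((0, 1), (1, 0)):
                errors = sum((below if x <= t else above) != y for x, y in data)
                if best is None or errors < best[0]:
                    best = (errors, t, below, above)
        _, self.threshold, self.below, self.above = best

    def predict(self, x: float) -> int:
        return self.below if x <= self.threshold else self.above


class SlidingWindowEnsemble:
    """Hypothetical sliding-window ensemble: one model per recent data chunk."""

    def __init__(self, window_size: int) -> None:
        # A deque with maxlen evicts the oldest model automatically on append;
        # this is the "decremental" step: the oldest data stops voting.
        self.models: Deque[Stump] = deque(maxlen=window_size)

    def add_chunk(self, chunk: Sequence[Example]) -> None:
        # Only the trained model is kept; the raw chunk can be discarded.
        self.models.append(Stump(chunk))

    def predict(self, x: float) -> int:
        # Combine the models currently in the window by majority vote.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]


# Usage: the concept drifts, and the window gradually forgets the old one.
old_chunk = [(1, 0), (3, 0), (4, 0), (6, 1), (8, 1)]  # class 1 iff x > 4
new_chunk = [(1, 0), (3, 1), (4, 1), (6, 1), (8, 1)]  # class 1 iff x > 1
ens = SlidingWindowEnsemble(window_size=3)
for chunk in (old_chunk, old_chunk, new_chunk):
    ens.add_chunk(chunk)
print(ens.predict(3))  # 0: models trained on old data still hold the majority
ens.add_chunk(new_chunk)  # window is full, so the oldest model is evicted
print(ens.predict(3))  # 1: the drifted concept now dominates
```

Using `collections.deque(maxlen=...)` bounds memory by the window size rather than by the total volume of data seen, which fits the abstract's point that old data is too large to retain. With an even window, `Counter.most_common` breaks ties in favor of the label encountered first, here the oldest model's vote; an odd window size avoids ties for binary labels.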

Keywords/Search Tags:

Data, Predictors, Born-again trees, Sliding-window algorithm, Bagging, Training, Techniques

Related items

1	New directions in education research: Using data mining techniques to explore predictors of grade retention
2	Optimal Data Streams Clustering Algorithm Based On N-δ Sliding Window Model
3	Research Of Image Classification Based On Bagging And Tri-Training Algorithm
4	Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams
5	The Processing Strategy For Data Streams Based On Sliding Window In Simulation Platform
6	Research Of Fast Algorithm On Sliding Window
7	Estimating Sliding Window-Based Aggregation Queries Over Probabilistic Data Streams
8	Research On Data Stream Clustering Algorithm Based On Density Grid Over Sliding Window
9	Research On Density Data Stream Clustering Algorithm Based On Sliding Window
10	Research On Uncertain Data Stream Clustering Method Based On Variable Sliding Window