Research On Classification Algorithms Of Data Stream

Posted on:2008-09-13

Degree:Doctor

Type:Dissertation

Country:China

Candidate:P Wang

Full Text:PDF

GTID:1118360215484461

Subject:Computer software and theory

Abstract/Summary:


We now live in an information era in which communication, computer, and network technologies have changed people's lives. These technologies give people access to large volumes of data. Massive data bring great benefits but also raise an important problem: how can these data be used, and how can useful information be found in them?

Data mining extracts useful information from massive volumes of data, that is, it reduces, models, understands, or analyzes the data. After more than a decade of development, data mining has become a broad discipline. The tasks it supports include classification, association rule mining, clustering, and summarization.

With the evolution of network, telecommunication, and sensor technologies, a new data processing model, known as data stream processing, has recently emerged. In this model, continuous, long-running queries are posed over transient streaming data. Example applications include financial tickers, network traffic monitoring, web and transaction log analysis, and sensor networks. A data stream is a continuous, time-varying, unbounded sequence of data items, which implies that online stream algorithms are restricted to a single pass over the data.

This dissertation studies the classification problem on data streams from two aspects. First, for data streams with rapidly arriving data and concept drift, it proposes classification algorithms based on frequent patterns and on association rules. Second, it studies the load shedding problem that arises when classifying multiple data streams and proposes a load shedding scheme to address it.

The main contributions of this dissertation are as follows:

1. A classification algorithm for data streams based on frequent patterns. It uses frequent patterns to represent the concepts underlying the stream data. It captures concept drift by dynamically inserting and deleting frequent patterns and by adjusting the support and confidence of existing ones. It also applies a decay factor to keep the frequent patterns up to date.

2. An algorithm based on association rules, proposed to improve the efficiency of the pattern-based algorithm. It first analyzes the shortcomings of existing stream classification algorithms. It uses an efficient data structure to maintain the set of records and rules, and it learns new rules heuristically from misclassified records.

3. An extension of the rule-based algorithm to stream applications with biased class distributions. Two algorithms are proposed, and both work well in this setting.

4. An analysis of the load shedding problem in classifying multiple streams, where limited bandwidth prevents the server from receiving all of the data from the streams. An efficient load shedding algorithm is proposed that combines data transformation, a multi-step data acquisition scheme, and negative knowledge. With this algorithm, the server needs to inspect only a small portion of the data while still maintaining high accuracy.
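Contribution 1 combines three mechanisms: frequent patterns as the concept representation, pattern insertion/deletion plus support and confidence updates to follow drift, and a decay factor to favor recent records. The Python sketch below shows one way these pieces could fit together; it is not the dissertation's algorithm. It assumes each record is a set of items (for example `attribute=value` strings) with a class label, that every stored count is multiplied by a decay factor on each arrival, and that a pattern whose decayed support falls below a pruning threshold is deleted. Candidate patterns are simply all itemsets of up to `max_len` items in each record, a simplification of real frequent-pattern mining. The class `DecayedPatternClassifier` and its parameters (`decay`, `min_support`, `min_confidence`, `max_len`, `prune_below`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class DecayedPatternClassifier:
    """Hypothetical sketch of decayed, pattern-based stream classification.

    Not the dissertation's algorithm. On each arrival every statistic is
    multiplied by `decay`, so a record seen t steps ago contributes decay**t.
    """

    def __init__(self, decay=0.8, min_support=0.2, min_confidence=0.6,
                 max_len=2, prune_below=0.01):
        self.decay = decay
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.max_len = max_len
        self.prune_below = prune_below
        self.total = 0.0                       # decayed number of records seen
        self.class_count = defaultdict(float)  # decayed record count per class
        self.count = {}                        # pattern -> {class: decayed count}

    def _patterns(self, items):
        # All itemsets of size 1..max_len (simplified candidate generation).
        items = sorted(set(items))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def update(self, items, label):
        # Single pass: age every statistic, then add this record with weight 1.
        self.total = self.total * self.decay + 1.0
        for c in self.class_count:
            self.class_count[c] *= self.decay
        self.class_count[label] += 1.0
        for per_class in self.count.values():
            for c in per_class:
                per_class[c] *= self.decay
        # Insert new patterns / reinforce existing ones.
        for p in self._patterns(items):
            self.count.setdefault(p, defaultdict(float))[label] += 1.0
        # Delete patterns whose decayed support has become negligible.
        stale = [p for p, pc in self.count.items()
                 if sum(pc.values()) / self.total < self.prune_below]
        for p in stale:
            del self.count[p]

    def predict(self, items):
        # Pick the most confident rule "pattern -> class" that meets both thresholds.
        best_label, best_conf = None, 0.0
        for p in self._patterns(items):
            per_class = self.count.get(p)
            if per_class is None:
                continue
            p_count = sum(per_class.values())
            if p_count / self.total < self.min_support:
                continue
            for c, n in per_class.items():
                conf = n / p_count
                if conf >= self.min_confidence and conf > best_conf:
                    best_label, best_conf = c, conf
        if best_label is None and self.class_count:
            # No rule fires: fall back to the decayed majority class.
            best_label = max(self.class_count, key=self.class_count.get)
        return best_label
```

For example, with `decay=0.8`, feeding twenty records `["x", "y"]` labeled `"pos"` makes `predict(["x"])` return `"pos"`; after thirty further records `["x", "z"]` labeled `"neg"`, the old evidence has decayed, `predict(["x"])` returns `"neg"`, and the stale pattern `("y",)` has been deleted. Aging every count eagerly costs time proportional to the number of stored patterns per record; a more scalable variant would store timestamps and apply the decay lazily when a pattern is read.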

Keywords/Search Tags:

data mining, data stream, classification, frequent pattern, association rule, load shedding


Related items

1	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
2	Algorithms For Data Stream Mining
3	The Research On The Related Problems Of Association Rule Mining
4	Study On Bit Stream Oriented Unknown Frame Head Identification
5	Research On The Mining Algorithms Of Association Rule
6	The Research And Implementation Of Association Rule Data Mining Algorithm
7	Research On Frequent Pattern Mining Methods For Large-scale Date Stream
8	Research Of Auto-adapted Load Shedding Algorithm On Data Stream Inquires Continuously
9	Research And Application Of Association Rule Mining Algorithm
10	Research And Implementation On Large Scale Network Data Stream Anomaly Detection System