Research On Classification Algorithms Of Data Stream

Posted on: 2008-09-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Wang
GTID: 1118360215484461
Subject: Computer software and theory
Abstract/Summary:
We live in an information era in which communication, computer, and network technologies have changed people's lives. These technologies give people access to large volumes of data. Massive data bring great advantages, but they also raise an important problem: how can these data be used, and how can useful information be found in them?

Data mining extracts useful information from massive volumes of data, that is, it reduces, models, interprets, or analyzes the data. After more than a decade of development, data mining has become an extensive subject. Its tasks include classification, association rule mining, clustering, and summarization.

With the evolution of network, telecommunication, and sensor technologies, a new data processing model, known as data stream processing, has recently arisen. In this scenario, continuous and long-running queries are posed over transient streaming data. Example applications include financial tickers, network traffic monitoring, web and transaction log analysis, and sensor networks. A data stream is a continuous, time-varying, unbounded sequence of data items, which implies that online stream algorithms are restricted to a single pass over the data.

This dissertation investigates the classification problem on data streams from two aspects. First, for data streams with fast arrival of new data and concept drift, classification algorithms based on frequent patterns and association rules are proposed. Second, the load shedding problem that arises when classifying multiple data streams is studied, and a load shedding scheme is proposed to address it.

The main contributions of this dissertation are as follows:

1. A classification algorithm based on frequent patterns is proposed for data streams. It uses frequent patterns to represent the underlying concept in the stream data. It captures concept drift by dynamically inserting or deleting frequent patterns and by adjusting the support and confidence of existing frequent patterns. It also uses a decay factor to keep the frequent patterns up to date.

2. To improve the efficiency of the pattern-based algorithm, an algorithm based on association rules is proposed. The dissertation first analyzes the shortcomings of existing stream classification algorithms. An efficient data structure is used to maintain the set of records and rules. Moreover, the algorithm uses a heuristic to learn new rules from misclassified records.

3. The rule-based algorithm is extended to stream applications with biased class distributions. Two algorithms are proposed, and both work well in this scenario.

4. The load shedding problem is analyzed for the classification of multiple streams, where the server cannot receive all of the data from the streams because of limited bandwidth. An efficient load shedding algorithm is proposed, which uses data transformation, a multi-step data acquisition scheme, and negative knowledge. With this algorithm, the server needs to inspect only a small portion of the data while still maintaining high accuracy.
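The decay-factor idea in contribution 1 can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the class name `DecayedPatternCounter`, the parameters `decay` and `max_len`, and the exponential form of the decay are assumptions made for illustration. Each record adds weight 1 to every small itemset it contains, and all older weight shrinks by the decay factor at each time step, so patterns that stop appearing gradually lose support.

```python
from collections import defaultdict
from itertools import combinations


class DecayedPatternCounter:
    """Time-decayed support of small itemsets over a stream (illustrative only)."""

    def __init__(self, decay=0.99, max_len=2):
        self.decay = decay                # hypothetical decay factor in (0, 1]
        self.max_len = max_len            # longest itemset that is tracked
        self.weight = defaultdict(float)  # itemset -> decayed count
        self.last_seen = {}               # itemset -> time of last update
        self.total = 0.0                  # decayed number of records seen
        self.t = 0                        # current time step

    def update(self, record):
        """Process one record (an iterable of hashable, sortable items)."""
        self.t += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(record))
        for k in range(1, self.max_len + 1):
            for pattern in combinations(items, k):
                # Lazily apply the decay accumulated since the last update.
                gap = self.t - self.last_seen.get(pattern, self.t)
                self.weight[pattern] = self.weight[pattern] * self.decay ** gap + 1.0
                self.last_seen[pattern] = self.t

    def support(self, pattern):
        """Decayed support of a pattern, as a fraction of the decayed record count."""
        pattern = tuple(sorted(pattern))
        if pattern not in self.weight:
            return 0.0
        gap = self.t - self.last_seen[pattern]
        return self.weight[pattern] * self.decay ** gap / self.total
```

With `decay=1.0` the counter reduces to plain relative frequency. Smaller values make recent records dominate, which is one simple way to let a pattern set follow concept drift.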
Keywords/Search Tags: data mining, data stream, classification, frequent pattern, association rule, load shedding