
A Study Of Data Mining Based On Data Stream Management System

Posted on: 2015-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2298330434464991
Subject: Computer application technology
Abstract/Summary:
In recent years, some applications have produced data that arrives continuously and in massive volumes, giving rise to the data stream: high-speed, time-varying, unbounded, real-time data. A data stream management system (DSMS) can process data streams in real time and produce summary results. Data stream mining algorithms also handle data streams well: they tolerate noise, respond to users' real-time demands, and maintain generic summary data structures. However, at present neither approach is well developed functionally when used alone, because processing data streams with only a DSMS or only data stream mining technology does not combine the advantages of both. Implementing data mining algorithms inside a DSMS, so that the DSMS gains data mining functions and becomes easier for users to apply, is therefore a new challenge in this research field. This paper introduces the basic approach and experimental results of embedding data stream mining algorithms in a DSMS from the following aspects:

(1) Implementing a stream clustering algorithm in Esper. The research modifies the online part of the CluStream algorithm, the Micro-Cluster algorithm, and implements it with EPL, so that Esper gains the ability to perform cluster analysis. Tests on data sets of 1, 5, 10, and 100 dimensions show that the algorithm in Esper is inefficient when the data volume is small or the dimensionality is low, whereas with larger data streams and higher dimensionality the algorithm in Esper tends to outperform direct application of the algorithm. For example, when the data volume is 10^5 and the dimensionality is 100, the efficiency of the implementation in Esper is 6.7% higher than that of direct application.

(2) Implementing a data stream association rule algorithm in Esper. The estDec method is adopted to mine frequent itemsets, and association rules are then generated over a sliding window, yielding the association rules of the data stream. EPL is used to describe and invoke the algorithm, so that Esper can process data streams to obtain association rules. The experiments tested one-dimensional numeric data from 10^3 to 10^5, one-dimensional text data, and one-dimensional frequently used data. The results show that, for data of the same dimensionality, the association rules produced by the algorithm integrated in Esper are consistent with those produced by applying the algorithm directly, which demonstrates the feasibility of implementing the algorithm in Esper.
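
To make the online micro-cluster step in item (1) more concrete, the following is a minimal Python sketch of a textbook-style micro-cluster update. It is not the modified algorithm from the thesis and not its EPL implementation. The names `MicroCluster`, `update`, `max_clusters`, `boundary_factor`, and `min_radius` are hypothetical, and two simplifications are assumed: a fixed `min_radius` stands in for the boundary of a single-point cluster, and the least recently updated cluster is evicted when the cluster budget is full.

```python
import math


class MicroCluster:
    """Additive summary of absorbed points: count, per-dimension sums, last update time."""

    def __init__(self, point, timestamp):
        self.n = 1
        self.linear_sum = list(point)
        self.square_sum = [x * x for x in point]
        self.last_update = timestamp

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def rms_radius(self):
        # Square root of the summed per-dimension variance of absorbed points.
        var = sum(sq / self.n - (ls / self.n) ** 2
                  for ls, sq in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point, timestamp):
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x
        self.last_update = timestamp


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update(clusters, point, timestamp,
           max_clusters=3, boundary_factor=2.0, min_radius=1.0):
    """Absorb the point into the nearest micro-cluster, or start a new one."""
    if clusters:
        nearest = min(clusters, key=lambda c: distance(c.centroid(), point))
        # Assumption: min_radius replaces the usual single-point boundary heuristic.
        boundary = max(boundary_factor * nearest.rms_radius(), min_radius)
        if distance(nearest.centroid(), point) <= boundary:
            nearest.absorb(point, timestamp)
            return
    if len(clusters) >= max_clusters:
        # Assumption: evict the least recently updated cluster to make room.
        clusters.remove(min(clusters, key=lambda c: c.last_update))
    clusters.append(MicroCluster(point, timestamp))


# Usage: feed a small two-dimensional stream, using the index as the timestamp.
clusters = []
stream = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9), (0.1, 0.3)]
for t, p in enumerate(stream):
    update(clusters, p, t)
print([(c.n, c.centroid()) for c in clusters])
```

Because each summary is additive, absorbing a point costs time proportional to the dimensionality and no raw points need to be stored, which is what makes this kind of structure suitable for unbounded streams.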
Keywords/Search Tags: Data Stream, DSMS, Esper, Data Stream Mining Algorithm