A Research On Algorithms Of Mining Changes Over Data Stream

Posted on: 2007-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
GTID: 2178360182479288
Subject: Computer application technology
Abstract/Summary:
Mining changes over data streams is one of the key issues in the field of data stream mining.

Part I: Several methods for detecting changes in patterns and classification over data streams have been explored in the past several years. However, these methods operate at a relatively high conceptual level, and so do their results. This part proposes a mining method that works at a lower conceptual level. First, the Reservoir Sampling method was used to construct a current sliding window over the continuous data stream. Second, a set of interrelated attributes of the tuples in the current window was selected based on entropy. The tuple in the reference window was specified by users or experts. Third, a dissimilarity metric based on the City Block Distance was calculated between the tuples in the current window and the specified tuple in the reference window, and changes were described by the calculated results. Finally, a multi-interval method was used to track the trend of changes. The method is sensitive and works in real time, and both the description of the results and the trend of the changes are easy to understand.

Part II: An algorithm named NBCC was designed and analyzed for mining changes over data streams. First, the synopsis data structure of the data stream was constructed by the Concise Sampling method. Second, the training sample set of the data stream was divided into classes Ci, i = 1, 2, …, m, following the classical naive Bayesian classification method. Third, a threshold value α was selected for the test sample set of the data stream. If P(X|Ci) * P(Ci) < α for all i = 1, 2, …, m, i.e., if the probability that the test sample X belongs to each of the known classes Ci is less than α, it was concluded that a change had been detected in the data stream, and the change was labeled as a new class Cm+1. The procedure was then applied repeatedly.

Part III: The emphasis of this part is mining changes over data streams based on both the support of frequent itemsets and the novelty of association rules. The contributions of this part include: (1) a synopsis data structure is constructed over the data stream by the reservoir sampling technique, under the premise that the unit of the data stream is the tuple; (2) changes over the data stream are measured and mined by computing both the support values of frequent itemsets and the novelty values of association rules, and comparing them between the current window and the reference window. The analysis shows that this method of continuously mining changes over data streams is reasonable and feasible.

Part IV: DML was applied to data stream mining. Preliminary research on mining changes over data streams based on DML was completed.
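As a rough illustration of two building blocks named in Part I, the sketch below shows reservoir sampling over a stream of unknown length and a City Block (Manhattan) dissimilarity against one reference tuple. It is not the thesis's implementation: the function names, the use of the mean as the aggregate, and the assumption that the selected attributes are numeric are all hypothetical choices, and the entropy-based attribute selection and multi-interval trend tracking are omitted.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Classic reservoir sampling: the first k items fill the reservoir,
    and item number i (0-based, i >= k) replaces a random slot with
    probability k / (i + 1).
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def city_block(a, b):
    """City Block (Manhattan) distance between two numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_dissimilarity(window, reference):
    """Average City Block distance from each window tuple to the reference.

    Aggregating by the mean is an assumption made for this sketch.
    """
    return sum(city_block(t, reference) for t in window) / len(window)
```

A caller would sample the current window with reservoir_sample, then compare mean_dissimilarity against the user-specified reference tuple at successive intervals to see whether the value drifts.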
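The threshold rule in Part II can be sketched as follows, assuming categorical attributes and unsmoothed relative-frequency estimates. The abstract does not say how NBCC estimates its probabilities, so these choices, along with all function names, are assumptions for illustration only. A test sample X is flagged as a change when P(X|Ci) * P(Ci) < α for every known class Ci.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[c][j][v] = number of class-c samples whose attribute j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[c][j][v] += 1
    return class_counts, value_counts


def joint_scores(x, class_counts, value_counts):
    """Return P(X|Ci) * P(Ci) for every known class Ci (naive Bayes)."""
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / n  # prior P(Ci)
        for j, v in enumerate(x):
            # Unsmoothed estimate of P(x_j | Ci); an assumption of this sketch.
            p *= value_counts[c][j][v] / n_c
        scores[c] = p
    return scores


def is_change(x, class_counts, value_counts, alpha):
    """Flag x as a change if its score is below alpha for every known class."""
    scores = joint_scores(x, class_counts, value_counts)
    return all(s < alpha for s in scores.values())
```

When is_change returns True, the abstract's procedure would label the sample as a new class Cm+1 and repeat; adding that class to the counts is left out of this sketch.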
Keywords/Search Tags: Data stream mining, Algorithm, Synopsis data structure, Change, Entropy, Sampling, DML