Research On Key Technologies Of Stream Data Mining

Posted on: 2011-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Ni
Full Text: PDF
GTID: 1118360308462215
Subject: Computer Science and Technology
Abstract/Summary:
Stream data mining is a technology for discovering patterns in irregular sequences. It differs in essential ways from static data mining because of the special features of stream data: it arrives at high speed, continuously, and without bound. Stream data mining raises a series of problems. How can effective patterns be discerned within one mining period? How should patterns be presented so that users can understand them easily? Which data structure allows patterns to be stored, queried, and removed efficiently? How should suitable mining periods be chosen? How should noise in large volumes of data be handled, and what is the optimal mining period for an unbounded stream? Finally, we study stream indicators, that is, how to aggregate many indicators into a few so that a service is easy to control and measure.

Addressing these problems, the principal contributions of this thesis are:

(1) To discern effective patterns within one mining period, we present a novel method called EARA (Events Association Rules Analysis), which mines the correlations between events in large-scale networks to extract correlation patterns. With EARA, anomalous events can be discovered in a large sensor network whose structure is unknown. EARA lets users select the correlation confidence level and displays only the significant event correlations. In addition, an algorithm called VPC (Visual Pattern Compress) is presented to make the extracted patterns easy for observers to understand. Our experimental results show that EARA can discover significant event correlations in both continuous and discrete signals from large-scale networks.
The VPC algorithm further compresses the patterns so that valuable ones can be found among thousands of extracted patterns.

(2) To choose a data structure in which patterns can be stored, queried, and removed efficiently, we present an algorithm called IKMM (Incremental Knowledge Mining Model) for incremental mining in sensor networks, which employs a lexical tree structure. Updates to the tree are controlled by the time parameters of a sliding window. In addition, heuristic rules are developed to improve the efficiency of association rule extraction. IKMM is about 10 times more efficient than the FUP2 and AFPIM algorithms.

(3) Many researchers currently focus on determining the points in time at which association rule algorithms should run on a stream. Doing so can improve system resource utilization and reduce the cost of running data mining algorithms. In this thesis, we present an algorithm called KRPB (Key Runtime Point Boundary) that determines the running time point of association rule algorithms on stream data. KRPB needs to scan the original data set only once and relies on the incremental discrepancies in specific ranges around the support parameter to estimate the difference between two data sets. It then indicates whether the data mining algorithm should be executed in a given sample period.

(4) Visualization technologies for stream data mining are discussed, that is, how to reduce multidimensional data to a lower dimension in order to visualize the relations among different stream data. In this thesis, we present a visualization algorithm called IMDS (Incremental Multi-dimension Scaling) to discover patterns in large volumes of data. IMDS clusters data based on the shape of each data structure, unlike traditional clustering algorithms, which need global information to produce precise categories. In addition, IMDS can be presented as an animation because it maps multiple dimensions to a low dimension.
Experiments show that IMDS greatly outperforms MDS (Multi-dimension Scaling) and the simplex algorithm in both efficiency and effectiveness.

(5) To aggregate many indicators into a few so that a service is easy to control and measure, an algorithm called SLAEP (SLA Extract Patterns) is proposed for extracting SLA criteria. SLAEP uses machine learning to extract patterns from large data sets of customer experience according to predefined key stream performance indicators and key stream quality indicators. With the learned patterns, telecommunication companies can control the allocation of resources to meet customer needs from low level to high level. SLAEP maps multiple attributes to multidimensional spaces, so it need not consider the associations among multiple variables. In addition, the accuracy of the extracted patterns can be assessed through visualization technologies in order to adjust user-specified input parameters.
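
As an illustration of the confidence-filtered event correlations described in contribution (1), the following is a minimal sketch of generic pairwise association rule mining over windows of events. It is not the EARA algorithm itself, whose details the abstract does not give. The function name `mine_event_rules`, the thresholds `min_support` and `min_confidence`, and the sample event names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_event_rules(windows, min_support=0.3, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) meaning 'lhs implies rhs'.

    `windows` is a list of sets of event names, one set per mining period
    (an assumed input format, not one taken from the thesis).
    """
    n = len(windows)
    if n == 0:
        return []

    single = Counter()  # number of windows containing each event
    pair = Counter()    # number of windows containing each event pair
    for window in windows:
        events = sorted(set(window))
        single.update(events)
        pair.update(combinations(events, 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Test both directions of the pair as candidate rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))

    # Show the most confident, best supported rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical event windows from a monitored network.
windows = [
    {"link_down", "high_latency"},
    {"link_down", "high_latency", "cpu_spike"},
    {"cpu_spike"},
    {"link_down", "high_latency"},
    {"high_latency"},
]
rules = mine_event_rules(windows, min_support=0.4, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Raising `min_confidence` corresponds to the user-selected confidence level mentioned above: fewer, stronger correlations are displayed. A streaming variant would update the counters incrementally as windows arrive and expire, which this sketch does not attempt.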
Keywords/Search Tags: Closed Confidence, Sensor Network, Stream Data, Association Rule, Visualization, Dimension Reduction, Machine Learning, KPI, KQI, Operation Data Analysis, Network Management, SLA