Research On The Architecture Of Query System And Algorithms For Finding Patterns From Streaming Data

Posted on: 2006-11-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Liu
GTID: 1118360182974069
Subject: Control theory and control engineering
Abstract/Summary:
Many current and emerging applications require support for on-line analysis of rapidly changing data streams. The limitations of traditional DBMSs and data mining techniques in supporting streaming applications have been recognized, prompting research to augment existing technologies, build new systems for managing streaming data, and propose new algorithms for mining data streams.

Because service-based approaches have recently gained considerable attention for supporting distributed application development in e-business and e-science, this thesis first presents a distributed query processing system over streaming data. The system is designed as a WSRF-compliant application built on top of standard Web Services technologies. The thesis then focuses on mining patterns from data streams, specifically similarity query processing over data streams and clustering algorithms for streaming data.

In summary, the major original contributions of this dissertation are as follows.

First, the dissertation addresses the architectural aspects of distributed data stream query processing (DDSQP). A generic framework for distributed data stream query applications is introduced, and a WSRF-enabled architecture for DDSQP systems, comprising a collection of distributed Web services, is presented. The distributed service architecture increases portability by isolating platform-dependent services at appropriate sites that are accessible through a well-defined API, eases overall maintenance of the system, and enables lightweight clients that are easy to install and manage. Moreover, it decouples clients from the system, allowing users to move, share, and access the services from different locations.

Second, a similarity query algorithm for streaming time series is devised, based on incremental DFT feature extraction and clustering. This approach substantially reduces the number of distance computations and the memory overhead.
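To make the idea of incremental DFT feature extraction concrete, the following is a minimal sketch of the standard sliding DFT recurrence, in which the first few Fourier coefficients of a fixed-length window are updated in constant time per coefficient when one value leaves the window and one arrives. It is an illustration only, not the dissertation's algorithm: the class name `SlidingDFT`, the helper `dft_coefficients`, and the choice of keeping `num_coeffs` low-frequency coefficients as features are all assumptions made for this example.

```python
import cmath
from collections import deque


def dft_coefficients(window, num_coeffs):
    """Directly compute the first num_coeffs DFT coefficients of a window."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * j * k / n) for j, x in enumerate(window))
        for k in range(num_coeffs)
    ]


class SlidingDFT:
    """Hypothetical helper: maintains DFT features of a sliding window."""

    def __init__(self, initial_window, num_coeffs):
        self.window = deque(initial_window)
        self.n = len(self.window)
        # Full computation is needed only once, for the first window.
        self.coeffs = dft_coefficients(list(self.window), num_coeffs)
        # Rotation factor exp(2*pi*i*k/n) applied on every one-step slide.
        self.twiddles = [
            cmath.exp(2j * cmath.pi * k / self.n) for k in range(num_coeffs)
        ]

    def push(self, new_value):
        """Slide the window by one value and update each coefficient in O(1)."""
        old_value = self.window.popleft()
        self.window.append(new_value)
        delta = new_value - old_value
        # X'_k = (X_k - x_old + x_new) * exp(2*pi*i*k/n)
        self.coeffs = [(c + delta) * t for c, t in zip(self.coeffs, self.twiddles)]
        return self.coeffs


stream = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.5, 2.5]
sdft = SlidingDFT(stream[:4], num_coeffs=3)
for value in stream[4:]:
    features = sdft.push(value)
```

After the loop, `features` holds the coefficients of the last four stream values, and they agree with a direct recomputation up to floating-point error. In a long-running stream the rounding error of this recurrence accumulates, so a practical system would periodically recompute the coefficients from the raw window.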
The similarity query approach supports both the sliding window model and the unlimited landmark window model, and it also supports similarity under shifting and scaling transforms of streaming time series. Experiments showed an improvement of up to 10 times in computation time over the naive KNN approach.

Third, a data stream clustering approach based on a synopsis data structure and a probabilistic mixture density model is proposed, which requires only the newly arrived data, not the entire historical data, to be kept in memory. The approach incrementally updates the density estimate using only the newly arrived data and the previously estimated density. The method uses three distance criteria to decide whether a newly arrived component is merged into a component of the existing Gaussian mixture model or added to the model as a new component. The experimental results demonstrate that the algorithm is feasible and achieves high-quality clustering results.

Fourth, given the popularity of Web news services, the dissertation turns to mining hierarchical patterns from Web news stream data. The problem considered is the following: news articles are retrieved from Web news services and processed by data mining tools to produce useful higher-level knowledge, which is then stored in a content description database that is convenient for users to query and explore. To address this problem, a novel algorithm, FARTMAP (fast ARTMAP), is proposed. FARTMAP introduces new match and activation functions that are both simple to compute and easy to understand. The novelty of the proposed algorithm is its ability to identify meaningful news patterns while reducing the amount of computation by maintaining the cluster structure incrementally.
Experimental results demonstrate that the proposed clustering algorithm discovers high-quality patterns within a reasonable run time.

Finally, the results of FARTMAP depend greatly on the value of its selectivity parameter, the vigilance coefficient: too low a value tolerates poor matches and produces few classes with many members, whereas too high a value creates many classes with few members. To address this problem, a divisive-agglomerative clustering method for finding hierarchical patterns in Web news streams is presented. The novelty of the proposed algorithm is its ability to identify meaningful news topics while reducing the amount of computation by maintaining the cluster structure incrementally. The streaming news clustering algorithm also exploits the nearest neighbors of the incoming news items and is able to identify clusters of different shapes and densities. Experimental results demonstrate that the proposed clustering algorithm produces high-quality topic discovery.
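The sensitivity to the vigilance value described above can be illustrated with a small sketch. This is not FARTMAP: the abstract does not specify its match and activation functions, so the example assumes cosine similarity as the match function and a simple one-pass scheme in which an item joins the best-matching class only if the match reaches the vigilance threshold. The names `cosine` and `vigilance_clustering` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed match function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vigilance_clustering(items, vigilance):
    """One-pass clustering: join the best class if its match >= vigilance."""
    prototypes = []  # per-class sum of member vectors (same direction as the mean)
    labels = []
    for item in items:
        best, best_sim = None, -1.0
        for idx, proto in enumerate(prototypes):
            sim = cosine(item, proto)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= vigilance:
            # Good enough match: update the class prototype incrementally.
            prototypes[best] = [p + x for p, x in zip(prototypes[best], item)]
            labels.append(best)
        else:
            # No class passes the vigilance test: open a new class.
            prototypes.append(list(item))
            labels.append(len(prototypes) - 1)
    return labels


# Toy term-weight vectors standing in for news items (illustrative data only).
items = [[1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.0, 1.0], [0.1, 0.9], [0.3, 0.7]]
low = vigilance_clustering(items, vigilance=0.5)
high = vigilance_clustering(items, vigilance=0.99)
```

On this toy input the low vigilance setting yields fewer, larger classes than the high setting, which is the trade-off that motivates a hierarchical method that does not hinge on a single threshold.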
Keywords/Search Tags: WSRF, Data Stream, Nearest Neighbors, Similarity Query, Mixture Model, Clustering, Patterns