
Research On High-dimensional Data Stream Clustering Algorithm And Its Applications Based On Information Entropy

Posted on: 2016-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: T T Yang
GTID: 2308330461964110
Subject: Computer application technology
Abstract/Summary:
In recent years, many scholars have studied high-dimensional data stream processing, but problems such as low efficiency and large storage requirements persist in practice. Based on a comprehensive analysis of the characteristics of high-dimensional data streams, this thesis further investigates dimension reduction, clustering algorithms, and trend analysis methods for such streams. The main work and contributions are as follows:
1) The characteristics and applications of high-dimensional data streams are studied, with a focus on dimension reduction methods and classical clustering algorithms, and the advantages and disadvantages of the various algorithms are identified.
2) A feature projection dimension reduction algorithm based on information entropy, H-HpFit Stream, is proposed. To address the high dimensionality of data streams and the low efficiency of existing dimension reduction algorithms, an information entropy function is introduced, which preserves the valuable information while improving the efficiency of the reduction. More importantly, the algorithm keeps summary data of the stream and thereby reduces the storage required, which makes the data easy to extract and reuse in subsequent work.
3) An improved clustering algorithm, D-LFStream, is proposed. To address the low efficiency of the LF clustering algorithm, sliding-window processing is combined with a density algorithm to refine the movement rule of the ants, so that the ants move in a more directed way and the algorithm converges faster.
4) An improved trend analysis algorithm for data streams is proposed. Depending on the application requirements, the TLS method or an exponential regression algorithm is introduced into the trend analysis algorithm based on the intensity change of the data stream, which improves the accuracy of the analysis and brings its result closer to the real data. In addition, confidence interval theory is used to detect anomalies at change points in the data stream, providing early warning and important decision support for the monitored object.
5) Bridge health monitoring is taken as the application. The data stream, after dimension reduction and clustering, is fed to the improved trend analysis algorithm. Simulation results indicate that the improved algorithms successfully reduce the dimensionality of the bridge health monitoring data stream, cluster it, and analyze its trend accurately. The improved algorithms also increase the processing speed for high-dimensional data streams and address the storage problem of huge data streams.
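The abstract describes the entropy-based dimension reduction of item 2) only at a high level, so the following is a minimal sketch of the general idea rather than the H-HpFit Stream algorithm itself. It assumes that each feature in a window of the stream is scored by the Shannon entropy of an equal-width histogram and that the highest-scoring features are kept. The function names, the bin count, and the "keep the top k" rule are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=8):
    """Shannon entropy (in bits) of one feature, estimated from an
    equal-width histogram. The bin count is an assumed parameter."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_features(window, k, bins=8):
    """Return the (sorted) indices of the k highest-entropy features
    in a window given as a list of equal-length rows."""
    columns = list(zip(*window))
    scores = [shannon_entropy(col, bins) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(window, keep):
    """Keep only the selected feature columns of each row."""
    return [[row[j] for j in keep] for row in window]


# Hypothetical window: feature 1 is constant, so it is dropped first.
window = [[1.0, 5.0, 0.1], [2.0, 5.0, 0.1], [3.0, 5.0, 0.2], [4.0, 5.0, 0.1]]
keep = select_features(window, k=2)
reduced = project(window, keep)
```

Storing only the projected rows, or per-feature histograms, is one way a summary of the stream could take less space than the raw data. Whether the thesis does it this way is not stated in the abstract.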
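Item 4) combines exponential regression with a confidence-interval test for anomalies. The sketch below shows one way these two pieces can fit together. It makes several assumptions: the trend is fitted as y = a * exp(b * t) by ordinary least squares on log(y), and a point is flagged when its log-residual falls outside an approximate z-sigma band. The band is a simplified stand-in for a proper confidence or prediction interval, the TLS variant is not shown, and all names and the default z = 1.96 are illustrative and do not come from the thesis.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on log(y). Requires y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b


def flag_anomalies(ts, ys, z=1.96):
    """Return indices whose log-residual lies outside an approximate
    z-sigma band around the fitted exponential trend."""
    a, b = fit_exponential(ts, ys)
    residuals = [math.log(y) - (math.log(a) + b * t) for t, y in zip(ts, ys)]
    n = len(ts)
    # Residual standard error with n - 2 degrees of freedom
    # (two fitted parameters: a and b).
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if s == 0.0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > z * s]


# Hypothetical readings: a clean exponential trend with one spike at index 6.
ts = list(range(10))
ys = [2.0 * math.exp(0.3 * t) for t in ts]
ys[6] *= 3.0
anomalies = flag_anomalies(ts, ys)
```

Fitting in log space penalizes relative rather than absolute error, which suits readings that grow multiplicatively. If absolute deviations matter more, a non-linear fit on the raw values would be the alternative.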
Keywords/Search Tags:high-dimensional data stream, feature projection, movement rule, sliding-window, trend analysis algorithm