Research On Industrial Data Stream Oriented Parameter Adaptive Real-time Clustering Algorithm

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhang
GTID: 2428330605960603
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the 5G era and the widespread use of terminal applications, network transmission media are constantly flooded with massive amounts of high-speed data that contain a great deal of valuable knowledge. As an important method of data mining and knowledge discovery, dynamic clustering of data streams has become a research hotspot. How to cluster the samples in the data stream of thermal power production and monitor production conditions in real time is a research topic of practical significance. Achieving this goal requires dealing with concept drift, anomaly detection, real-time analysis, and parameter adaptation in the data stream environment. Parameter adaptation runs through the entire process of dynamic cluster analysis, and the ability of an algorithm to adjust its parameters adaptively largely determines its performance on streaming data.

At present, there are few attempts to apply cluster analysis directly to industrial production process monitoring. Most existing work uses supervised learning to build a model from collected historical data; newly arrived data are then matched against the model and monitored so that abnormal data can be captured. However, many unlabeled factors in the real environment challenge the adaptive capability of such models, such as equipment aging, heat supply adjustment, and sudden changes in the natural environment (for example, temperature). Working conditions can differ from year to year, and even between the same quarter or the same period of different years. Clustering methods for data stream environments therefore need to discover knowledge from the data themselves, adapt to data trends, and mine deep knowledge accurately. Our work focuses on how to make the algorithm adapt better to the data stream environment in real time so that parameters are adjusted promptly and accurately, on how to widen the usable range of parameters and reduce sensitivity to them, and on improving the practical applicability and availability of the algorithm on the basis of theoretical innovation. The main contributions of this thesis are as follows:

(1) To reduce the time and space complexity of cluster analysis and improve the accuracy and efficiency of the algorithm, we propose a principal component analysis method based on a dimension reduction window (DRWPCA) to select the main features of high-dimensional data, that is, to perform dimensionality reduction. We design a dimension reduction window whose width is adjusted adaptively from the dimensional information of the data, which limits the scope of feature analysis and reduces human intervention. The method also refines the dimension reduction process, analyzes the main features iteratively, and retains the direct information of the data to facilitate the analysis of subsequent processing results.

(2) Analysis of the variation and distribution characteristics of production data in the thermal power industry shows that density-based clustering algorithms adapt to its trends and distribution better than other kinds of clustering algorithms. Because static clustering is the basis of dynamic clustering for data streams, we first improve static clustering approaches for specific problems and propose several static clustering algorithms in succession, including FKDC, DC-SKCG, and KNNGPC. The basic idea of FKDC is to divide the whole cluster into several sub-clusters. The sub-clusters are then fused automatically according to the local spatial information of suspected outliers, so that a partition-based method can discover irregularly shaped clusters and works over a wide range of initialization parameters. The concept of suspected outliers also refines the role of non-core member samples, and analyzing the suspected outliers during clustering improves clustering performance. DC-SKCG assigns each sample an adaptively adjusted truncation radius based on the local density of each point, and uses more high-density points than there are real clusters as starting points for density clustering, to avoid inter-cluster conflicts as far as possible. For conflicts that still occur between clusters, we design a conflict game look-back mechanism based on shared K-nearest-neighbor (KNN) similarity, in which the high-density center serves as the core area when clusters contend for member points, improving the accuracy of the clustering algorithm. The algorithm is shown to be less sensitive to the value of K. KNNGPC combines the advantages of the preceding algorithms and adds further innovations: it proposes the concept of a KNN gravitation center to find local cluster centers and uses them as starting points for expansion clustering based on shared KNN similarity. During clustering, it fuses redundant clusters automatically without requiring the number of clusters to be set manually, and it can find clusters of arbitrary shape, which gives it good parameter adaptability and generality.

(3) On the basis of our static clustering algorithms, we propose improved, data-stream-oriented versions. First, the theory of data-stream-oriented clustering is studied and the FKDStream algorithm is proposed. A weighting method based on KNN density, which considers the relationship between time, space, and data quantity, reduces the weight of long-standing or non-core points and thereby their impact on the current clustering. A sampling window with adaptively adjusted width improves efficiency and reduces the data loss rate in the stream. Suspected outliers are introduced as a new concept, and filtering them improves the accuracy of the algorithm. In addition, on the basis of KNNGPC, the data-stream-oriented KNN-GPStream is proposed. A sliding-attenuation window model with adaptive width delimits the range of data to be analyzed and adjusts the amount of data to keep the algorithm timely. To meet practical engineering needs, a hierarchical snapshot model is designed that stores sufficient summary information without occupying too much space.

In summary, our work covers a complete research process from preprocessing to the design of data stream clustering algorithms, and proposes a new solution for the needs and problems of each sub-module. In terms of parameter adaptation, we mainly address the adaptive adjustment of the window width in dimensionality reduction, the adaptive adjustment of the truncation radius in density-based clustering, the automatic discovery of the number of clusters without human intervention, the reduction of the algorithm's sensitivity to relevant parameters, and the adaptive adjustment of parameters in the data stream environment in the face of concept drift and similar situations. The purpose of solving these problems is to improve the accuracy, timeliness, and robustness of clustering algorithms in the data stream environment. Experiments show that the proposed algorithms are effective and can solve practical problems, and comparisons with other algorithms show that they have advantages in several respects.
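The abstract relies on shared K-nearest-neighbor similarity (in DC-SKCG and KNNGPC) without defining it. The following is a minimal sketch of one common definition, assuming the similarity of two points is the fraction of their K nearest neighbors that they have in common; the thesis may define it differently. The function names `knn_indices` and `shared_knn_similarity` and the sample points are hypothetical and serve only as illustration.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    A point is never counted as its own neighbor. Brute force, O(n^2 log n).
    """
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p (ties broken by index).
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def shared_knn_similarity(neighbors, i, j, k):
    """Assumed definition: share of the k nearest neighbors common to i and j."""
    return len(neighbors[i] & neighbors[j]) / k


# Two well-separated groups of three points each (illustrative data only).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
k = 2
nbrs = knn_indices(points, k)

same_group = shared_knn_similarity(nbrs, 0, 1, k)   # points in the same group
cross_group = shared_knn_similarity(nbrs, 0, 3, k)  # points in different groups
print(same_group, cross_group)
```

Under this assumed definition, points in the same dense region share neighbors and score above zero, while points in different regions share none, which is the property an expansion step or a conflict-resolution step between clusters could exploit.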
Keywords/Search Tags: dimension reduction, industrial data stream, fuzzy clustering, density-based clustering, parameter adaptation