
Tensor-based Big Data Multiple Clusterings With Their Secure And Efficient Implementations

Posted on: 2020-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Zhao
Full Text: PDF
GTID: 1368330590458927
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of information technologies such as cloud computing, the Internet of Things, social networking and new social media, the real world now contains a large number of sensing devices, intelligent products and network communications, together with human knowledge, thinking skills, social relations and cultural elements. These produce large-scale multi-source heterogeneous data, which are characterized by mixed features, diverse modalities and complex types, and which contain different knowledge and value in different views. Multiple clusterings can generate several different clustering results from different perspectives, which helps reveal the different structures hidden in the data. It is regarded as a key technology for problems such as network public opinion analysis, major disease analysis, resource recommendation and financial risk prediction, is urgently needed in social, industrial and economic fields, and has broad application prospects.

Most existing research on multiple clusterings targets small-scale, single-domain datasets. The clustering results are difficult to interpret, and multi-modal clustering cannot be realized according to contextual changes. Most of the algorithms are tied to specific applications, are difficult to extend to other fields, and lack generality. In addition, the characteristics of big data, such as diverse types, large data size, uneven value density and fast growth rate, pose new challenges to multiple clustering research in the big data environment. This thesis takes the clustering of multi-source heterogeneous data in the big data environment as its main research object, and carries out a series of theoretical, technical and methodological studies on tensor-based big data multiple clusterings and their secure and efficient implementation. The main research contents and innovations are as follows.

Firstly, for multiple clusterings in the big data environment, a weight learning method based on multi-linear attribute ranking is proposed to measure the importance of attribute combinations in all feature spaces, and a multiple clustering method based on selective weighted tensor distance is then built on it. To further improve clustering quality, a tensor decomposition-based multiple clustering method is proposed, so that, on the one hand, the selected features are completely separated from the unselected features when calculating the distance and, on the other hand, noise and redundancy in the data are removed. To improve the performance of the tensor decomposition-based multiple clustering method, a multi-relational attribute ranking method is proposed to measure attribute importance in each feature space. Experiments show that the proposed multiple clustering methods have higher clustering accuracy and lower redundancy.

Secondly, to preserve user privacy in the cloud computing environment, a secure tensor-based multiple clustering method on the cloud is proposed. By studying the secure cloud computing mode of the multiple clustering algorithm and designing a secure multiple clustering analysis and service framework under the hybrid cloud model, a secure high-order density peak clustering method is proposed. Furthermore, a secure tensor-based multiple clustering method and the related secure sub-protocols are proposed, and proofs of security are provided. The experimental results show that these methods protect user privacy, achieve 100% clustering accuracy, and offer high scalability and data availability, while keeping the client very lightweight.

Thirdly, to address the curse of dimensionality and the need for efficient computing, a tensor-based multiple clustering method based on tensor train decomposition and its parallel computing method are proposed. Based on the calculation rules for basic tensor operations in tensor train form, a multi-linear attribute combination weight learning algorithm and a selective weighted tensor distance, both based on tensor train decomposition, are proposed, followed by the tensor-based clustering method based on tensor train decomposition. This method can carry out the complete multiple clustering process in tensor train form and can maintain or even improve the accuracy of the clustering results. Moreover, for the distributed cloud computing environment, an efficient distributed parallel computing framework is designed according to the computing power and communication ability of the nodes. By studying the tensor train core allocation mechanism, the core scheduling strategy and the parallel strategy of core operations, a distributed parallel strategy based on tensor train cores is proposed, which exploits the parallel computing advantage of tensor networks to improve the parallel efficiency of the tensor-based multiple clustering algorithm.

Finally, in view of the large number of repeated calculations caused by the dynamic growth of big data, an incremental update method for tensor-based multiple clusterings is proposed, including incremental density peak clustering and incremental tensor-based multiple clusterings. For the tensor-based multiple clustering method, an iteration-based and a differential-based incremental attribute weight learning method are proposed. The simple and fast K-medoids algorithm is used as the basis for a corresponding incremental K-medoids algorithm, so that it is not necessary to calculate all distances, which effectively improves the efficiency of the incremental update algorithm for tensor-based multiple clusterings. The experimental results show that the proposed incremental density peak clustering has higher clustering accuracy and efficiency than similar methods, and that the proposed incremental tensor-based multiple clustering method not only supports incremental updates but also greatly improves the efficiency of maintaining dynamic incremental updates of data in multiple clustering analysis.

The tensor-based big data multiple clusterings and their secure and efficient methods proposed in this thesis can provide useful new ideas for theoretical research on multiple clusterings, and can also promote the application and development of multiple clustering analysis in the era of big data.
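The abstract names density peak clustering as a building block but does not describe it. As a rough orientation only, the sketch below shows the basic, non-secure, non-incremental form of that general technique on plain vectors: each point gets a local density (the number of neighbours within a cutoff) and a distance to its nearest denser point, and points where both are large are taken as cluster centres. It is not the thesis's secure high-order or incremental variant; the function name `density_peaks` and the parameters `d_c` (cutoff distance) and `n_clusters` are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Basic density peak clustering with a cutoff kernel (illustrative sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: how many other points lie closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n            # distance to the nearest denser point
    nearest_denser = [-1] * n    # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Centres: points with the largest product of density and separation.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers


# Hypothetical toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(points, d_c=1.0, n_clusters=2)
```

On this toy input the two middle points of each group are picked as centres and the remaining points follow their nearest denser neighbour. The full pairwise distance matrix used here is the kind of repeated computation that an incremental variant would aim to avoid when new data arrive.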
Keywords/Search Tags:Big Data, Tensor, Multiple Clusterings, Secure Computing, Distributed and Parallel Computing, Incremental Updating