
Research On Deep Computation Model For Big Data Feature Learning

Posted on: 2016-07-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q C Zhang
Full Text: PDF
GTID: 1318330482467190
Subject: Software engineering
Abstract/Summary:
With the development of the Internet of Things, social networks, and electronic commerce, the era of big data has arrived. While big data offers great potential for revolutionizing many aspects of society, such as enterprise, education, and healthcare, learning the features of big data and harvesting valuable knowledge from it is a very challenging task, one that requires both the development of advanced technologies and close interdisciplinary collaboration.

Feature learning is a fundamental topic in big data analytics and mining. The characteristics of big data pose many challenges for feature learning: volume, variety, and velocity, which refer to the large scale of the data, the diversity of data types, and the speed of streaming data, respectively. To address these challenges, this dissertation studies deep computation models for big data feature learning in depth. In detail, its major content comprises the following four points (a toy code sketch of each technique follows the list):

(1) A basic deep computation model based on tensor data representation. Current deep learning models have difficulty learning features from heterogeneous data. To address this problem, this dissertation proposes a deep computation model for big data feature learning. A basic deep computation model is designed by extending the deep learning model from the vector space to the tensor space, and a high-order back-propagation algorithm is proposed to train its parameters. The tensor distance is used as the average sum-of-squares reconstruction error term to characterize the distribution of heterogeneous data in the tensor space. Theoretical analysis demonstrates that the deep computation model is an extension of the deep learning model, and experiments show that the proposed model learns features from heterogeneous data effectively.

(2) An incremental deep computation model. The basic deep computation model cannot adjust its parameters and structure in real time to learn features from newly arriving data. To address this problem, this dissertation proposes an incremental deep computation model. First, an incremental tensor auto-encoder (ITAE) is developed to update the model when new samples become available; several ITAE modules are then stacked to form the incremental deep computation model. Theoretical analysis demonstrates that the proposed model satisfies the three conditions of incremental learning, namely increment, adaptation, and preservation. Experiments show that the proposed model learns features from dynamically growing data efficiently and satisfies the real-time requirements of big data learning.

(3) A privacy-preserving deep computation model on the cloud. To protect private data when training the deep computation model with cloud computing, this dissertation proposes a privacy-preserving deep computation model on the cloud. A secure high-order back-propagation algorithm, built on fully homomorphic encryption, is devised to train the parameters of the deep computation model on the cloud. Experimental results show that the scheme can securely train the deep computation model for big data feature learning using cloud computing. More importantly, the scheme is highly scalable when more cloud servers are employed, which makes it particularly suitable for big data.

(4) A high-order possibilistic clustering algorithm based on the deep computation model. Current clustering methods have difficulty clustering incomplete big data effectively.
To address this problem, this dissertation proposes a high-order possibilistic clustering algorithm based on the deep computation model. The basic auto-encoder model is improved to learn features from each modality of the incomplete data separately. Next, the vector outer product is used to fuse the learned features and model the nonlinear correlations across different modalities, forming a joint representation of the big data. Finally, a high-order possibilistic c-means algorithm (HOPCM) is implemented to cluster the big data in the tensor space. Experimental results demonstrate that HOPCM can learn features from incomplete big data and, more importantly, that it clusters both high-quality data and incomplete big data effectively.
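For point (1), the sketch below is a toy illustration of a tensor-distance reconstruction error on unfolded (vectorized) tensors. It assumes the common construction of the metric matrix G from Gaussian similarity of element coordinates; the shape, sigma, and data are illustrative assumptions, not values from the dissertation.

import numpy as np

def metric_matrix(shape, sigma=1.0):
    # G with g_ij = exp(-||p_i - p_j||^2 / (2 sigma^2)), where p_i is the
    # coordinate of element i in the tensor (assumed Gaussian construction).
    coords = np.array(list(np.ndindex(*shape)), dtype=float)
    sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def tensor_distance_error(x, x_rec, G):
    # Squared tensor distance d^2 = (x - x_rec)^T G (x - x_rec).
    diff = (x - x_rec).ravel()
    return float(diff @ G @ diff)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                # a toy second-order tensor
x_rec = x + 0.1 * rng.normal(size=(3, 4))  # a noisy "reconstruction"
G = metric_matrix(x.shape)
print(tensor_distance_error(x, x_rec, G))

For point (2), the next sketch shows only the incremental idea: an already trained auto-encoder takes gradient steps on newly arrived batches instead of retraining from scratch. A plain vector auto-encoder with ordinary SGD stands in for the ITAE; the architecture and update rule are assumptions, not the dissertation's algorithm.

import numpy as np

class OnlineAutoEncoder:
    def __init__(self, n_in, n_hidden, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def partial_fit(self, X):
        # One SGD pass over a batch of newly arrived samples only.
        for x in X:
            h = np.tanh(self.W1 @ x + self.b1)   # encode
            r = self.W2 @ h + self.b2            # decode (linear output)
            err = r - x                          # reconstruction error
            g_h = self.W2.T @ err * (1.0 - h ** 2)
            self.W2 -= self.lr * np.outer(err, h)
            self.b2 -= self.lr * err
            self.W1 -= self.lr * np.outer(g_h, x)
            self.b1 -= self.lr * g_h
        return self

ae = OnlineAutoEncoder(n_in=8, n_hidden=4)
stream = np.random.default_rng(1).normal(size=(5, 10, 8))  # 5 arriving batches
for batch in stream:                     # fold new data in as it streams
    ae.partial_fit(batch)

For point (3), the toy below demonstrates only the homomorphic property such schemes rely on, using textbook Paillier encryption (additively homomorphic) with tiny hardcoded primes. It is not the dissertation's secure high-order back-propagation protocol, and the parameters are far from secure; it shows that an untrusted server can combine ciphertexts so that the result decrypts to the sum.

import math, random

p, q = 61, 53                      # toy primes; real keys use ~2048-bit moduli
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:     # r must be coprime to n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# An untrusted server multiplies ciphertexts; only the key holder decrypts.
shares = [7, 15, 20]               # e.g., integer-encoded gradient shares
c_sum = 1
for s in shares:
    c_sum = (c_sum * encrypt(s)) % n2
assert decrypt(c_sum) == sum(shares)
print(decrypt(c_sum))              # 42

For point (4), the last sketch shows the possibilistic c-means updates at the core of HOPCM, using plain squared Euclidean distance on flat samples. The dissertation's high-order variant clusters fused joint representations in the tensor space, which this toy version does not reproduce; the bandwidth eta, the fuzzifier m, and the initialization are illustrative assumptions.

import numpy as np

def pcm(X, c, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]   # centers start at data points
    eta = np.full(c, X.var() * X.shape[1])        # crude global bandwidth
    for _ in range(iters):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # (n, c) distances
        T = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))    # typicalities
        Tm = T ** m
        V = (Tm.T @ X) / Tm.sum(axis=0)[:, None]             # center update
    return T, V

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),    # two toy Gaussian blobs
               rng.normal(3.0, 0.3, (30, 2))])
T, V = pcm(X, c=2)
print(V)                                         # approximate blob centers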
Keywords/Search Tags: big data, feature learning, tensor, deep computation model, privacy preserving, cloud computing, possibilistic clustering