
Research On Sentiment Classification Of Network Short Text Using Deep Learning On Spark Platform

Posted on: 2017-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Shang
Full Text: PDF
GTID: 2348330503470123
Subject: Software engineering
Abstract/Summary:
The growth of the Internet has ushered in the big data era, and the volume of information produced on social media has increased explosively. Integrating and analyzing this information makes it possible to understand the psychological needs of the public and to track public opinion. Against this background, this thesis studies sentiment classification of network short texts using text mining, deep learning, and distributed parallel computing. The main contributions are:

(1) The traditional text vector space model treats feature words as independent and produces high-dimensional, sparse vectors when representing short texts. To address this, the thesis introduces the CBOW model of Word2Vec. A set of multidimensional distributed word vectors is obtained by training on a large number of samples. The existing sentiment dictionary is expanded by computing the distance between word vectors to find synonyms, and the network short texts are represented using Word2Vec.

(2) Existing shallow learning algorithms have limited ability to represent complex functions and generalize inadequately. To address this, a deep belief network classification model is constructed. First, multiple layers of unsupervised Restricted Boltzmann Machines transform the feature vectors. Then a supervised BP neural network back-propagates the error and completes the sentiment classification of the network short texts. The experiments show that the deep belief network has better feature extraction capability and that the classification results are satisfactory.

(3) For massive text data, HDFS is used for distributed storage of the web text data in order to improve the efficiency of sentiment classification. Text preprocessing and the parallel optimization of the deep belief network are implemented on Spark. The experiments show that the distributed deep belief network greatly reduces training time and accelerates computation.

Finally, a network short text sentiment classification system is designed and implemented. It consists of a data acquisition module, a data preprocessing module, a sentiment classification module, and a classification results visualization module. The research results are applied in this system, which verifies the effectiveness of the proposed method.
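
The dictionary expansion step described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract only says that synonyms are found by "calculating the distance between word vectors", so the use of cosine similarity, the `threshold` value, the function names, and the toy three-dimensional vectors below are all assumptions made for illustration. In practice the vectors would come from a trained CBOW model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_lexicon(seed_lexicon, word_vectors, threshold=0.9):
    """Give each unlabeled word the label of its most similar seed word,
    but only if that similarity reaches the (hypothetical) threshold."""
    expanded = dict(seed_lexicon)
    for word, vec in word_vectors.items():
        if word in seed_lexicon:
            continue
        best_label, best_sim = None, threshold
        for seed, label in seed_lexicon.items():
            if seed not in word_vectors:
                continue  # seed word has no vector, so it cannot be compared
            sim = cosine_similarity(vec, word_vectors[seed])
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            expanded[word] = best_label
    return expanded


# Toy vectors standing in for trained Word2Vec embeddings (assumed values).
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.15, 0.05],
    "bad": [-0.8, 0.2, 0.1],
    "awful": [-0.75, 0.25, 0.05],
    "table": [0.0, 0.1, 0.95],
}
SEED_LEXICON = {"good": "positive", "bad": "negative"}

if __name__ == "__main__":
    print(expand_lexicon(SEED_LEXICON, TOY_VECTORS))
```

With these toy vectors, "great" and "awful" lie close to a seed word and inherit its label, while "table" stays out of the dictionary because it is not similar enough to either seed. A threshold keeps the expansion conservative; a top-k nearest-neighbor rule would be an equally plausible reading of the abstract.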
Keywords/Search Tags: Network short text, Sentiment classification, Word2Vec, Deep belief network, Spark parallel computing