Application Research On Sentiment Classification Model Of Text Based On Spark

Posted on: 2016-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: P W Chen
Full Text: PDF
GTID: 2308330461956031
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the cloud era, the term "big data" is mentioned and recognized more and more often, and a growing number of people are aware of the importance of data and are trying to tap its hidden value. Big data generally refers to the huge volumes of data generated in the age of information explosion, and social networks are one representative source. Twitter, Facebook, Sina Weibo, WeChat and other social media networks store vast numbers of user nodes, and each node in turn stores a large amount of personal social, interactive and published information. With the growing popularity of mobile Internet applications, these data change and update constantly and are growing explosively, which are characteristic features of big data. Because user involvement in social networks is high, users can quickly and easily share personal information, acquire information and communicate with others. Given the influence, breadth and depth of this communication, it is increasingly common for people to express their feelings and opinions on social networks. The underlying flow of emotional information appears fragmented and disorderly, but great value lies behind it waiting to be discovered, which makes sentiment analysis of massive data highly significant.
However, traditional text sentiment classification studies were performed on a single machine, and traditional analysis algorithms struggle to complete sentiment classification quickly on the large volumes of data in social networks. Their time efficiency and scalability have become a bottleneck, so a computing model suited to sentiment classification of massive data is needed. The emergence and development of cloud computing provides a new way to solve sentiment classification tasks on massive data.
By building distributed sentiment classification algorithms on a distributed architecture, cloud computing makes up for the deficiencies of traditional stand-alone computing and makes sentiment classification of massive data more feasible.
This thesis analyses existing text sentiment classification techniques in combination with cloud computing technology and discusses whether it is feasible to build a Spark-based text sentiment classification model suitable for massive data. Based on the characteristics of massive text data, we design a word-level sentiment polarity identification method and use it to build a sentiment dictionary with broader coverage. Using sentiment feature extraction and weighting together with Spark's parallel computing model, we build a distributed Naive Bayes classification model to handle sentiment data at scale. For text whose sentiment features are not obvious, we analyze the grammar of the text and the relations between sentences to derive additional features and construct an SVM text sentiment classification model. Because sentiment analysis based on cloud computing needs large amounts of data for verification, we obtain experimental data through network packet analysis, simulated login, crawling and data parsing, and use it to validate the sentiment classification model. The experimental results show that the model constructed in this thesis is better suited to text sentiment classification of massive data: it achieves more satisfactory classification results and time efficiency, making it more feasible for processing vast amounts of text.
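To make the idea of a distributed Naive Bayes classifier more concrete, the following is a minimal single-machine sketch in plain Python. It is not the thesis's implementation and does not use Spark itself. It only mirrors the map and reduce-by-key pattern that such a model could rely on: each partition counts (label, word) pairs independently, and the partial counts are then merged. The toy corpus, the function names (`map_doc`, `reduce_by_key`, `train`, `classify`), the `__doc__` marker key and the Laplace smoothing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce

# Toy labelled corpus (hypothetical data, for illustration only).
DOCS = [
    ("pos", "great phone love it"),
    ("pos", "love the great screen"),
    ("neg", "terrible battery hate it"),
    ("neg", "hate the terrible screen"),
]

DOC_MARKER = "__doc__"  # hypothetical key used to count documents per label


def map_doc(labelled_doc):
    """Map step: emit ((label, word), 1) pairs plus one document marker."""
    label, text = labelled_doc
    pairs = [((label, word), 1) for word in text.split()]
    pairs.append(((label, DOC_MARKER), 1))
    return pairs


def reduce_by_key(pairs):
    """Reduce step: sum the counts for each key within one partition."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def train(docs, partitions=2):
    """Count per partition, then merge the partial counts."""
    chunks = [docs[i::partitions] for i in range(partitions)]
    partials = [
        reduce_by_key(pair for doc in chunk for pair in map_doc(doc))
        for chunk in chunks
    ]
    return reduce(lambda a, b: a + b, partials, Counter())


def classify(counts, text, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over the merged counts."""
    labels = sorted({label for (label, _) in counts})
    vocab = {word for (_, word) in counts if word != DOC_MARKER}
    n_docs = sum(counts[(label, DOC_MARKER)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        n_words = sum(
            c for (l, w), c in counts.items() if l == label and w != DOC_MARKER
        )
        # Log prior of the class.
        score = math.log(counts[(label, DOC_MARKER)] / n_docs)
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Smoothed log likelihood of the word given the class.
            score += math.log(
                (counts[(label, word)] + alpha) / (n_words + alpha * len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(DOCS)
print(classify(model, "love great screen"))
print(classify(model, "terrible battery"))
```

Because the counts are simple sums, the merged model does not depend on how the documents are split across partitions. This is the property that lets the counting step be spread over many workers.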
Keywords/Search Tags: sentiment classification, Spark, sentiment character, RDD