Big Data Sentiment Analysis Based on PLSA and Its Application

Posted on: 2015-12-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Liang
GTID: 2308330473953175 | Subject: Computer software and theory
Abstract/Summary:
With the development of Web 2.0, an increasing number of users post content on the Internet, including product reviews and blogs in which they express their opinions. Mining users' sentiment from these reviews and blogs has potential commercial value. On the one hand, people can learn about and compare the products or services they are interested in by reading the corresponding subjective or emotional reviews and blogs, and then make appropriate purchasing decisions. On the other hand, vendors can adjust their strategies toward target buyers and improve the quality of their products or services. All of these effects can be observed through product sales prediction.

Mining sentiment and opinions from large-scale product reviews poses two major challenges. First, it is inappropriate to simply classify a review as positive or negative in the traditional way, because people usually express their opinions or sentiment tactfully, and the sentiment is sometimes very complicated. To capture the sentiment information in reviews, Probabilistic Latent Semantic Analysis (PLSA) is used. Second, the training data is very large, and training a PLSA model on big data has very high time and space complexity. Researchers have tried to solve this problem with parallel methods, but their approaches only partially reduce the time complexity: the main memory used during computation still has to load a large amount of data. Therefore, to address this scalability problem, we modify the traditional EM algorithm for the MapReduce framework and train the model in parallel on a cluster, so that the main memory of each computer only needs to load part of the dataset. This method reduces time and space complexity simultaneously. Results show that it handles large datasets efficiently and achieves almost linear speedup.

The value of mining sentiment from reviews or blogs can be reflected in sales prediction.
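The abstract does not give the modified EM algorithm in detail. The following Python sketch is a rough illustration of the general idea only. It runs standard PLSA EM for the model $P(w \mid d) = \sum_z P(z \mid d)\,P(w \mid z)$ with the corpus split into shards. A hypothetical `map_shard` step performs the E-step on one shard and updates $P(z \mid d)$ for that shard's documents, since those rows depend on no other data. A hypothetical `reduce_counts` step sums the partial word-topic counts from all shards and renormalizes them into $P(w \mid z)$. The function names, toy data, and sharding scheme are all assumptions, not the thesis's implementation. In a real cluster, each map call would run on the node that stores its shard.

```python
import math
import random
from collections import defaultdict


def normalize(weights):
    total = sum(weights)
    return [v / total for v in weights]


def init_model(shards, vocab, n_topics, seed=0):
    """Random positive initialization of P(w|z) (global) and P(z|d) (per shard)."""
    rng = random.Random(seed)
    p_w_z = []
    for _ in range(n_topics):
        probs = normalize([rng.random() + 0.1 for _ in vocab])
        p_w_z.append(dict(zip(vocab, probs)))
    p_z_d = [[normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in shard]
             for shard in shards]
    return p_w_z, p_z_d


def map_shard(docs, shard_p_z_d, p_w_z):
    """Map step on one shard: E-step, local M-step for P(z|d), partial n(w, z) counts."""
    n_topics = len(p_w_z)
    partial = [defaultdict(float) for _ in range(n_topics)]
    for i, doc in enumerate(docs):
        theta = shard_p_z_d[i]
        doc_topic = [0.0] * n_topics
        for w, count in doc.items():
            joint = [theta[z] * p_w_z[z][w] for z in range(n_topics)]
            norm = sum(joint)
            for z in range(n_topics):
                resp = count * joint[z] / norm  # n(d, w) * P(z | d, w)
                partial[z][w] += resp
                doc_topic[z] += resp
        shard_p_z_d[i] = normalize(doc_topic)  # P(z|d) depends only on this document
    return partial


def reduce_counts(partials, vocab):
    """Reduce step: sum partial counts over shards, then M-step for P(w|z)."""
    p_w_z = []
    for z in range(len(partials[0])):
        counts = [sum(part[z].get(w, 0.0) for part in partials) for w in vocab]
        p_w_z.append(dict(zip(vocab, normalize(counts))))
    return p_w_z


def log_likelihood(shards, p_z_d, p_w_z):
    ll = 0.0
    for docs, thetas in zip(shards, p_z_d):
        for doc, theta in zip(docs, thetas):
            for w, count in doc.items():
                ll += count * math.log(sum(t * p_w_z[z][w] for z, t in enumerate(theta)))
    return ll


def train(shards, n_topics, iterations=20, seed=0):
    vocab = sorted({w for shard in shards for doc in shard for w in doc})
    p_w_z, p_z_d = init_model(shards, vocab, n_topics, seed)
    history = []
    for _ in range(iterations):
        # On a cluster, each map call would run where its shard is stored.
        partials = [map_shard(docs, p_z_d[k], p_w_z) for k, docs in enumerate(shards)]
        p_w_z = reduce_counts(partials, vocab)
        history.append(log_likelihood(shards, p_z_d, p_w_z))
    return p_w_z, p_z_d, history


# Hypothetical toy corpus: each review is a bag of words, split across two "nodes".
shards = [
    [{"great": 3, "acting": 2}, {"boring": 2, "plot": 1}],
    [{"great": 1, "plot": 2}, {"boring": 3, "acting": 1}],
]
p_w_z, p_z_d, history = train(shards, n_topics=2)
```

Each shard needs only its own documents, its own $P(z \mid d)$ rows, and the shared $P(w \mid z)$ table. The document-dependent memory per node therefore scales with the shard rather than the full corpus. With the same initialization, putting all documents in a single shard gives the same parameters up to floating-point rounding.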
In this paper, a movie sales prediction model, Auto Regression Based on Sentiment (ARBS), is built by combining the sentiment information mined from blogs by the PLSA model with an autoregressive model of the movies' previous box-office revenues. Building on ARBS, we further take into account the quality and quantity of the blogs and build an improved model, ARBS-i. A set of experiments was performed to test prediction accuracy. The results show that the Mean Absolute Percentage Errors (MAPE) of ARBS and ARBS-i are 6.7% and 8.5% lower, respectively, than that of the autoregressive model without sentiment information. This demonstrates the effectiveness and superiority of the proposed method, which provides a way to use users' sentiment information in commercial applications.
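The abstract does not state the ARBS equations. One plausible reading, assumed here purely for illustration, is a linear autoregression on past box-office values $x_{t-i}$ plus lagged sentiment features $s_{t-i,j}$. For example, $s_{t,j}$ could be the average PLSA topic weight $P(z_j \mid \text{blogs on day } t)$ for each of $K$ topics:

$$x_t = \sum_{i=1}^{p} \phi_i\, x_{t-i} + \sum_{i=1}^{q} \sum_{j=1}^{K} \rho_{i,j}\, s_{t-i,j} + \varepsilon_t$$

The Python sketch below fits such a model by ordinary least squares on synthetic data. It then compares the model's MAPE with that of a sentiment-free autoregression. The function names, lag orders, and data are hypothetical. The numbers it produces are in-sample toy values and are unrelated to the 6.7% and 8.5% reported above.

```python
import random


def build_design(sales, sentiment, p, q):
    """Regression rows [x_{t-1}, ..., x_{t-p}, s_{t-1}, ..., s_{t-q}] with target x_t."""
    rows, targets = [], []
    for t in range(max(p, q), len(sales)):
        row = [sales[t - i] for i in range(1, p + 1)]
        for i in range(1, q + 1):
            row.extend(sentiment[t - i])  # all K topic weights for day t - i
        rows.append(row)
        targets.append(sales[t])
    return rows, targets


def least_squares(rows, targets):
    """Ordinary least squares: solve (X^T X) beta = X^T y by Gaussian elimination."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n + 1):
                a[k][c] -= factor * a[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        beta[i] = (a[i][n] - sum(a[i][j] * beta[j] for j in range(i + 1, n))) / a[i][i]
    return beta


def predict(rows, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in rows]


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent: 100/n * sum(|(A_t - F_t) / A_t|)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical synthetic data: K = 2 daily topic weights and a box-office series
# generated by a known sentiment-aware autoregression (no noise).
rng = random.Random(42)
sentiment = [[rng.random(), rng.random()] for _ in range(30)]
sales = [100.0, 90.0]
for t in range(2, 30):
    sales.append(0.5 * sales[t - 1] + 0.2 * sales[t - 2]
                 + 30 * sentiment[t - 1][0] + 10 * sentiment[t - 1][1])

rows, y = build_design(sales, sentiment, p=2, q=1)
beta = least_squares(rows, y)  # recovers roughly [0.5, 0.2, 30, 10]
ar_rows, _ = build_design(sales, sentiment, p=2, q=0)
ar_beta = least_squares(ar_rows, y)
print("with sentiment:", mape(y, predict(rows, beta)))
print("without sentiment:", mape(y, predict(ar_rows, ar_beta)))
```

The least-squares solver is written in pure Python to keep the sketch dependency-free. With NumPy, `numpy.linalg.lstsq` would normally be used instead.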
Keywords/Search Tags: sentiment analysis, auto regression, sales prediction, Probabilistic Latent Semantic Analysis (PLSA)