
Research On Text Sentiment Analysis Via Spark And Machine Learning

Posted on: 2022-04-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Wang
GTID: 2518306347989599 | Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, people have come to realize the importance of data and seek to extract value from it. The growth of Internet usage has driven exponential growth in online text data, and how to perform sentiment analysis on such massive text data is an active research topic. Most current text sentiment analysis methods run on a single host. They cannot deliver high speed, high volume and high precision at the same time, which makes them unsuitable for mining massive text data. Cloud computing offers a new way to perform sentiment analysis on massive text data and can make up for the shortcomings of single-host methods in this task. After a thorough review of domestic and international research on text sentiment analysis, this thesis combines machine learning algorithms with the Spark distributed computing platform, exploiting its interactive, in-memory cluster computing, and proposes a new approach to sentiment analysis of massive text data. The main contributions are as follows:

(1) A data collection framework for massive text data. The Scrapy web-scraping framework is first used to extract text data from the Internet; Apache Flume then collects log data and aggregates it into HDFS for analysis. This ensures the reliability, timeliness and usability of the data.

(2) A corpus of complex collocations in Chinese. Scrapy is used to collect complex sentences from the abstracts of recent academic journal articles, dissertations and conference papers on CNKI (China National Knowledge Infrastructure). Based on relational collocations, a decision-based sentiment analysis method for complex sentences is proposed.

(3) A Spark-based data preprocessing scheme that reduces dimensionality during processing. Preprocessing consists of data cleaning, word segmentation, stop-word removal, feature extraction and vector modelling. The proposed Spark-based pipeline uses Ansj for Chinese word segmentation, removes stop words using the Chinese stop-word list from HIT (Harbin Institute of Technology), and applies the TF-IDF (term frequency-inverse document frequency) algorithm for text feature extraction.

(4) Three classifiers built on Spark. Following the review of existing sentiment analysis methods, an SVM (Support Vector Machine) classifier, a Naive Bayes classifier and a TextCNN classifier are implemented on Spark. To improve efficiency and effectiveness, the SVM is trained with SGD (Stochastic Gradient Descent) and the Naive Bayes classifier uses the multinomial model. A Spark-based sentiment classification algorithm is also proposed.

(5) Distributed storage and parallel classification. HDFS provides distributed storage of the text data to speed up sentiment classification on massive datasets, and, combined with Spark, sentiment classification is parallelized for all three models: Support Vector Machine, Naive Bayes and TextCNN.

Comparison experiments between the single-host method and the proposed method show that the Spark-based approach outperforms the single-host approach: it achieves higher classification accuracy and better computational efficiency. Moreover, on massive datasets the Spark-based method shows higher analysis efficiency, shorter training time and faster execution. These results demonstrate the effectiveness of the proposed Spark-based sentiment analysis method for massive text data.
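Item (2) does not spell out its decision rules, so the following is only a minimal sketch of the general idea behind collocation-based decisions for complex sentences. The connective pairs, the rule labels and the tiny polarity lexicon are all hypothetical stand-ins, not the thesis's corpus or rules. The point it shows is that in a concessive sentence built on 虽然…但是 ("although… but…"), the clause after 但是 typically carries the overall sentiment, so a plain lexicon sum over the whole sentence can be misleading.

```python
# Hypothetical relational collocations: (opening connective, closing connective, rule).
# "second": the clause after the closing connective decides the polarity.
# "both":   both clauses contribute to the polarity.
COLLOCATIONS = [
    ("虽然", "但是", "second"),   # concessive: "although ..., but ..."
    ("不但", "而且", "both"),     # progressive: "not only ..., but also ..."
]

# Hypothetical toy polarity lexicon.
LEXICON = {"好看": 1, "精彩": 1, "贵": -1, "无聊": -1}


def clause_score(clause):
    """Sum the polarity of every lexicon word found in the clause (each counted once)."""
    return sum(v for w, v in LEXICON.items() if w in clause)


def sentence_polarity(sentence):
    """Return 1, -1 or 0 using the first collocation rule that matches."""
    score = clause_score(sentence)            # fallback: plain lexicon sum
    for first, second, rule in COLLOCATIONS:
        i, j = sentence.find(first), sentence.find(second)
        if i != -1 and j > i:                 # both connectives present, in order
            head, tail = sentence[:j], sentence[j + len(second):]
            if rule == "second":
                score = clause_score(tail)
            else:
                score = clause_score(head) + clause_score(tail)
            break
    return (score > 0) - (score < 0)


print(sentence_polarity("虽然票价很贵，但是电影很精彩"))   # 1
print(sentence_polarity("虽然电影很精彩，但是票价太贵"))   # -1
```

Without the collocation rule, the first sentence would score 0 (one negative word, one positive word); the rule lets the clause after 但是 decide.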
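The stop-word removal and TF-IDF steps in item (3) can be illustrated on a single machine. This sketch assumes the text has already been segmented into tokens (the thesis uses Ansj, a Java library, for that step), uses a hypothetical four-word stop list in place of the HIT list, and computes IDF with the smoothed formula log((m + 1) / (df + 1)) that Spark MLlib documents for its IDF transformer. It is an illustration of the technique, not the thesis's Spark code.

```python
import math
from collections import Counter

# Hypothetical mini stop-word list standing in for the HIT Chinese stop-word list.
STOP_WORDS = {"的", "了", "是", "很"}


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF is the raw term count; IDF uses the smoothed form
    log((m + 1) / (df + 1)), where m is the number of documents
    and df is the number of documents containing the term.
    """
    cleaned = [remove_stop_words(d) for d in docs]
    m = len(cleaned)
    df = Counter()
    for doc in cleaned:
        df.update(set(doc))                    # count each term once per document
    idf = {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in cleaned]


# Documents assumed to be already segmented (e.g. by Ansj) into tokens.
docs = [
    ["这部", "电影", "很", "好看"],
    ["这部", "电影", "很", "无聊"],
    ["服务", "好", "的"],
]
for vec in tf_idf(docs):
    print(vec)
```

Terms shared by many documents (这部, 电影) receive lower weights than distinctive ones (好看, 无聊); a term present in every document would get log(1) = 0 and contribute nothing to the representation.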
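Item (4) relies on the multinomial Naive Bayes model. A minimal single-machine sketch of that model with additive (Laplace) smoothing follows. The smoothing value alpha = 1.0 is an assumption for illustration (it is also the default smoothing in Spark MLlib's NaiveBayes), and the labelled toy token lists are invented. On a cluster the per-class term counts would be summed across partitions; here one loop computes them.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and per-class term log-likelihoods with additive smoothing."""
    vocab = {t for d in docs for t in d}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        term_counts[y].update(doc)             # repeated tokens count multiple times
    n = len(labels)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def predict(model, doc):
    """Return the class with the highest posterior log-score; unseen terms are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


train_docs = [["电影", "好看"], ["剧情", "精彩", "好看"], ["电影", "无聊"], ["剧情", "拖沓", "无聊"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["好看", "精彩"]))   # pos
print(predict(model, ["无聊", "拖沓"]))   # neg
```

Working in log space avoids floating-point underflow when documents contain many terms, and the smoothing keeps a term unseen in one class from zeroing out that class's score.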
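The SVM-with-SGD and parallelization points in items (4) and (5) rest on one property: the hinge-loss gradient is a sum over training examples, so each data partition can compute its share independently and a reduce step adds the shares. The sketch below simulates that in one process, with plain Python lists standing in for RDD partitions. It is loosely modeled on the RDD-based SVMWithSGD in Spark MLlib but is not its implementation; the function names, the toy 2-D data (in place of real sparse TF-IDF vectors) and the hyperparameters are assumptions for illustration.

```python
import math
import random


def partial_gradient(partition, w, b, rng, fraction):
    """Hinge-loss subgradient summed over a sampled subset of one partition."""
    gw, gb, count = [0.0] * len(w), 0.0, 0
    for x, y in partition:                       # y is +1 or -1
        if rng.random() >= fraction:             # mini-batch sampling
            continue
        count += 1
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1:                           # inside the margin: hinge is active
            for j, xj in enumerate(x):
                gw[j] -= y * xj
            gb -= y
    return gw, gb, count


def train_svm_sgd(partitions, dim, iterations=100, step=1.0, reg=0.01,
                  fraction=1.0, seed=0):
    """Linear SVM trained by (mini-batch) subgradient descent with L2 regularization.

    Each partition returns a partial sum; the sums are combined the way a
    reduce step would combine them on a cluster. Here it all runs in one process.
    """
    rng = random.Random(seed)
    w, b = [0.0] * dim, 0.0
    for t in range(1, iterations + 1):
        gw, gb, n = [0.0] * dim, 0.0, 0
        for part in partitions:                  # would run in parallel on a cluster
            pw, pb, pc = partial_gradient(part, w, b, rng, fraction)
            gw = [a + c for a, c in zip(gw, pw)]
            gb += pb
            n += pc
        if n == 0:                               # empty sample: skip this iteration
            continue
        eta = step / math.sqrt(t)                # decaying step size
        w = [wi - eta * (g / n + reg * wi) for wi, g in zip(w, gw)]
        b -= eta * gb / n                        # bias is not regularized
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


partitions = [
    [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -1.0), -1)],
    [((2.0, 3.0), 1), ((-1.0, -3.0), -1), ((-3.0, -2.0), -1)],
]
w, b = train_svm_sgd(partitions, dim=2)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.0, -2.0)))  # 1 -1
```

Setting `fraction` below 1.0 samples a mini-batch from each partition per iteration, which is what makes the descent stochastic; with the default of 1.0 every point is used and each update is a full-batch subgradient step.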
Keywords/Search Tags: Text sentiment analysis, distributed computing with Spark, support vector machine, naive Bayes, convolutional neural network