
Research on Sentiment Analysis in Text Based on Spark

Posted on: 2018-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: L H Wang
GTID: 2428330515499960
Subject: Computer technology
Abstract/Summary:
In recent years, the rapid development of social media platforms has led to rapid growth in the volume of data. Mining these data for useful sentiment information and performing sentiment classification on massive text collections has become a research hotspot. The traditional approach runs a classification algorithm on a single machine to extract valuable sentiment information, but this has two drawbacks: it requires long execution times, and a single-machine environment cannot keep up with ever-growing data-processing demands. The emergence and development of cloud computing offers a new way to perform sentiment classification on massive data, making up for the shortcomings of a single machine and meeting the needs of large-scale data processing. Building on research into text sentiment analysis on the Spark platform, this thesis improves the traditional Naive Bayesian algorithm, with the aim of improving both the efficiency and the accuracy of text classification.

The main contents of this thesis are:

(1) Improvement of the Naive Bayesian algorithm. Based on an analysis of the traditional Naive Bayesian classification algorithm, this thesis summarizes the algorithm's advantages and disadvantages and studies the effect of the Naive Bayesian assumption that conditional attributes are independent. It then proposes a Euclidean-distance-weighted Bayesian classification algorithm, which uses Euclidean distance to optimize the attribute weights.

(2) Parallelization of the Naive Bayesian algorithm. The algorithm is implemented in parallel in a Spark cluster environment.

Experimental results show that the improved Naive Bayesian algorithm on the Spark cloud computing platform effectively improves classification efficiency, greatly reduces training time, and achieves higher accuracy than the traditional algorithm.
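The abstract does not give the exact weighting formula or the Spark job layout. The following plain-Python sketch therefore illustrates the two ideas under stated assumptions, and is not the thesis's actual method. It assumes a multinomial Naive Bayes with Laplace smoothing. It also uses a hypothetical weighting rule: each word's weight is the Euclidean distance between its class distribution P(c|w) and the class prior P(c), so words that do not shift the class distribution get weight 0. Training is split into a per-partition counting step and a merge step, mirroring what a Spark job would do with `RDD.flatMap` followed by `RDD.reduceByKey`. The code runs without Spark, and all function names (`count_partition`, `merge`, `train`, `predict`) are illustrative.

```python
from collections import Counter
from functools import reduce
import math


def count_partition(docs):
    """Map step: count classes and (label, word) pairs in one partition,
    analogous to the work of a single Spark task."""
    class_counts = Counter()
    word_counts = Counter()  # keyed by (label, word)
    for label, tokens in docs:
        class_counts[label] += 1
        for t in tokens:
            word_counts[(label, t)] += 1
    return class_counts, word_counts


def merge(a, b):
    """Reduce step: add partial counts, analogous to reduceByKey(add)."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions, alpha=1.0):
    class_counts, word_counts = reduce(merge, map(count_partition, partitions))
    classes = sorted(class_counts)
    vocab = sorted({w for (_, w) in word_counts})
    n_docs = sum(class_counts.values())
    prior = {c: class_counts[c] / n_docs for c in classes}
    totals = {c: sum(n for (cc, _), n in word_counts.items() if cc == c)
              for c in classes}
    # Laplace-smoothed conditional probabilities P(w | c).
    cond = {(c, w): (word_counts[(c, w)] + alpha) / (totals[c] + alpha * len(vocab))
            for c in classes for w in vocab}
    # HYPOTHETICAL weighting rule (an assumption, not the thesis's formula):
    # weight(w) = Euclidean distance between P(c | w) and the prior P(c).
    weights = {}
    for w in vocab:
        joint = [prior[c] * cond[(c, w)] for c in classes]
        z = sum(joint)
        posterior = [j / z for j in joint]
        weights[w] = math.sqrt(sum((p - prior[c]) ** 2
                                   for p, c in zip(posterior, classes)))
    return classes, prior, cond, weights


def predict(model, tokens):
    """Pick the class maximizing log P(c) + sum_w weight(w) * log P(w | c);
    words not seen in training are ignored."""
    classes, prior, cond, weights = model

    def score(c):
        return math.log(prior[c]) + sum(
            weights[t] * math.log(cond[(c, t)]) for t in tokens if t in weights)

    return max(classes, key=score)


# Toy corpus split into two "partitions" (hypothetical data).
partitions = [
    [("pos", ["good", "great", "film"]), ("neg", ["bad", "boring", "film"])],
    [("pos", ["great", "acting"]), ("neg", ["bad", "plot"])],
]
model = train(partitions)
print(predict(model, ["great", "film"]))  # → pos
```

Because the merge step only adds counts, training on any partitioning of the same documents yields the same model. This property is what makes Naive Bayes training straightforward to parallelize. In this toy corpus, "film" appears equally in both classes, so its weight is 0 and it does not influence predictions.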
Keywords/Search Tags: Sentiment Classification, Spark, Parallelization, Naive Bayesian