| In recent years, the Internet has become more convenient and information spreads far faster than before, so events that once attracted little attention now often come to public notice. As public awareness has grown, volunteers have continued to expose "plagiarism" incidents, and media coverage keeps raising their profile. Under these circumstances, more and more people are willing to study such events, most commonly by analyzing them from the perspective of rights protection or the improvement of relevant laws. However, an incident is not just a matter between the originator and the plagiarist: a third party is always involved. As that third party, the public influences how the incident develops through its attitude. This paper therefore focuses mainly on the third party. It performs sentiment analysis on the text of comments about "plagiarism" on ZHIHU. The comment text was scraped with Beautiful Soup in Python, yielding 7601 comment texts. To keep the analysis complete, we both construct classifiers and evaluate their strengths and weaknesses. First, the data set is split into a training set and a test set, and a sentiment dictionary is built from the text in the training set. After repeated adjustment, 766 sentiment words were finally selected. The dictionary contains two variables: the sentiment word and its sentiment score. To reduce the probability that positive and negative words cancel out and produce a final sentiment score of 0, the word scores are adjusted so that they fall in the range [-5, 5] and exclude 0. Next, the text is cleaned by removing punctuation, letters, and numbers. Then a customized dictionary containing proper nouns and emoticons is imported. Based on this customized dictionary, the segmentCN() function in the Rwordseg package in R is used to segment the text. Finally, the sentiment score is calculated: the word segmentation result is matched against the sentiment dictionary to determine the sentiment labels. When machine learning algorithms are used, sentiment analysis results often depend on the training set. This paper chooses Random Forest and Naive Bayes to construct the classifiers, which are built on a sparse matrix constructed from the segmentation results and the TF-IDF index. According to the model diagnosis results, the Naive Bayes classifier has a higher recall rate than the random forest and dictionary classifiers. |
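
The dictionary-based scoring step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the paper segments text with segmentCN() in R, whereas this sketch assumes the comment has already been segmented into tokens. The four-word lexicon, the function names, and the rule that maps the sign of the total score to a label are all hypothetical and serve only to show the idea.

```python
import re

# Hypothetical mini-lexicon: sentiment word -> score in [-5, 5], never 0.
# The paper's dictionary has 766 words; these entries are illustrative only.
LEXICON = {"支持": 3, "原创": 2, "抄袭": -3, "无耻": -5}


def is_valid_lexicon(lexicon):
    """Check that every score lies in [-5, 5] and is not 0."""
    return all(-5 <= s <= 5 and s != 0 for s in lexicon.values())


def clean(text):
    """Remove punctuation, Latin letters, digits, and underscores."""
    # [^\w\s] matches punctuation (including Chinese punctuation);
    # the first character class drops ASCII letters, digits, and "_".
    return re.sub(r"[A-Za-z0-9_]|[^\w\s]", "", text).strip()


def sentiment_score(tokens, lexicon):
    """Sum the scores of all tokens found in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def sentiment_label(total):
    """Assumed labelling rule: the sign of the total decides the label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Tokens as they might come out of a word segmenter (assumed input).
tokens = ["抄袭", "无耻"]
total = sentiment_score(tokens, LEXICON)
print(total, sentiment_label(total))
```

Excluding 0 from the word scores does not rule out a total of 0, since opposite scores can still cancel; it only makes that outcome less likely, which is why the sketch keeps a "neutral" branch.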