Emotional Analysis Of The Comments On "Plagiarism" In ZHIHU

Posted on:2019-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2348330542494042

Subject:Applied statistics

Abstract/Summary:

In recent years, the internet has become more convenient and the speed at which information spreads has risen dramatically. Events that once drew little attention now often come before the public. As public awareness has grown, "plagiarism" incidents have been continually exposed by volunteers, and media coverage keeps raising the profile of these incidents. Under such circumstances, more and more people are willing to study these events, and the most common approach is to analyze the phenomenon in terms of rights protection or the improvement of relevant laws. However, once an incident occurs, a third party is inevitably involved; it is not just a matter between the originator and the plagiarist. As a third party, the public's attitude affects how the incident develops. The research object of this paper is therefore mainly the third party. The paper selects the text of comments on "plagiarism" in ZHIHU for emotional analysis. The comment text was scraped with Beautiful Soup in Python, yielding 7601 comment texts. To ensure the integrity of the whole analysis process, we need to construct the classifiers and also evaluate their strengths and weaknesses. First, the data set is split into a training set and a test set, and an emotion dictionary is constructed from the text data in the training set. After repeated adjustment, 766 emotion words were finally determined. The emotion dictionary contains two variables: the emotion word and its emotional score. To reduce the probability that the final emotional score is 0 because positive and negative scores cancel out, the emotion word scores are adjusted: the score range is [-5,5], and scores do not include 0. Next, the text is cleaned by removing punctuation, letters and numbers. Then a customized dictionary containing proper nouns and emoticons is imported. Based on this customized dictionary, the segmentCN() function in the Rwordseg package in R is used to segment the text. Finally, the emotional score is calculated: the word segmentation results are matched against the emotion dictionary to determine the emotional tags. When machine learning algorithms are used, sentiment analysis results often depend on the training set. In the end, this paper chooses Random Forest and Naive Bayes to construct the classifiers, which are built on the sparse matrix formed from the segmentation results and the TF-IDF index. The model diagnosis results show that the Naive Bayes classifier has a higher recall rate than the Random Forest and dictionary classifiers.
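
The cleaning and dictionary-scoring steps described above can be illustrated with a minimal Python sketch. It is not the thesis code: the thesis performed segmentation in R with Rwordseg, whereas here the tokens are assumed to be already segmented, and the four dictionary entries, their scores, and all function names are hypothetical.

```python
import re

# Hypothetical mini emotion dictionary: word -> score in [-5, 5], never 0.
# The thesis dictionary has 766 words; these entries are made up for illustration.
EMOTION_DICT = {"无耻": -5, "抄袭": -3, "支持": 3, "原创": 2}


def clean(text: str) -> str:
    """Remove punctuation, then Latin letters, digits and underscores."""
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation and symbols
    return re.sub(r"[A-Za-z\d_]", "", text)   # drop letters and numbers


def sentiment_score(tokens, dictionary=EMOTION_DICT) -> int:
    """Sum the scores of tokens found in the dictionary; others count as 0."""
    return sum(dictionary.get(token, 0) for token in tokens)


def label(score: int) -> str:
    """Map a total score to an emotional tag."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # only when no word matches or the scores cancel exactly


# Tokens are assumed to come from a separate word segmentation step.
tokens = ["抄袭", "太", "无耻", "了"]
print(clean("太无耻了！！！abc123"), sentiment_score(tokens), label(sentiment_score(tokens)))
```

Because no dictionary entry scores 0, a comment receives a neutral total only when it contains no dictionary word or when its positive and negative scores cancel exactly, which is the case the adjusted scores are meant to make less likely.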

Keywords/Search Tags:

Emotional Analysis, Emotion Dictionary, Random Forests, Naive Bayes, R

Related items

1	Study On The Application Of Hierarchical Bayesian In Emotional Classification
2	Research On Sentiment Analysis Methods Based On Big Data
3	Analysis Of Chinese Paragraphs Emotion Based On Naive Bayes
4	Research On Analysis Of Emotional Tendency Based On Tieba Text
5	Research And Application Of The Construction Of Chinese Weibo Emotional Dictionary
6	A Study On Emotional Tendency Of Chinese Microblogging Based On Conditional Random Field And Emotional Dictionary
7	Design And Implementation Of Sentiment Analysis System Based On Native Bayes Algorithm
8	Research On Personal Emotion State Based On The Text
9	Research On Computer-aided Diagnosis Of Common Congenital Heart Diseases
10	Sentiment Analysis Based On The Combination Of Dictionary And Machine Learning