| Hostility is a negative attitude towards others. It manifests as distrust, suspicion, dissatisfaction, resentment, and the intention to harm others. Hostility not only damages the physical and mental health of individuals but also has negative effects on others and on society, such as causing behavioral problems (suicide, aggressive behavior, destructive behavior), undermining social stability, and causing social losses. With the development and popularization of Internet technology, hostility can spread widely online and cause a broader range of negative effects: it can lead to online violence or online mass incidents, threaten people’s health and safety, and even affect national stability, unity, and social harmony. The study of hostility, especially online hostility, therefore has important theoretical value and practical significance. Measuring and evaluating hostility is the basis and starting point of hostility research. Accordingly, this study aims to evaluate people’s level of hostility through their expressions of hostility on the Internet, so as to identify high-risk and susceptible groups and to provide a reference for the prevention of, and intervention in, online hostility. Studying people’s psychology through their verbal expressions on the Internet is a basic method of current cyberpsychology research, because the words people use can provide clues to their psychological processes. Microblog (Weibo) is one of the largest social platforms in China and holds a large amount of text material for research. This study therefore evaluates the level of hostility of microbloggers based on their posts. Specifically, this study uses content analysis to build a machine learning model for classifying microblog text, whose main function is to distinguish hostile speech from non-hostile speech; the model is then used to calculate how frequently a Weibo user posts hostile comments within a certain period of time, and this frequency serves as the indicator of the user’s level of hostility.
To achieve these research objectives, the following two studies were carried out. Study 1 attempted to establish an online hostile speech recognition model and required that the model distinguish hostile speech from non-hostile speech with an accuracy of more than 80%. The specific research process was as follows. The first step was to develop an "Online Hostile Speech Coding Table" on the basis of the literature and an analysis of microblog text, to provide a basis for the subsequent content analysis. The second step was to use web crawler technology to collect the comment text of 100 popular microblogs posted between November 1 and November 20, 2020, a total of 12,000 comments. The captured data included user ID, user name, and comment content. The third step was to conduct a content analysis of the collected microblog texts according to the "Online Hostile Speech Coding Table" and to take the results as the criterion for testing the validity of the "Online Hostile Speech Recognition Model". The microblog text was then preprocessed (text regularization, word segmentation, stop-word filtering). In the fourth step, two machine learning methods, logistic regression and support vector machine, were used to train the "Online Hostile Speech Recognition Model". The model’s recognition results for hostile speech in microblogs were then compared with the results of the manual content analysis, with accuracy as the evaluation index. The results of Study 1 showed that the inter-rater reliability (Cohen’s kappa coefficient) of the hostile speech coding table was 0.80, indicating a high level of agreement between different raters in using the coding table to distinguish hostile speech from non-hostile speech. The text data that can be used to build the machine
learning model comprise 4,837 hostile texts and 5,178 non-hostile texts. The overall accuracy of both the logistic regression algorithm and the support vector machine was 89%, and the accuracy of the logistic regression algorithm in recognizing hostile speech was 93%, higher than that of the support vector machine (92%). The online hostile speech recognition model built with the logistic regression algorithm was therefore selected. In Study 2, the "Online Hostile Speech Recognition Model" established in Study 1 was used as a tool to measure the hostility of Internet users, and the criterion-related validity of the measurement was tested. The specific research process was as follows. From November 2, 2020 to November 20, 2020, a total of 166 Weibo users were recruited to fill in the questionnaire, of which 137 questionnaires were valid. Among these respondents, there were 14 males and 123 females, with an average age of 21.47 years (SD = 2.09); 102 users (74.5%) used Weibo every day. To ensure that results were comparable across users, the number of texts per user was fixed at 50, giving a total of 6,850 microblog posts. Each Weibo user was anonymized before data processing. The online hostile speech recognition model was used to evaluate whether each of a user’s original microblog posts was hostile speech: hostile speech was coded as "1" and everything else as "0". Finally, each user’s hostility score was calculated as the mean of these codes. The results of Study 2 showed that an independent-samples t-test comparing the users whom the model identified as high in hostility with those it identified as low in hostility yielded a significant difference (t = 20.89, p < 0.001), indicating that the tool can effectively distinguish users with high and low levels of hostility. In addition, the correlation coefficients between the hostility scores and the hostility and verbal aggression subscales of the Buss-Perry Aggression Questionnaire and the short Cook-Medley Hostility Questionnaire were 0.26, 0.27, and 0.30, respectively (ps < 0.01). The
correlation between the hostility scores of Weibo users produced by the Internet user hostility measurement tool and the hostility scores obtained by manual content analysis was 0.70 (p < 0.01). These results suggest that the validity of the tool for measuring Internet user hostility meets the prevailing standards of psychometrics. In conclusion, this study used microblog text to establish an online hostile speech recognition model that can effectively identify hostile speech in microblogs, and the model can also serve as a tool for measuring and evaluating the level of hostility of Internet users. |
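
The two simplest quantities reported above, the per-user hostility score (the mean of the 0/1 codes over a user's posts) and Cohen's kappa for two raters, can be illustrated with a minimal Python sketch. This is not the study's own code: the function names `hostility_score` and `cohen_kappa` and the toy inputs are assumptions made purely for illustration, and the kappa function uses the standard textbook formula rather than anything specified in the abstract.

```python
from collections import Counter


def hostility_score(labels):
    """Mean of binary post codes (1 = hostile, 0 = non-hostile) for one user.

    Hypothetical helper: the name and the input validation are illustrative.
    """
    if not labels:
        raise ValueError("a user needs at least one coded post")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("codes must be 0 or 1")
    return sum(labels) / len(labels)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy data only; these are not values from the study.
example_score = hostility_score([1, 0, 0, 1])
example_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

With 50 coded posts per user, as in Study 2, a score computed this way is always a multiple of 0.02, which is why fixing the number of posts keeps users' scores comparable.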