
Automatic Construction Of Mental Health Dictionary

Posted on: 2020-11-21   Degree: Master   Type: Thesis
Country: China   Candidate: D B Wang   Full Text: PDF
GTID: 2404330572981775   Subject: Engineering
Abstract/Summary:
Mental health problems such as suicide threaten individual health and the harmonious development of society, and early detection of suicidal tendencies provides the basis for early intervention. People with suicidal thoughts often do not seek help from others, yet they frequently express those thoughts in what they say. With the rapid growth of social media, more and more people share their thoughts and feelings online, and emotionally charged posts on Sina Weibo have grown rapidly. Understanding and mining this content in depth can support research on suicide risk.

Research has found an important relationship between the expression of suicidal tendencies on social media and users' language patterns, and certain words are strong indicators of suicidal tendency. This study refers to these words as suicide clue words. In sentiment analysis, sentiment lexicons have been shown to improve performance; by analogy, a suicide clue dictionary should be valuable for analyzing suicidal tendency. However, very few studies address the construction of suicide dictionaries, and such resources are far scarcer than emotion dictionaries: the Chinese suicide clue dictionary built by the Institute of Psychology of the Chinese Academy of Sciences (CAS) is the only one found in the current literature. That dictionary was built manually. Its builders read a large amount of relevant text and drew on expert domain knowledge to select an initial set of suicide words, then optimized and supplemented the list by hand, which requires substantial human and financial resources. The existing dictionary also has two shortcomings: (1) online vocabulary changes constantly, and new clue words cannot be added to the dictionary in time; (2) the dictionary contains only single words, not phrases, even though many words that carry no suicidal tendency on their own do indicate it in combination.

This thesis proposes automatic methods for constructing a suicide clue dictionary, divided into two categories: seed-based methods and dataset-based methods. The seed-based methods, mainly Word2Vec and pointwise mutual information (PMI), search the suicide text dataset for words with high semantic similarity to the entries of the CAS suicide clue dictionary and take them as suicide clue words. The dataset-based methods, namely TF-IDF, the LDA topic model, information gain, and a classification-based method, extract important words or topic words from the suicide dataset as suicide clue words. To make the dictionary more complete, association rule mining and N-gram methods are also used to construct suicide clue phrases automatically.

To verify the dictionary's effectiveness, both a direct and an indirect evaluation were designed. The direct evaluation uses the manually built CAS suicide clue dictionary as a reference and measures the recall and average accuracy of the automatically constructed dictionary against it. The indirect evaluation examines how much the extracted suicide clue words help a suicide classification task.

In the direct evaluation, the seed-based Word2Vec and PMI methods reach recalls of only 0.0824 and 0.0495 against the CAS dictionary, respectively, indicating that traditional methods for expanding or extracting emotion dictionaries are not fully applicable to building a suicide clue dictionary. Among the dataset-based methods, the LDA topic model outperforms TF-IDF and the classification-based method, with a recall of 0.313 against the CAS dictionary and an average accuracy of 0.323 for the recalled words. In the indirect evaluation, dictionaries built with the seed-based methods yield a lower average accuracy than the CAS dictionary. Among the dataset-based methods, information gain performs best compared with TF-IDF, the LDA topic model, and the classification-based method: its average accuracy on the suicidal-tendency classification task is 0.9525, exceeding the 0.8969 achieved with the CAS dictionary. Adding the suicide clue phrases improves the indirect evaluation result slightly further, from 0.9525 to 0.9671, which demonstrates the validity of the automatically constructed suicide clue dictionary.
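The seed-based Word2Vec method can be pictured as a nearest-neighbour search in embedding space. The following is a minimal sketch under stated assumptions, not the thesis's implementation: it takes word vectors as already trained (for example by a Word2Vec model on the suicide text dataset, not shown), scores each non-seed word by its highest cosine similarity to any seed word, and keeps words above a threshold. The function name `expand_by_similarity`, the max-over-seeds scoring rule, the threshold, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_by_similarity(seeds, vectors, threshold=0.8):
    """Hypothetical seed expansion: keep non-seed words whose best cosine
    similarity to any seed word is at least `threshold`, highest first."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    scored = []
    for word, vec in vectors.items():
        if word in seeds:
            continue
        best = max(cosine(vec, sv) for sv in seed_vecs)
        if best >= threshold:
            scored.append((word, best))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy 3-d vectors standing in for trained Word2Vec embeddings (hypothetical).
vectors = {
    "hopeless": [0.9, 0.1, 0.0],
    "relief": [0.85, 0.2, 0.05],
    "end_it": [0.8, 0.15, 0.1],
    "milk_tea": [0.0, 0.1, 0.95],
}
# "relief" and "end_it" pass the threshold; "milk_tea" is filtered out.
print(expand_by_similarity({"hopeless"}, vectors, threshold=0.9))
```

With a real model, gensim's `KeyedVectors.most_similar` performs a comparable cosine-similarity lookup over the whole vocabulary.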
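PMI, the other seed-based method, scores how much more often a candidate word co-occurs with a seed word than chance would predict: PMI(w, s) = log2(P(w, s) / (P(w) P(s))). The sketch below is an illustration under assumptions not stated in the abstract: co-occurrence is counted at the document (post) level, probabilities are document frequencies divided by the number of documents, and a candidate's score is its mean PMI over the seeds it co-occurs with. The function name `pmi_expand` and the toy posts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_expand(docs, seeds):
    """Score non-seed words by their mean document-level PMI with the seed words.

    PMI(w, s) = log2(P(w, s) / (P(w) * P(s))), each probability estimated as
    (number of documents containing the word or pair) / (number of documents).
    Seeds a word never co-occurs with are skipped; words with no co-occurring
    seed are dropped.
    """
    n = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(combinations(sorted(vocab), 2))
    scores = {}
    for word, df in word_df.items():
        if word in seeds:
            continue
        values = []
        for seed in seeds:
            joint = pair_df[tuple(sorted((word, seed)))]
            if joint:
                values.append(math.log2(joint * n / (df * word_df[seed])))
        if values:
            scores[word] = sum(values) / len(values)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical tokenized posts.
docs = [
    ["hopeless", "relief", "tonight"],
    ["hopeless", "relief"],
    ["hopeless", "tired"],
    ["milk_tea", "tonight"],
]
# "relief" and "tired" get positive PMI, "tonight" negative; "milk_tea" never
# co-occurs with the seed and is dropped.
print(pmi_expand(docs, {"hopeless"}))
```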
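Among the dataset-based methods, TF-IDF ranks words by how frequent they are within a document and how rare they are across documents. The abstract does not say how per-document weights are turned into a single dictionary ranking, so the sketch below assumes they are summed over all documents in the suicide dataset; the function name `tfidf_rank`, the unsmoothed idf ln(N / df), and the toy posts are hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def tfidf_rank(docs, top_k=None):
    """Rank words by the sum of their TF-IDF weights over all documents.

    tf  = count of the word in a document / document length
    idf = ln(N / df), so a word that appears in every document scores 0.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    totals = defaultdict(float)
    for doc in docs:
        if not doc:
            continue
        for word, count in Counter(doc).items():
            totals[word] += count / len(doc) * math.log(n / df[word])
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical tokenized posts from the suicide dataset.
docs = [["i", "hopeless", "hopeless"], ["i", "tired"], ["i", "hopeless", "sleep"]]
# "i" occurs in every post, so its idf (and total score) is 0 and it ranks last.
print(tfidf_rank(docs))
```

scikit-learn's `TfidfVectorizer` is a production alternative, but its default idf is smoothed (ln((1 + N) / (1 + df)) + 1), so its scores will differ from this sketch.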
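Information gain, the best-performing method in the indirect evaluation, measures how much knowing whether a word occurs in a post reduces uncertainty about the post's label: IG(w) = H(Y) - [P(w) H(Y | w) + P(not w) H(Y | not w)]. A minimal sketch follows, assuming binary labels (1 = suicidal) and a presence/absence feature per word. Because IG is symmetric between classes, the sketch adds a hypothetical filter that keeps only words whose presence raises the share of suicidal posts above the overall prior; the thesis may handle this differently. The function names and toy data are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, positive=1):
    """Rank words by the information gain of their presence/absence feature.

    Only words whose presence raises the share of `positive` labels above the
    overall prior are kept (a hypothetical filter so that indicators of the
    non-suicidal class are not returned as clue words).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    prior = labels.count(positive) / n
    h_y = entropy(list(Counter(labels).values()))
    gains = {}
    for word in set().union(*doc_sets):
        with_w = [y for s, y in zip(doc_sets, labels) if word in s]
        without_w = [y for s, y in zip(doc_sets, labels) if word not in s]
        if with_w.count(positive) / len(with_w) <= prior:
            continue
        cond = len(with_w) / n * entropy(list(Counter(with_w).values()))
        if without_w:
            cond += len(without_w) / n * entropy(list(Counter(without_w).values()))
        gains[word] = h_y - cond
    return sorted(gains.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical labelled posts: 1 = suicidal, 0 = not.
docs = [["hopeless", "tired"], ["hopeless", "end"], ["happy", "tired"], ["happy", "food"]]
labels = [1, 1, 0, 0]
# "hopeless" separates the classes perfectly (IG = 1 bit); "end" is kept with a
# smaller gain; "tired", "happy", and "food" are filtered out.
print(information_gain(docs, labels))
```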
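To capture word combinations that signal suicidal tendency only together, the thesis uses association rule mining and N-gram methods. The sketch below combines the two ideas in one simplified form and is an assumption, not the thesis's procedure: candidate phrases are adjacent bigrams (N = 2), kept when their post-level support and the confidence of the rule "first word → second word" both pass thresholds. The function name, thresholds, and toy posts are hypothetical.

```python
from collections import Counter


def mine_bigram_phrases(docs, min_support=0.4, min_confidence=0.6):
    """Return bigram phrases whose post-level support and confidence pass thresholds.

    support(a b)    = fraction of posts containing the adjacent pair (a, b)
    confidence(a→b) = posts containing (a, b) / posts containing a
    """
    n = len(docs)
    word_df = Counter()
    bigram_df = Counter()
    for doc in docs:
        word_df.update(set(doc))
        bigram_df.update(set(zip(doc, doc[1:])))  # each bigram counted once per post
    phrases = []
    for (a, b), count in bigram_df.items():
        support = count / n
        confidence = count / word_df[a]
        if support >= min_support and confidence >= min_confidence:
            phrases.append((f"{a} {b}", support, confidence))
    phrases.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return phrases


# Hypothetical tokenized posts: "sleep" alone is harmless, "sleep forever" is not.
docs = [
    ["want", "sleep", "forever"],
    ["just", "want", "sleep", "forever"],
    ["sleep", "early"],
    ["want", "cake"],
]
# Keeps "sleep forever" and "want sleep" (support 0.5, confidence 2/3 each).
print(mine_bigram_phrases(docs))
```

In practice the candidates would also need to be checked against a non-suicidal control corpus, since frequent everyday phrases can pass the same thresholds.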
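The direct evaluation compares an automatically built word list with the manual CAS dictionary. Recall is the share of reference words that the automatic list recovers. The abstract does not define its "average accuracy" measure, so the sketch below assumes one common reading, average precision over the ranked list (the mean of precision@k taken at each rank k where a reference word is hit), and also reports plain precision. The function name `direct_evaluation` and the toy word lists are hypothetical.

```python
def direct_evaluation(ranked_words, reference):
    """Score a ranked, automatically built word list against a reference dictionary."""
    ranked_words = list(dict.fromkeys(ranked_words))  # drop duplicates, keep order
    reference = set(reference)
    hits = 0
    precision_sum = 0.0
    for k, word in enumerate(ranked_words, start=1):
        if word in reference:
            hits += 1
            precision_sum += hits / k  # precision@k at each hit
    return {
        "recall": hits / len(reference) if reference else 0.0,
        "precision": hits / len(ranked_words) if ranked_words else 0.0,
        "average_precision": precision_sum / hits if hits else 0.0,
    }


# Hypothetical automatic ranking and reference dictionary.
auto = ["hopeless", "cake", "relief", "tired"]
reference = {"hopeless", "relief", "despair", "burden"}
# Two of four reference words are recovered (recall 0.5); hits at ranks 1 and 3
# give average precision (1/1 + 2/3) / 2 = 5/6.
print(direct_evaluation(auto, reference))
```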
Keywords/Search Tags: Automatic dictionary construction, suicide clue words, suicide clue phrases, dictionary evaluation