
Research On Depression Prediction Of Micro-blog Users Based On Word Embedding Method

Posted on: 2018-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Fang
GTID: 2348330542492594
Subject: Signal and Information Processing
Abstract/Summary:
As a mainstream social media tool in China, micro-blog is a platform on which individual users express their views and feelings. Investigation shows that people with mental disorders differ from other people in language use and emotional expression. This thesis uses a web crawler to collect micro-blog data from users with and without depression, and extracts individual information features and linguistic features from the user data. Two methods are proposed to build classifiers: the first is a feature statistics method based on an extended depression dictionary; the second constructs a user vector with a word embedding method. The word embeddings are obtained with word2vec, developed by Google. The main work and contributions of this thesis are as follows:

(1) Previous work on predicting depression relies on basic emotional dictionaries. Investigation shows that many depression-related words are missing from these dictionaries. To fill this gap, a depression dictionary is constructed. This thesis selects 54 representative words as the basic (seed) depression words, retrieves the words related to these seed words according to word embedding similarity, and builds the depression dictionary from the related words. A prediction model is then built on this dictionary.

(2) Because word embeddings contain context information, this thesis proposes constructing the user vector from word embeddings. Two methods are used to build the user document vector: TF-IDF weighted word embedding and max pooling. The TF-IDF method takes account of the importance of words in user documents and gives higher weight to more important words. The max pooling method filters out unimportant information at the sentence level. The document vectors obtained by both methods retain the semantics of the user documents. Because the word embedding construction method learns features automatically, it omits the feature extraction and feature simplification procedures required by traditional methods. The user vector is taken as the input to the classifiers. The results indicate that this method can be a new solution for user depression prediction.
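The three techniques named in the abstract (dictionary expansion by embedding similarity, TF-IDF weighted embeddings, and sentence-level max pooling) can be illustrated with a minimal sketch. It is not the thesis's implementation. The toy 3-dimensional `EMBEDDINGS` table stands in for vectors trained with word2vec, and the function names, the similarity threshold, the smoothed IDF formula, and the averaging of pooled sentence vectors into one user vector are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real system would load word2vec vectors.
EMBEDDINGS = {
    "sad": [0.9, 0.1, 0.0],
    "hopeless": [0.8, 0.2, 0.1],
    "tired": [0.6, 0.3, 0.2],
    "happy": [-0.7, 0.5, 0.1],
    "lunch": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(seeds, embeddings, threshold=0.95):
    """Add words whose similarity to any seed word reaches the threshold.

    The threshold value is an assumption, not a figure from the thesis.
    """
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


def tfidf_user_vector(user_tokens, corpus, embeddings):
    """Weighted average of word embeddings, with TF-IDF as the weight.

    `corpus` is a list of token lists, one per user document.
    The smoothed IDF below is one common variant, chosen as an assumption.
    """
    dim = len(next(iter(embeddings.values())))
    counts = Counter(t for t in user_tokens if t in embeddings)
    total, weight_sum = [0.0] * dim, 0.0
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
        weight = (count / len(user_tokens)) * idf
        total = [acc + weight * x for acc, x in zip(total, embeddings[word])]
        weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else total


def max_pool_user_vector(sentences, embeddings):
    """Element-wise max over each sentence, then the mean across sentences.

    Averaging the pooled sentence vectors is an assumption; the abstract only
    states that max pooling is applied at the sentence level.
    """
    dim = len(next(iter(embeddings.values())))
    pooled = []
    for sentence in sentences:
        vectors = [embeddings[t] for t in sentence if t in embeddings]
        if vectors:
            pooled.append([max(column) for column in zip(*vectors)])
    if not pooled:
        return [0.0] * dim
    return [sum(column) / len(pooled) for column in zip(*pooled)]


if __name__ == "__main__":
    corpus = [["sad", "lunch"], ["lunch"], ["lunch", "happy"]]
    print(expand_dictionary({"sad"}, EMBEDDINGS))
    print(tfidf_user_vector(["sad", "lunch"], corpus, EMBEDDINGS))
    print(max_pool_user_vector([["sad", "lunch"], ["happy"]], EMBEDDINGS))
```

In this sketch, either user vector could be passed to any standard classifier. The TF-IDF variant pulls the vector toward words that are rare across users, while the max pooling variant keeps only the strongest value per dimension within each post.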
Keywords/Search Tags: micro-blog, depression prediction, depression dictionary, word embedding