The Research Of Micro-Blog New Emotion Words Recognition And Orientation Judgment Based On Word2Vec

Posted on: 2017-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Sui
Full Text: PDF
GTID: 2308330488459186
Subject: Information Security and Electronic Commerce
Abstract/Summary:
In this era of information explosion, online texts reflect many of the emotions, views and opinions of Internet users. Analyzing these texts helps governments understand public opinion in a timely way and guide its direction, and helps enterprises find their own problems promptly and improve their services. However, the emotion dictionaries in common use cannot cover all common emotion words. As micro-blogs and other new media grow more popular, network language becomes fragmented, and new words are continually created and spread. This makes sentiment analysis of Internet texts considerably harder.

This thesis focuses on recognizing new emotion words that are not contained in the emotion dictionary and on determining their emotional orientation. The main research contents are as follows:

1) Preprocessing of micro-blog data and identification of emotion words. To ensure the accuracy of emotion word recognition and to recognize more emotion words, we formulated a data cleaning procedure and a new word identification scheme, and constructed an emoticon sentiment lexicon.

2) Word similarity calculation and emotion word recognition based on word vectors. A word vector represents a word in the text as a vector in space. The transformation takes the relationship between a word and its context into account, so it preserves more natural language information. We used Word2Vec to convert words into word vectors, calculated the distances between words in the multidimensional space, and identified the similarities between words. We then identified emotion words and their orientation according to a subset of their most similar words.

3) Emotion word recognition combining word co-occurrence and word vector similarity. Word co-occurrence and word vector similarity are two different approaches to emotion word recognition. Word co-occurrence considers how frequently a word co-occurs with positive seed words and with negative seed words. Word vector similarity converts a word into vector form and then judges its orientation according to its similarity to emotion words. We combined the two methods, relying mainly on word vector similarity and supplementing it with word co-occurrence, and then filtered out low-frequency words and low-credibility results. This method achieves high accuracy while finding more emotion words.

In summary, we proposed a method for emotion word recognition and orientation judgment based on large-scale micro-blog data. We used Word2Vec to convert words into word vectors, calculated the distances between words, and obtained new emotion words according to word similarity. We then combined the two emotion word recognition methods, which achieves high accuracy while finding more new emotion words.
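
The combination described in the third research item could be sketched roughly as follows. This is a minimal illustration and not the implementation used in the thesis: the toy two-dimensional vectors stand in for Word2Vec output, and the function names, the scoring formulas, the 0.7 weight, and the frequency and confidence thresholds are all assumptions made for the example.

```python
import math

# Hypothetical toy data: 2-D vectors stand in for trained Word2Vec vectors.
VECTORS = {
    "good": [1.0, 0.1], "happy": [0.9, 0.2],
    "bad": [-1.0, 0.1], "sad": [-0.9, 0.2],
    "awesomesauce": [0.8, 0.3], "meh": [-0.7, 0.4], "rare": [0.5, 0.5],
}
POS_SEEDS = {"good", "happy"}   # assumed positive seed words
NEG_SEEDS = {"bad", "sad"}      # assumed negative seed words
POSTS = [                       # tokenized micro-blog posts (toy corpus)
    ["awesomesauce", "good", "day"], ["awesomesauce", "happy"],
    ["meh", "bad"], ["meh", "sad", "day"], ["rare", "good"],
]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_score(word, vectors, pos_seeds, neg_seeds, top_k=3):
    """Signed mean similarity to the top_k most similar seed words."""
    sims = []
    for seeds, sign in ((pos_seeds, 1), (neg_seeds, -1)):
        for seed in seeds:
            if seed in vectors:
                sim = cosine(vectors[word], vectors[seed])
                if sim > 0:  # ignore seeds that point away from the word
                    sims.append((sim, sign))
    top = sorted(sims, reverse=True)[:top_k]
    return sum(sim * sign for sim, sign in top) / len(top) if top else 0.0

def cooccurrence_score(word, posts, pos_seeds, neg_seeds):
    """(pos - neg) / (pos + neg) over seeds sharing a post with the word."""
    pos = neg = 0
    for tokens in posts:
        token_set = set(tokens)
        if word in token_set:
            pos += len(token_set & pos_seeds)
            neg += len(token_set & neg_seeds)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def judge_orientation(word, vectors, posts, pos_seeds, neg_seeds,
                      weight=0.7, min_freq=2, min_conf=0.2):
    """Return 'positive', 'negative', or None if the word is filtered out."""
    freq = sum(tokens.count(word) for tokens in posts)
    if freq < min_freq or word not in vectors:
        return None  # low-frequency or unknown word
    score = (weight * vector_score(word, vectors, pos_seeds, neg_seeds)
             + (1 - weight) * cooccurrence_score(word, posts, pos_seeds, neg_seeds))
    if abs(score) < min_conf:
        return None  # low-credibility result
    return "positive" if score > 0 else "negative"

for candidate in ("awesomesauce", "meh", "rare"):
    print(candidate, judge_orientation(candidate, VECTORS, POSTS, POS_SEEDS, NEG_SEEDS))
```

With the toy data, the first candidate is judged positive, the second negative, and the third is dropped by the low-frequency filter. In practice the vectors would come from a Word2Vec model trained on the cleaned micro-blog corpus, and the weight and thresholds would need tuning on labeled data.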
Keywords/Search Tags:word vector, word similarity, word co-occurrence, emotion recognition