
Research On Conversation Extraction And Analysis Of Short Text Message Stream

Posted on: 2016-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: T C Li
GTID: 2308330482479201
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, network applications that meet users' communication needs, such as instant messaging, forums and microblogs, have grown quickly. They provide a convenient channel for information sharing and knowledge dissemination among users. These applications produce large numbers of short text message streams, most of which consist of users' descriptions of and comments on social life. These include users' sentiment orientation towards the topics under discussion, as well as material that reveals users' identity information. Effective processing of short text message streams, including analysis of users' sentiment orientation towards topics and of their identity information, can therefore help government departments explore public opinion and guide it appropriately. This dissertation studies conversation extraction and analysis of short text message streams in depth, covering short text clustering, conversation extraction from short text message streams, sentiment orientation analysis of conversations, and user profiling in short text message streams. The main contributions are as follows:

(1) Short texts in Internet media have sparse features and non-standard language, which lead to unsatisfactory short text clustering results. To address this problem, an improved short text clustering algorithm is proposed. First, a feature weight calculation method is defined to compute the weight of every word in each cluster and thereby obtain the keywords of each cluster. Then, word vectors are used to compute the semantic similarity between keywords, from which the similarity between clusters is obtained. Finally, clustering is performed with the improved short text hierarchical clustering algorithm.
Experiments on four different datasets show that the proposed method outperforms traditional clustering algorithms, with macro-F values of 63.80%, 72.3%, 61.5% and 84.7%, which demonstrates its effectiveness.

(2) Traditional conversation extraction methods for short text message streams are usually hampered by feature sparsity when computing content correlation. To address this problem, a novel conversation extraction algorithm based on a "divide and cluster" strategy is presented. First, the short text message stream is divided into conversation segments based on content, temporal and user connections. Then, an improved Single-Pass clustering algorithm clusters the conversation segments to complete conversation extraction. Experimental results on three datasets show that this method effectively improves conversation extraction performance.

(3) Messages in conversations are usually short and often have incomplete syntactic structure, which leads to poor sentiment orientation analysis with traditional methods. To address this problem, a novel unsupervised method for sentiment orientation analysis of conversations is put forward. First, word vectors and a sentiment dictionary are used to compute the sentiment orientation score of each word. Then, the sentiment orientation of each message in the conversation is computed. Finally, a user's sentiment orientation towards the conversation topic is obtained by analyzing the sentiment orientations of that user's messages. Experimental results show that the method effectively identifies the sentiment orientation of different conversations: the average F-measure reaches 83.3% and the best result reaches 97.6%, demonstrating the effectiveness of the proposed method.

(4) Inspired by the principle of word vector training, and taking the characteristics of short text message streams into account, a user profiling method for short text message streams is proposed.
First, the messages sent by the same user across all conversations are merged into that user's data. Then, the user data is divided into fixed-length word chains, user marks are inserted at the split points to create context relationships, and external data is added to form the training data. Finally, a Skip-gram model is trained on the training data to obtain user vectors, which serve as the user profiles. Experimental results show that user keyword extraction and user clustering based on the user vectors work better than traditional methods, demonstrating the rationality and effectiveness of the proposed method.
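For contribution (1), the abstract does not give the feature weight formula or explain how keyword similarities are combined into a cluster similarity. The following Python sketch therefore assumes that each cluster is already represented by a set of keywords and that word vectors are available as a plain dictionary. The averaged best-match cosine similarity and the greedy merge threshold are illustrative assumptions, not the thesis's own definitions; `cluster_similarity` and `merge_clusters` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_similarity(kw_a, kw_b, vectors):
    """Assumed aggregation: for each keyword, take its best cosine match in the
    other cluster, average those scores, then symmetrize over both directions."""
    def best_match(src, dst):
        scores = [max(cosine(vectors[w], vectors[x]) for x in dst) for w in src]
        return sum(scores) / len(scores)
    return (best_match(kw_a, kw_b) + best_match(kw_b, kw_a)) / 2

def merge_clusters(clusters, vectors, threshold):
    """Greedy agglomerative merging: repeatedly merge the most similar pair of
    keyword sets until no pair reaches `threshold`. Merged keyword sets are simply
    unioned here; re-weighting keywords after a merge is left out of this sketch."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j], vectors)
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i] |= merged
    return clusters
```

For example, with toy 2-D vectors in which "phone" and "mobile" point in nearly the same direction, as do "football" and "soccer", merging four singleton clusters at a threshold of 0.8 yields {phone, mobile} and {football, soccer}.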
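The clustering step of contribution (2) builds on Single-Pass clustering. The sketch below shows only the basic Single-Pass procedure over bag-of-words segment vectors. The thesis's improvements to it, and its rules for segmenting the stream by content, time and user connection, are not described in the abstract and are not reproduced here; the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter

def cosine_bow(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(segments, threshold):
    """Basic Single-Pass clustering: each segment (a list of tokens) is compared
    with every existing cluster centroid; it joins the most similar cluster if the
    similarity reaches `threshold`, otherwise it starts a new cluster.
    Returns one cluster label per segment, in input order."""
    centroids = []  # summed term counts per cluster (direction acts as centroid)
    labels = []
    for tokens in segments:
        vec = Counter(tokens)
        best, best_idx = 0.0, -1
        for idx, centroid in enumerate(centroids):
            s = cosine_bow(vec, centroid)
            if s > best:
                best, best_idx = s, idx
        if best_idx >= 0 and best >= threshold:
            centroids[best_idx].update(vec)  # fold the segment into its cluster
            labels.append(best_idx)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Single-Pass visits each segment once, in arrival order, which suits streaming message data; the trade-off is that results depend on input order and on the choice of threshold.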
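Contribution (3) combines word vectors with a sentiment dictionary to score words. One common way to do this, assumed here rather than taken from the thesis, is to keep dictionary polarities for known words and score other words by half the difference between their mean cosine similarity to positive seed words and to negative seed words. Message scores are then sums of word scores, and a user's orientation is the sign of the mean message score outside a neutral band. The lexicon format, the toy vectors and the `margin` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def word_score(word, vectors, lexicon):
    """Dictionary polarity if the word is in the lexicon; otherwise half the
    difference between mean similarity to positive and to negative seed words."""
    if word in lexicon:
        return lexicon[word]
    if word not in vectors:
        return 0.0  # no vector and no dictionary entry: treated as neutral
    pos = [w for w, p in lexicon.items() if p > 0 and w in vectors]
    neg = [w for w, p in lexicon.items() if p < 0 and w in vectors]
    if not pos or not neg:
        return 0.0
    pos_sim = sum(cosine(vectors[word], vectors[w]) for w in pos) / len(pos)
    neg_sim = sum(cosine(vectors[word], vectors[w]) for w in neg) / len(neg)
    return (pos_sim - neg_sim) / 2

def message_score(tokens, vectors, lexicon):
    """Sentiment score of one message: the sum of its word scores."""
    return sum(word_score(t, vectors, lexicon) for t in tokens)

def user_orientation(messages, vectors, lexicon, margin=0.1):
    """Label a user's stance on the conversation topic from the mean score of their messages."""
    if not messages:
        return "neutral"
    mean = sum(message_score(m, vectors, lexicon) for m in messages) / len(messages)
    if mean > margin:
        return "positive"
    if mean < -margin:
        return "negative"
    return "neutral"
```

In this sketch a word absent from the dictionary, such as "great", inherits a positive score when its vector lies close to the positive seed "good", which is the kind of coverage gap word vectors are meant to fill for short, informal messages.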
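The training-data construction in contribution (4) can be sketched as follows. The abstract does not specify the chain length, the form of the user mark, or exactly where marks sit relative to the chains, so this sketch assumes a hypothetical token `<user:ID>` placed at every split point and at both ends, so that each mark co-occurs with neighbouring words inside a Skip-gram window. The `skipgram_pairs` helper only illustrates which (center, context) pairs a Skip-gram model would learn from; it does not train any vectors.

```python
def build_user_sequence(messages, user_id, chain_len):
    """Merge all of a user's messages, cut the word stream into fixed-length chains,
    and insert the (hypothetical) user mark token at every split point and both ends."""
    mark = f"<user:{user_id}>"
    words = [w for msg in messages for w in msg]
    seq = [mark]
    for i in range(0, len(words), chain_len):
        seq.extend(words[i:i + chain_len])
        seq.append(mark)
    return seq

def build_training_data(user_messages, chain_len, external_sentences=()):
    """One sequence per user, plus external tokenized sentences as extra context."""
    data = [build_user_sequence(msgs, uid, chain_len) for uid, msgs in user_messages.items()]
    data.extend(list(s) for s in external_sentences)
    return data

def skipgram_pairs(sequence, window):
    """(center, context) pairs a Skip-gram model would learn from, for one sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        for j in range(max(0, i - window), min(len(sequence), i + window + 1)):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs
```

The resulting sequences could then be passed to an off-the-shelf Skip-gram trainer, for example gensim's `Word2Vec` with `sg=1`; the learned vector for each `<user:ID>` token would then serve as that user's profile vector.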
Keywords/Search Tags: Short Text Clustering, Word Vector, Short Text Message Stream, Conversation Extraction, Sentiment Orientation Analysis, User Profile