Machine Learning For Text Analysis In Social Media

Posted on: 2020-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y K Wang
GTID: 1368330572973709
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Web 2.0 technology and online interactive applications, social media has gradually replaced traditional media and become the most important medium through which people obtain and share information. Social media platforms such as Sina Weibo and WeChat have become an indispensable part of people's lives and have profoundly changed social life. Social media contains a huge amount of social data with a rich range of applications. As an emerging medium, social media presents three typical characteristics. First, opinion leaders are able to guide public opinion. Second, information in social media is real-time but unreliable. Third, social media suffers from information overload. This thesis focuses on these characteristics and the problems they raise, considers the shortcomings of existing research, and studies three key issues in social media in depth: (1) topic-level user influence analysis; (2) bursty topic detection; (3) time-sensitive learning for topical classification in microblogs. It achieves the following results.

(1) To identify topic-level influencers in social media, the (online) topic-level influence over time model is proposed, which aims to identify the current influencers on each topic. First, the topic-level influence over time model is proposed; it models text content, followship, and the temporal information in followship to analyze users' temporal influence. Then, a time-decay based method is developed to measure a user's influence on a given topic at a given time. Furthermore, the online topic-level influence over time model is proposed. The online model combines the previous two approaches into a unified model that can analyze users' influence in social streams while reducing time and space complexity. The experimental results show that the approach can effectively find the current influencers on different topics.

(2) To detect bursty topics in social media, the bursty based topic detection model is proposed. It is observed that different topics usually present different bursty levels at the same time. To model the bursty level of topics, the proposed model is a probabilistic generative model that treats user-generated words and the words' burst levels as observed variables. The model clusters words and their burst levels into topics in order to discover latent topics with different bursty levels. After that, a hypothesis testing method identifies the bursty topics by judging whether the burst levels of the detected topics are abnormal. The experimental results show that the approach can effectively discover bursty topics and is robust to variation in the preset number of topics.

(3) To learn temporal stability for topical classification in microblogs, the temporal stability aware logistic regression model is proposed. First, the concepts of temporally stable feature and temporally unstable feature are defined. Then, six statistical methods are used to measure the temporal stability of features. Experiments on a 40 TB Twitter dataset show that these methods can effectively detect temporally unstable features, and that removing temporally unstable features improves the performance of the classifier. After that, the temporal stability aware logistic regression is proposed, which learns the temporal stability of features while training the topical classifier by incorporating the idea of transfer learning and the above statistical methods. The experimental results show that the approach significantly improves the performance of the basic classifiers on topical classification.
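The idea of filtering temporally unstable features can be illustrated with a minimal sketch. The abstract does not name the six statistical methods, so the measure below is an assumption made only for illustration: for each feature, it computes the fraction of on-topic posts among the posts containing that feature in each time slice, and uses the standard deviation of that fraction across slices as an instability score. The function names, the threshold, and the toy posts are all hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def temporal_instability(posts):
    """posts: iterable of (time_slice, tokens, label), label in {0, 1}.

    Returns {feature: std. dev. across time slices of the on-topic rate
    among posts containing the feature}. Illustrative measure only, not
    one of the dissertation's six methods.
    """
    # counts[feature][time_slice] = [posts containing feature, on-topic ones]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for time_slice, tokens, label in posts:
        for feature in set(tokens):
            cell = counts[feature][time_slice]
            cell[0] += 1
            cell[1] += label

    scores = {}
    for feature, by_slice in counts.items():
        rates = [on_topic / total for total, on_topic in by_slice.values()]
        # A feature seen in a single slice shows no measurable drift here.
        scores[feature] = pstdev(rates) if len(rates) > 1 else 0.0
    return scores


def stable_features(posts, threshold=0.25):
    """Keep features whose instability score does not exceed the
    (hypothetical) threshold."""
    scores = temporal_instability(posts)
    return {f for f, s in scores.items() if s <= threshold}


# Toy data: "trending" is on-topic in slice 0 but off-topic in slice 1.
posts = [
    (0, ["flood", "rescue"], 1),
    (0, ["flood", "trending"], 1),
    (0, ["lunch"], 0),
    (1, ["flood", "warning"], 1),
    (1, ["trending", "lunch"], 0),
    (1, ["trending"], 0),
]

scores = temporal_instability(posts)
kept = stable_features(posts)
print(sorted(kept))  # ['flood', 'lunch', 'rescue', 'warning']
```

In this toy example "trending" is dropped because its association with the topic flips between slices, while "flood" keeps the same association in both. A classifier would then be trained only on the retained features; the model described in the abstract instead learns the stability jointly with the classifier.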
Keywords/Search Tags: social media, text analysis, social influence, bursty topic, post classification