Research On The Contextual Cohesion Of Social Media Texts For News

Posted on: 2018-12-20    Degree: Master    Type: Thesis
Country: China    Candidate: Q H Yang    Full Text: PDF
GTID: 2348330518983393    Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has brought traditional news media (newspapers, television, etc.) and new social media (blogs, microblogs, forums, Twitter, etc.) into a symbiotic and complementary relationship. News media provide accurate, objective and in-depth coverage of hot events and topics from a professional perspective. Social media, as a web platform, let people express and share their views on those events quickly and in a timely manner. As more and more people use social media, however, a large amount of noise (redundant information and information irrelevant to the news topics) becomes mixed with the useful information. It is therefore important to find, within the massive volume of social media text, the information that is related to news topics and that people are interested in. The main task of this thesis is to establish the connection between news text and social media text and so improve the efficiency with which people obtain information.

To this end, a method for the contextual cohesion of social media texts based on the Topical N-gram (TNG) model is adopted. First, the TNG model is used to model the news text and obtain its topic information. Because the TNG model takes the influence of word order on the topic into account, n-gram phrases are added to the model, so the context information of the words is fully utilized. Because news and social media differ in their use of vocabulary, we use word co-occurrence between the topic and the social media text to adjust the topic, and then calculate the similarity. The adjusted topic contains words from social media as well as words from the news, so social media texts can be classified more accurately.

In addition, we adopt a second method for the contextual cohesion of social media texts, based on the Word Mover's Distance (WMD) text distance. This method addresses the fact that the method of the third chapter cannot capture words or phrases that are different but similar when calculating the similarity. It uses a novel text distance calculation based on word2vec to compute the distance between the topic and the social media text. Because word2vec is trained on a large-scale external corpus, the resulting word embeddings are of good quality, and the corresponding semantic information can be obtained with simple vector operations. Even if two texts share no words on the surface, the cost for a word to "travel" to a similar word is less than the cost to travel to other words.
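The "travel" cost idea in the last paragraph can be illustrated with a minimal sketch. It is not the thesis's implementation: it computes the relaxed Word Mover's Distance, a lower bound on the exact WMD in which each word simply moves all of its weight to the nearest word in the other text, and it does not solve the full transport problem. The two-dimensional vectors in `TOY_VECTORS` are invented for illustration and stand in for trained word2vec embeddings. The function names are hypothetical.

```python
import math

# Toy 2-d word vectors, invented for illustration only.
# In practice these would come from a trained word2vec model.
TOY_VECTORS = {
    "president": (0.90, 0.10),
    "leader": (0.85, 0.20),
    "speaks": (0.10, 0.90),
    "talks": (0.15, 0.85),
    "banana": (-0.80, -0.60),
    "pizza": (-0.70, -0.70),
}


def euclidean(u, v):
    """Euclidean distance between two vectors: the cost of one word 'traveling' to another."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nbow(tokens):
    """Normalized bag-of-words weights: each word's share of the text."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(source, target, vectors):
    """Move every source word's weight to its nearest target word."""
    target_words = set(target)
    return sum(
        weight * min(euclidean(vectors[word], vectors[t]) for t in target_words)
        for word, weight in nbow(source).items()
    )


def relaxed_wmd(text_a, text_b, vectors=TOY_VECTORS):
    """Relaxed WMD: the larger of the two one-sided costs (a lower bound on exact WMD)."""
    return max(
        one_sided_cost(text_a, text_b, vectors),
        one_sided_cost(text_b, text_a, vectors),
    )


topic = ["president", "speaks"]
related_post = ["leader", "talks"]      # no shared words, similar meaning
unrelated_post = ["banana", "pizza"]    # no shared words, different meaning

print(relaxed_wmd(topic, related_post))
print(relaxed_wmd(topic, unrelated_post))
```

With these toy vectors, the post that shares no surface words with the topic but uses similar words gets a smaller distance than the unrelated post. A method based on exact word overlap alone would score both posts the same.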
Keywords/Search Tags: Topical N-gram Model, n-gram, word co-occurrence, word2vec, Word Mover's Distance