
The Typical Weibo Accounts In The Area Of Education Extracted From The Crowd Based On The Topic Model

Posted on: 2018-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: P Yu
GTID: 2348330518477361
Subject: Computer application technology
Abstract/Summary:
The rapid development and spread of the Internet are reshaping society and the way information is disseminated. More and more people share knowledge, events, policies, and other information through microblogs, forums, online communities, and similar platforms. Education is also changing quickly, and these information platforms offer a shortcut to educational news. At the same time, they bring the problem of information redundancy: in a fast-paced life, we want to capture the latest educational information as quickly and comprehensively as possible.

This thesis studies posts about education that bloggers publish on the Weibo microblogging platform. The goal is a method for picking out a small set of bloggers from the crowd, so that following only what they publish yields up-to-date and comprehensive coverage of developments in education. The thesis first reviews existing approaches to similar problems and then focuses on topic models. Taking into account the characteristics of the education domain and of microblog text, it defines criteria for selecting a preliminary set of bloggers, whose posts form the sample. The data are then converted and processed: a vocabulary is built and each word is given a serial number, producing a list in the format "blogger - word serial number - word frequency" that can be fed directly into the model.

Three experiments were carried out on the data. First, a small sample was analyzed in two ways, with the Author-Topic (AT) model and by manual reading; the two results were very similar, which supports the validity of the model. From this a filtering method was derived: the AT model divides the texts into themes, each theme is characterized by its keywords, and the themes are sorted by coverage ratio; following that order, the bloggers who appear most often are given priority. Finally, two samples collected at different scales and under different criteria were analyzed to determine the optimal set of bloggers to follow. The study provides a useful reference for similar problems. Further work could take into account the continual updating of Weibo content or the bloggers' backgrounds; this requires ongoing research, for which the present results lay a foundation.
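The abstract describes the conversion step only by its output format, "blogger - serial number - word frequency". A minimal sketch of that step follows, assuming the posts have already been segmented into words (Chinese word segmentation and stop-word filtering are not shown) and that serial numbers are assigned in order of first appearance; the function name `build_triples` is hypothetical, not taken from the thesis.

```python
from collections import Counter


def build_triples(posts_by_blogger):
    """Convert segmented posts into (blogger, word serial number, frequency) rows.

    Assumption: posts_by_blogger maps a blogger name to a list of posts, and
    each post is already a list of words (segmentation done beforehand).
    """
    vocab = {}    # word -> serial number, assigned in order of first appearance
    triples = []
    for blogger in sorted(posts_by_blogger):
        counts = Counter()
        for post in posts_by_blogger[blogger]:
            for word in post:
                serial = vocab.setdefault(word, len(vocab))
                counts[serial] += 1
        for serial in sorted(counts):
            triples.append((blogger, serial, counts[serial]))
    return vocab, triples


if __name__ == "__main__":
    posts = {
        "blogger_a": [["exam", "reform"], ["exam"]],
        "blogger_b": [["reform", "teacher"]],
    }
    vocab, triples = build_triples(posts)
    print(triples)
    # → [('blogger_a', 0, 2), ('blogger_a', 1, 1), ('blogger_b', 1, 1), ('blogger_b', 2, 1)]
```

Bloggers are processed in sorted order only so that the serial numbers are reproducible between runs.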
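The abstract does not state how the AT model is inferred. As a hedged sketch only, the collapsed Gibbs sampler below uses the standard Author-Topic formulation: each word token w in a document is jointly assigned a topic t and one author a from that document's author list, with probability proportional to (n_wt + β)/(n_t + Vβ) · (n_at + α)/(n_a + Tα). The hyperparameters `alpha`, `beta`, the iteration count, and the function name are illustrative assumptions, not the thesis's settings.

```python
import random


def author_topic_gibbs(docs, n_authors, n_words, n_topics,
                       alpha=0.5, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for the Author-Topic model (illustrative sketch).

    docs: list of (author_ids, word_ids); author_ids must be non-empty.
    Returns (theta, phi): theta[a][t] = P(topic t | author a),
    phi[t][w] = P(word w | topic t).
    """
    rng = random.Random(seed)
    cwt = [[0] * n_topics for _ in range(n_words)]    # word-topic counts
    cat = [[0] * n_topics for _ in range(n_authors)]  # author-topic counts
    ct = [0] * n_topics                               # tokens per topic
    ca = [0] * n_authors                              # tokens per author

    def update(w, t, a, delta):
        cwt[w][t] += delta
        cat[a][t] += delta
        ct[t] += delta
        ca[a] += delta

    # Random initial (topic, author) assignment for every token.
    assign = []
    for authors, words in docs:
        doc_assign = [(rng.randrange(n_topics), rng.choice(authors)) for _ in words]
        for w, (t, a) in zip(words, doc_assign):
            update(w, t, a, +1)
        assign.append(doc_assign)

    for _ in range(iters):
        for (authors, words), doc_assign in zip(docs, assign):
            for i, w in enumerate(words):
                t, a = doc_assign[i]
                update(w, t, a, -1)  # exclude the current token from the counts
                # Joint conditional over (topic, author) pairs for this token.
                pairs, weights = [], []
                for a2 in authors:
                    for t2 in range(n_topics):
                        word_part = (cwt[w][t2] + beta) / (ct[t2] + n_words * beta)
                        author_part = (cat[a2][t2] + alpha) / (ca[a2] + n_topics * alpha)
                        pairs.append((t2, a2))
                        weights.append(word_part * author_part)
                t, a = rng.choices(pairs, weights=weights)[0]
                update(w, t, a, +1)
                doc_assign[i] = (t, a)

    theta = [[(cat[a][t] + alpha) / (ca[a] + n_topics * alpha) for t in range(n_topics)]
             for a in range(n_authors)]
    phi = [[(cwt[w][t] + beta) / (ct[t] + n_words * beta) for w in range(n_words)]
           for t in range(n_topics)]
    return theta, phi
```

If every post has exactly one blogger as author, the sampler reduces to LDA over each blogger's pooled posts. Rows in the "blogger - serial number - frequency" format can be expanded into `word_ids` by repeating each serial number as many times as its frequency.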
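The filtering rule (sort themes by coverage ratio, then prefer bloggers who turn up most often) can be sketched as follows. The abstract does not define "coverage ratio", so this sketch assumes it is a topic's expected share of all word tokens, and it assumes "turning up" under a theme means being among the `per_topic` bloggers with the highest proportion of that theme. Both definitions, the parameters, and the function names are hypothetical; `theta` is a per-blogger topic distribution such as an AT model estimates.

```python
def rank_topics_by_coverage(theta, n_tokens):
    """Rank topics by an assumed coverage ratio: the expected share of all word
    tokens belonging to topic t, i.e. sum_a n_a * theta[a][t] / sum_a n_a."""
    total = sum(n_tokens)
    n_topics = len(theta[0])
    coverage = [sum(n * row[t] for n, row in zip(n_tokens, theta)) / total
                for t in range(n_topics)]
    # sorted() is stable, so topics with equal coverage keep index order.
    order = sorted(range(n_topics), key=lambda t: -coverage[t])
    return order, coverage


def select_bloggers(theta, n_tokens, top_topics=3, per_topic=2):
    """Walk the topics in coverage order, take the per_topic bloggers with the
    highest proportion of each topic, and rank bloggers by how many topic lists
    they appear in (ties: earlier topic first, then lower blogger index)."""
    order, _ = rank_topics_by_coverage(theta, n_tokens)
    hits, first_rank = {}, {}
    for rank, t in enumerate(order[:top_topics]):
        leaders = sorted(range(len(theta)), key=lambda a: (-theta[a][t], a))[:per_topic]
        for a in leaders:
            hits[a] = hits.get(a, 0) + 1
            first_rank.setdefault(a, rank)
    return sorted(hits, key=lambda a: (-hits[a], first_rank[a], a))


if __name__ == "__main__":
    theta = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
    print(select_bloggers(theta, [100, 100, 100, 100], top_topics=2, per_topic=2))
    # → [2, 1, 0]
```

In the example, topic 1 has the highest coverage (0.4), then topic 0 (0.375); blogger 2 is among the leaders of both, so it is ranked first.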
Keywords/Search Tags: Information redundancy, object filtering, topic model, text mining