A Method Of Tracing The Topic Of Microblogs Based On Random Forest

Posted on: 2018-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: X J Tang
Full Text: PDF
GTID: 2348330518953951
Subject: Computer technology
Abstract/Summary:
Given the large amount of information on the Internet, users who search for relevant information only through portals such as Tencent News or Today's Headlines waste a lot of time on content that is not relevant to them. Classification algorithms can help users quickly find the topics they want to follow, which saves time for other tasks. As classification algorithms have come into wide use, topic tracking has gradually become an active research area. This thesis uses the random forest classification algorithm, improved and extended so that it adapts to changes in a topic.

First, topic tracking based on the random forest classification algorithm proceeds in four steps: (1) Grab the data. (2) Select representative words from the captured data. (3) Classify and screen the representative words to construct a model. (4) As the topic changes over time, adaptively evolve the model built in step (3). The thesis focuses on the selection of representative words and on the analysis of microblog topic evolution.

Second, representative feature words are selected in three steps: Chinese word segmentation, selection of representative words, and calculation of the weight of each representative word across the whole microblog topic. During segmentation, a new-word discovery function is added to make Chinese word segmentation more accurate. The criterion for selecting representative words, which accounts for their similarities and differences, is based mainly on the corresponding feature topics. The global weight of each representative word is then calculated according to this criterion, using a weight formula that is an improved version of the Okapi algorithm.

Because the topic changes over time, the corresponding classification results change as well. The classification model is therefore updated through feedback. At this point, the LDA method is used to obtain new topics and summaries and to determine how the topic has changed.

Finally, the adaptive microblog topic tracking method based on random forest is applied to track hot microblog topics automatically. The method can also summarize how a microblog topic changes and track related microblog information continuously and automatically.
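The weighting step can be illustrated with a minimal Python sketch of the standard Okapi BM25 term weight. This is not the improved formula developed in the thesis, which the abstract does not spell out. The function names `bm25_weights` and `global_weights`, the parameter defaults `k1=1.5` and `b=0.75`, the summing of per-post weights into a topic-wide weight, and the sample posts are all assumptions made for illustration. Posts are assumed to be already segmented into words.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per tokenized post (standard BM25)."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of posts that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        doc_weights = {}
        for term, freq in tf.items():
            # Smoothed inverse document frequency, always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with post-length normalization.
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avg_len))
            doc_weights[term] = idf * norm
        weights.append(doc_weights)
    return weights


def global_weights(docs, **kwargs):
    """Sum per-post weights into one topic-wide weight per term (an assumption)."""
    total = Counter()
    for w in bm25_weights(docs, **kwargs):
        total.update(w)
    return dict(total)


# Hypothetical, already-segmented posts about one topic.
posts = [
    ["earthquake", "rescue", "rescue"],
    ["earthquake", "donation"],
    ["concert", "ticket"],
]
ranked = sorted(global_weights(posts).items(), key=lambda kv: kv[1], reverse=True)
```

Sorting the summed weights gives a ranked list from which the top entries could serve as candidate representative words for the classifier.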
Keywords/Search Tags: topic tracking, random forest, representative words selection, topic evolvement