
Web News Gathering Based On Hierarchical Topic Model

Posted on: 2016-03-04   Degree: Master   Type: Thesis
Country: China   Candidate: J F Bai
GTID: 2308330470967698   Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer technology, the Internet has become one of the most important ways for people to access information. In particular, with the spread of the mobile Internet, people can share all kinds of information online anytime and anywhere, so the amount of data on the web is growing extremely fast. As a result, it has become difficult to acquire domain-related knowledge quickly and accurately, and how to gather web news efficiently has become an important research question in today's age of big data.

To meet this need for gathering domain-related data, this thesis integrates web crawling, text categorization, and topic modeling, and proposes a framework for gathering web news based on a hierarchical topic model. Building on a study of multi-source web crawlers and parallel text categorization, the thesis focuses mainly on techniques for screening topic keywords. It also implements a public-security-oriented web news gathering system based on this framework. The work covers the following three parts:

Firstly, the framework integrates multiple web crawlers to gather web news from multiple sources. To manage the gathered news efficiently, it also includes a parallel web news classifier and a method for labeling topic keywords based on a hierarchical topic model.

Secondly, to address the shortcomings of screening topic keywords manually, a keyword selection method based on online hierarchical Dirichlet processes is proposed. Experimental results show that the method solves the problem of automatically screening keywords for web searching and that its results are consistent with those of the manual approach.

Finally, building on the framework above, the thesis implements a public-security-oriented web news gathering system.
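The abstract does not describe how keywords are screened in detail. As a rough illustration only, assuming a topic model (for example, an online hierarchical Dirichlet process) has already produced a word-weight distribution for each topic, candidate search keywords could be taken as each topic's highest-weighted words. The function name `screen_topic_keywords`, its `top_k` and `min_weight` parameters, and the sample weights are all hypothetical and are not the thesis's actual method.

```python
from typing import Dict, List


def screen_topic_keywords(
    topic_word: Dict[str, Dict[str, float]],
    top_k: int = 5,
    min_weight: float = 0.0,
) -> Dict[str, List[str]]:
    """Hypothetical sketch: pick candidate search keywords per topic.

    Assumes topic_word maps each topic label to {word: weight}, e.g. weights
    already estimated by a topic model such as online HDP (not computed here).
    """
    keywords: Dict[str, List[str]] = {}
    for topic, weights in topic_word.items():
        total = sum(weights.values())
        if total <= 0:
            # No usable weights for this topic: no keywords.
            keywords[topic] = []
            continue
        # Sort by descending weight; break ties alphabetically for stable output.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        # Keep words whose normalized weight reaches the threshold, then cap at top_k.
        kept = [word for word, w in ranked if w / total >= min_weight]
        keywords[topic] = kept[:top_k]
    return keywords


if __name__ == "__main__":
    # Hypothetical topic-word weights for a public-security topic.
    sample = {"fraud": {"fraud": 0.3, "scam": 0.25, "police": 0.2,
                        "bank": 0.15, "weather": 0.1}}
    print(screen_topic_keywords(sample, top_k=3, min_weight=0.12))
```

Normalizing by each topic's total weight lets one threshold apply across topics of different sizes; in practice the threshold and `top_k` would need tuning against a manually screened keyword list.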
Keywords/Search Tags: Web Crawler, Text Classification, Topic Keywords, Public Security