
Mining And Application Of Hot News Topics By Bucket-based Quadratic Clustering

Posted on: 2014-11-30  Degree: Master  Type: Thesis
Country: China  Candidate: S K Wu  Full Text: PDF
GTID: 2268330425976083  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the amount of information on the network has risen dramatically. Internet news is one of the main ways for people to acquire information about society. However, readers face a huge volume of highly redundant news reports. Hot news topics in particular evolve constantly over time, and discussion of them continues to spread, so tracking these dynamic topics manually has become an impossible task. A fast and accurate topic detection method is therefore needed to obtain the hot topics on the Internet.

Based on the idea of dividing data by granularity, a method of depositing documents into buckets is proposed to reduce the computation required for topic detection. A three-layer topic model with multiple centroids records how a topic evolves over time, and a bucket-based quadratic clustering method is used to cluster the news reports. Finally, the clustered topics are presented to the user through a web site. This study mainly includes the following aspects:

(1) A bucket-based pre-classification method effectively divides a large data set into smaller ones. News reports that are likely to be similar are placed in the same bucket, and similarity is computed only among reports in the same bucket, which reduces computation and improves text clustering speed. Experiments show that clustering speed improves significantly when clustering a large number of documents.

(2) A three-layer topic model (topic, sub-topic, report) uses multiple centroids to represent a topic. News topics on the Internet usually evolve as time progresses, forming multiple sub-topics derived from the original topic. In the model, each sub-topic has its own centroid, so the model can effectively record the evolution of a topic.

(3) A topic detection algorithm combines bucket-based clustering with quadratic clustering. The bucket-based clustering algorithm first clusters the network news from a given period of time. The resulting clusters are then clustered a second time against the existing (old) topic set. In this step, document vectors with adjusted feature-item weights are used to detect the sub-topics within a topic. Finally, the system produces topics that record their own evolution.

Based on the above research, we designed and implemented an Internet news monitoring system that provides the latest topics.
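The three aspects above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how buckets are keyed, how features are weighted, or which thresholds are used, so the bucket rule (a document's most frequent term), the raw term-frequency vectors, the single-pass clustering inside each bucket, the 0.3 threshold, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
import math


def vectorize(text):
    """Term-frequency vector (assumed stand-in for the thesis's weighted features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def bucket_key(vec):
    """Hypothetical bucket rule: the most frequent term, ties broken alphabetically."""
    return min(vec, key=lambda t: (-vec[t], t))


def cluster_bucket(items, threshold):
    """First pass: single-pass clustering inside one bucket.

    Each cluster is [centroid, member_indices]; the centroid is the sum of
    member vectors, which is enough because cosine ignores vector length.
    """
    clusters = []
    for idx, vec in items:
        best = max(clusters, key=lambda c: cosine(c[0], vec), default=None)
        if best is not None and cosine(best[0], vec) >= threshold:
            best[0].update(vec)      # Counter.update adds the counts
            best[1].append(idx)
        else:
            clusters.append([Counter(vec), [idx]])
    return clusters


def detect_topics(docs, topics=None, threshold=0.3):
    """Bucket the documents, cluster per bucket, then run a second pass
    that attaches each cluster to an existing topic or opens a new one.

    A topic is a list of sub-topics, and every sub-topic keeps its own
    centroid (the multi-centroid idea from the abstract).
    """
    topics = [] if topics is None else topics
    buckets = defaultdict(list)
    for idx, text in enumerate(docs):
        vec = vectorize(text)
        if vec:
            buckets[bucket_key(vec)].append((idx, vec))

    for key in sorted(buckets):
        for cluster in cluster_bucket(buckets[key], threshold):
            best_topic, best_sim = None, 0.0
            for topic in topics:
                # Similarity to a topic is the best match over its sub-topic centroids.
                sim = max(cosine(sub[0], cluster[0]) for sub in topic)
                if sim > best_sim:
                    best_topic, best_sim = topic, sim
            if best_topic is not None and best_sim >= threshold:
                best_topic.append(cluster)   # recorded as a new sub-topic
            else:
                topics.append([cluster])     # a brand-new topic
    return topics
```

In this sketch, pairwise similarity is only computed among documents that share a bucket, and the second pass compares cluster centroids rather than individual documents, which is where the reduction in computation would come from. Passing the returned list back in as `topics` for the next time window mimics clustering new results against the old topic set.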
Keywords/Search Tags: Topic Detection, Topic Evolution, bucket-based pre-classification, Topic Model, Quadratic Cluster