Research On Top N Hot Topics Detection Method Based On Key Features Clustering

Posted on: 2016-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Zhang
Full Text: PDF
GTID: 2308330476455001
Subject: Computer Science and Technology
Abstract/Summary:
New Internet technologies such as cloud computing, the mobile Internet, social media, and big data have brought great opportunities and challenges to all kinds of businesses. At the same time, data is growing so quickly that traditional information processing methods are stretched to their limits. People need to pick out the important information from this overwhelming volume, and they are especially interested in hot topics. Hot topic detection is a technology that helps people find, organize, and manage the information they care about. However, traditional hot topic detection techniques are mainly applied to news and reports, and they still need to be improved to meet the requirements of social media and big data.

We introduce a Top N hot topics detection method based on key features clustering, developed from traditional topic detection methods. Its basic principle is that key features represent a document well and can therefore be clustered to find hot topics. The method consists of two stages: automatic keyword extraction from single documents, and Top N hot topics detection on the extracted key features.

In stage 1, documents are preprocessed to generate key feature candidates. The candidates are then scored by three kinds of keyword extraction methods, and the high-scoring candidates are selected as key features. Here we introduce an optimized TextRank algorithm to improve the performance of key feature extraction.

In stage 2, duplicate key features are removed and the remaining ones are mapped into the topic space to generate initial topics. Subtopics are found by clustering the initial topics. To remove wrong subtopics and merge similar ones, key features are extracted for each subtopic, and we introduce the coverage of subtopic key features as the criterion. Analyzing the coverage of key features on each subtopic makes both tasks easy to complete. After that, the Top N hot topics are detected.
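
The two stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the optimized TextRank variant, the three scoring methods, or the coverage formula, so the code below assumes plain TextRank over a word co-occurrence graph and defines coverage as the shared fraction of the smaller key-feature set. The function names, the `window`, `damping`, and `threshold` parameters, and the greedy merge order are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=30, top_k=3):
    """Score candidate words with plain (unoptimized) TextRank; return top_k."""
    # Build an undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Iterate the PageRank-style update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def coverage(features_a, features_b):
    """Assumed definition: shared fraction of the smaller key-feature set."""
    a, b = set(features_a), set(features_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_subtopics(subtopics, threshold=0.5):
    """Greedily merge subtopics whose key-feature coverage meets threshold."""
    merged = []
    for features in subtopics:
        features = set(features)  # copy so the caller's data is not mutated
        for existing in merged:
            if coverage(existing, features) >= threshold:
                existing |= features
                break
        else:
            merged.append(features)
    return merged
```

In this sketch, `textrank_keywords` stands in for stage 1 on a single preprocessed document, and `merge_subtopics` stands in for the merging step of stage 2. Ranking the merged subtopics to pick the Top N would need a hotness measure, which the abstract does not define, so it is left out.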
Keywords/Search Tags: Topic Detection, Key Features, Keyword Extraction, Subtopics, Top N Hot Topics