
Research On News Hotspot Extraction Method Based On Semantic Analysis And Improved K-means Algorithm

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: C G Xu
Full Text: PDF
GTID: 2298330467481202
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and information technology, the volume of information on the web is growing rapidly. Faced with this growth, how to extract key information quickly has become an active research topic. On the one hand, most web content is natural-language text, which is difficult to extract and analyze directly. On the other hand, the hotspots that attract public attention form with considerable randomness over any given period. As a result, the key information in news is often difficult to define and classify precisely.

This thesis addresses the key problem of extracting information from web content. Taking college news as the experimental subject, it combines semantic analysis with an improved K-means algorithm, applies the combined method to news topic extraction and analysis, and designs a hot news analysis platform based on it. The main contributions are as follows:

(1) Semantic analysis is applied to word sense disambiguation and synonym merging during text preprocessing, improving the accuracy of preprocessing.
(2) An improved, density-based K-means algorithm is applied to filtering the topic collection and constructing a candidate topic set during topic detection. Experiments verify the effectiveness of screening and extracting topics in combination with similarity calculation, and show that this topic detection method outperforms the traditional approach.
(3) A reasonably general-purpose hot news analysis platform based on the proposed algorithm is designed and implemented, and is used to collect and analyze university news.

In summary, this study proposes an Internet hotspot extraction algorithm based on semantic analysis and the K-means algorithm, and designs and implements a hot news analysis platform to test it. The test results show that the method achieves the expected results in analyzing web content.
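The abstract does not give the details of the density-based improvement, so the following Python sketch is only one plausible reading of the pipeline it describes: merge synonyms during preprocessing, weight terms with TF-IDF, pick initial cluster centers from dense regions that are mutually dissimilar, then run K-means with cosine similarity. The synonym table, the `radius` threshold, the sample headlines, and all function names are assumptions made for illustration and do not come from the thesis.

```python
import math
from collections import Counter

# Hypothetical synonym table; the thesis does not name its semantic resource.
SYNONYMS = {"exam": "test", "quiz": "test", "match": "game"}

def tokenize(text):
    """Lowercase, split on whitespace, and map synonyms to one form."""
    return [SYNONYMS.get(w, w) for w in text.lower().split()]

def tfidf(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [
        {w: c / len(d) * (math.log((1 + n) / (1 + df[w])) + 1)
         for w, c in Counter(d).items()}
        for d in docs
    ]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def density_seeds(vecs, k, radius=0.3):
    """Pick up to k seeds: densest points first, skipping near-duplicates.

    Density here is the number of documents whose similarity to a point
    is at least `radius` (an assumed definition). Fewer than k seeds are
    returned if not enough mutually dissimilar points exist.
    """
    density = [sum(cosine(u, v) >= radius for v in vecs) for u in vecs]
    order = sorted(range(len(vecs)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(cosine(vecs[i], vecs[s]) < radius for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds

def kmeans(vecs, seeds, iters=10):
    """K-means with cosine similarity, started from the given seed indices."""
    centroids = [dict(vecs[s]) for s in seeds]
    labels = None
    for _ in range(iters):
        new = [max(range(len(centroids)),
                   key=lambda c: cosine(v, centroids[c])) for v in vecs]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(len(centroids)):
            members = [vecs[i] for i, l in enumerate(labels) if l == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # sums term weights across members
                centroids[c] = {w: v / len(members) for w, v in total.items()}
    return labels

# Made-up campus headlines used only to exercise the sketch.
headlines = [
    "campus exam schedule released",
    "final quiz schedule for campus",
    "exam schedule update campus",
    "basketball match won by team",
    "team wins basketball game",
    "basketball team game tonight",
]
vecs = tfidf([tokenize(h) for h in headlines])
seeds = density_seeds(vecs, k=2)
labels = kmeans(vecs, seeds)
print(seeds, labels)
```

Choosing seeds from dense, mutually dissimilar points is a common way to reduce the sensitivity of K-means to random initialization; whether the thesis uses this exact rule is not stated in the abstract.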
Keywords/Search Tags: Semantic Analysis, text clustering, topic detection, topic extraction, hot news analysis platform