| Given the vast amount of information on the Internet, a search for hot news returns many duplicate reports of the same event, scattered haphazardly through the results. This makes it difficult for users to view, analyze, and summarize the results and to make decisions, so topic detection technology is needed to address the problem. This dissertation studies preprocessing methods for the news corpus and represents stories with a cross-entropy-based feature selection method applied to selected parts of speech (nouns and verbs). A variety of clustering algorithms used in topic detection are discussed. A density-based clustering algorithm built on data field theory is tried first, and a single-pass incremental clustering algorithm with the average-link strategy is finally adopted. To handle stories arriving in consecutive time slices, the stories in each slice are first clustered into candidate topics, and these candidates are then either merged into existing topics or used to form new ones, which improves on the traditional single-pass incremental algorithm. On top of this algorithm, aging theory is used to quantify the life cycle of every topic. The experiments achieve good results and meet the goal of topic detection. Finally, an evaluation method for hot topics is studied that uses the Life Support value and the duration of each topic. By ranking topics by heat, hot topics in the current time period can be found quickly. The experimental corpus is a news dataset publicly released on the Sina News channel in November 2015. Cross-entropy-based feature extraction, the data-field density clustering algorithm, and the aging-theory-based incremental clustering algorithm are applied to topic detection on this corpus. To evaluate the feasibility and effectiveness of the algorithm, precision, detection cost, and other clustering evaluation measures are used, and the results illustrate the improvement in algorithm performance. |
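
The single-pass incremental clustering step with the average-link strategy can be illustrated roughly as follows. This is a minimal sketch, not the dissertation's implementation: it assumes each story is already a sparse term-weight dictionary, uses cosine similarity as the story-to-story measure, and picks an arbitrary threshold of 0.5. The names `cosine` and `single_pass` are hypothetical, and the time-slice merging and aging steps are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(stories, threshold=0.5):
    """Cluster stories in arrival order with one pass over the input.

    Average-link: the similarity between a story and a cluster is the mean
    of its cosine similarity to every story already in that cluster.
    The threshold value is an illustrative assumption.
    """
    clusters = []
    for story in stories:
        best_idx, best_sim = -1, 0.0
        for i, members in enumerate(clusters):
            sim = sum(cosine(story, m) for m in members) / len(members)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            clusters[best_idx].append(story)   # join the closest topic
        else:
            clusters.append([story])           # open a new topic
    return clusters
```

A lower threshold yields fewer, broader topics, while a higher one splits coverage of the same event into several topics, so in practice the value would have to be tuned on the corpus.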