
Research On Improved Algorithm Of Topic Detection And Tracking

Posted on: 2014-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: X C Hou  Full Text: PDF
GTID: 2268330422463530  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online information is growing explosively. Organizing and managing this information effectively has become more difficult, and information overload is common. Topic detection and tracking (TDT) technology emerged to organize and manage such information. Given a stream of news reports, its main purposes are to detect new topics and to track the follow-up reports of known topics.

Because the number of topics is not known in advance, topic detection uses hierarchical clustering, which does not require the number of clusters to be set beforehand. Hierarchical clustering suits the needs of topic detection well. On this basis, since named entities are highly sensitive features in topic reports, their weights are increased during similarity calculation, which improves overall system performance. Experiments on the existing corpus and experimental data show that the improved similarity calculation raises the accuracy of topic detection and reduces system overhead.

When the commonly used traditional K-nearest neighbor (KNN) algorithm is applied to topic tracking, it requires the numbers of reports per topic to be balanced, and this shortcoming causes topic drift to a certain extent. In the improved method, a support vector machine (SVM) is used in the training phase of KNN to determine support vectors, which replace the value of K and thus eliminate the dependence on K. This reduces the topic drift caused by uneven numbers of reports across topics. Experimental results show that these methods improve topic-tracking performance to a certain extent, and they verify that the accuracy of the improved KNN algorithm is not affected by the parameter K.
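The detection idea in the abstract (hierarchical clustering with no preset cluster count, plus a higher weight for named entities in the similarity calculation) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes term-frequency vectors, cosine similarity, average-link agglomerative clustering, a stopping threshold `threshold`, and a hypothetical boost factor `ne_boost` applied to terms already tagged as named entities. All function names and parameter values are illustrative assumptions.

```python
import math
from collections import Counter


def weighted_vector(tokens, entities, ne_boost=2.0):
    """Term-frequency vector in which named-entity terms are multiplied by ne_boost.
    ne_boost = 2.0 is a hypothetical value, not taken from the thesis."""
    tf = Counter(tokens)
    return {t: c * (ne_boost if t in entities else 1.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(vectors, threshold=0.3):
    """Average-link agglomerative clustering over document indices.
    Merging stops when no pair of clusters is more similar than threshold,
    so the number of topics is not fixed in advance."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        # Average pairwise similarity between two clusters.
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])
        del clusters[y]
    return clusters
```

Raising `threshold` yields more, tighter topics. The threshold plays the role that a preset cluster count would play in a flat method such as k-means. Boosting named entities increases the similarity of stories that share people, places or organizations; for two short stories sharing one entity and one ordinary term, the cosine rises from 0.5 with no boost to 5/7 with `ne_boost=2.0`.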
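The tracking idea (use an SVM during KNN training so that support vectors stand in for the K neighbours) admits several readings. The sketch below illustrates one of them under stated assumptions. It trains a linear SVM by dual coordinate descent (a swapped-in solver, not necessarily the one used in the thesis), folds the bias in as a constant feature, keeps the training points with nonzero dual weight as support vectors, and classifies a new story by a cosine-similarity-weighted vote over all support vectors, so no K has to be chosen. The function names, the `C` value and the voting rule are hypothetical.

```python
import math


def train_dual_cd_svm(X, y, C=1.0, max_sweeps=1000, tol=1e-10):
    """Linear L1-loss SVM trained by dual coordinate descent.
    Labels are +1 (on-topic) / -1 (off-topic). A constant 1 is appended to
    each x so the bias is learned as an ordinary weight.
    Returns (w, alpha); points with alpha > 0 are support vectors."""
    Xa = [list(x) + [1.0] for x in X]
    n, d = len(Xa), len(Xa[0])
    alpha = [0.0] * n
    w = [0.0] * d
    Q = [sum(v * v for v in x) for x in Xa]  # diagonal of the dual Hessian
    for _ in range(max_sweeps):
        max_pg = 0.0
        for i in range(n):
            G = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i])) - 1.0
            # Projected gradient respects the box constraint 0 <= alpha_i <= C.
            if alpha[i] <= 0.0:
                pg = min(G, 0.0)
            elif alpha[i] >= C:
                pg = max(G, 0.0)
            else:
                pg = G
            max_pg = max(max_pg, abs(pg))
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - G / Q[i], 0.0), C)
                step = (alpha[i] - old) * y[i]
                w = [wj + step * xj for wj, xj in zip(w, Xa[i])]
        if max_pg < tol:  # KKT conditions satisfied to tolerance
            break
    return w, alpha


def support_vectors(alpha, eps=1e-8):
    """Indices of training points with nonzero dual weight."""
    return [i for i, a in enumerate(alpha) if a > eps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sv_knn_classify(x, X, y, sv):
    """Similarity-weighted vote over all support vectors (no K to tune)."""
    scores = {}
    for i in sv:
        scores[y[i]] = scores.get(y[i], 0.0) + cosine(x, X[i])
    return max(scores, key=scores.get)
```

Because only the points near the decision boundary survive as support vectors, a topic with many redundant training reports does not outvote a smaller topic simply by count. That is one way the dependence on balanced report numbers, and on K itself, could be weakened.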
Keywords/Search Tags: topic detection, topic tracking, hierarchical clustering, similarity calculation, improved K-nearest neighbor