
Research On Algorithm Of Topic Tracking Based On The Vector Space Model Of The Lexical Chain’s Sememe

Posted on: 2015-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
GTID: 2298330422480859
Subject: Management Science and Engineering
Abstract/Summary:
The classic topic tracking algorithm first segments the text into words, then uses the words as feature terms and their frequencies as weights to build a feature vector for the topic. Each report is processed in the same way to obtain a report feature vector, and the similarity between the two vectors serves as the criterion for topic tracking. Representing text in a vector space simplifies processing and greatly improves the computability and operability of natural language. However, using only words as feature terms and word frequency as the weight has two drawbacks: it discards the semantic and structural information of the text, and it increases the dimensionality of the vector space, which makes the similarity calculation more complex. When applied to topic tracking, both effects reduce efficiency and accuracy. This paper uses HowNet to construct lexical chains based on semantic similarity, builds a semantic feature vector of the topic from the sememes of those lexical chains, and applies this vector to topic tracking to improve its efficiency and accuracy.
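The contrast between the two representations can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: `TOY_SEMEMES` is a hypothetical hand-made lexicon standing in for HowNet, word similarity is approximated by the Jaccard overlap of sememe sets (HowNet-based measures usually rely on distances in the sememe hierarchy instead), chains are built greedily with an assumed threshold of 0.5, and weighting each sememe by the length of its chain is an illustrative choice, not the thesis's structural weight.

```python
from collections import Counter
import math

# HYPOTHETICAL toy lexicon standing in for HowNet: word -> set of sememes.
TOY_SEMEMES = {
    "earthquake": {"disaster", "land"},
    "quake": {"disaster", "land"},
    "rescue": {"help", "human"},
    "relief": {"help", "disaster"},
    "aid": {"help"},
    "team": {"human", "group"},
    "city": {"place"},
}


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tf_vector(tokens):
    """Classic model: each distinct word is a dimension weighted by raw frequency."""
    return Counter(tokens)


def word_sim(a, b):
    """ASSUMED similarity: Jaccard overlap of sememe sets (stand-in for HowNet similarity)."""
    sa, sb = TOY_SEMEMES.get(a, set()), TOY_SEMEMES.get(b, set())
    if not sa or not sb:
        return 1.0 if a == b else 0.0
    return len(sa & sb) / len(sa | sb)


def lexical_chains(tokens, threshold=0.5):
    """Greedy chaining: join the first chain holding a similar word, else start a new chain."""
    chains = []
    for tok in tokens:
        for chain in chains:
            if any(word_sim(tok, w) >= threshold for w in chain):
                chain.append(tok)
                break
        else:
            chains.append([tok])
    return chains


def sememe_vector(tokens, threshold=0.5):
    """Sememe model: dimensions are sememes; each chain adds its sememes, weighted by chain length."""
    vec = Counter()
    for chain in lexical_chains(tokens, threshold):
        # Unknown words fall back to the word itself as a pseudo-sememe.
        sememes = set().union(*(TOY_SEMEMES.get(w, {w}) for w in chain))
        for s in sememes:
            vec[s] += len(chain)  # ASSUMED weighting: longer chains count more
    return vec


topic = ["earthquake", "rescue", "team", "city"]
report = ["quake", "relief", "aid", "city"]
print(round(cosine(tf_vector(topic), tf_vector(report)), 3))        # 0.25
print(round(cosine(sememe_vector(topic), sememe_vector(report)), 3))  # 0.602
```

With these toy inputs the word-frequency vectors share only `city`, while the sememe vectors also overlap on `disaster`, `land`, and `help`, so the report scores as more similar to the topic. A tracking system would then compare this similarity against a threshold to decide whether the report belongs to the topic.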
Experiments show that the proposed method is effective.

This paper first introduces the basic theory of topic tracking and the technologies involved in the tracking process, such as modeling and weight calculation, and analyzes the shortcomings of existing topic model representations. It also introduces the concept of lexical chains, lexical chain extraction algorithms, HowNet, and the HowNet-based word similarity algorithm. Then, to address the deficiencies of existing topic representation methods, it builds a semantic vector space model based on lexical chains to model both reports and topics. Two comparative experiments follow. First, the traditional method and the improved method are used to compute similarities on the same test corpus, and the results are compared. Second, both methods are applied in a topic tracking system, and their miss (non-response) rates, false alarm rates, and system detection costs are compared to verify the effectiveness of the new method. The specific innovations are as follows: a new method for modeling topics and reports based on the sememes of lexical chains extracted through HowNet, and a structural weight calculation method based on the sememes of lexical chains.
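The miss rate, false alarm rate, and detection cost used in the second experiment are most likely the standard Topic Detection and Tracking (TDT) measures. The sketch below computes them for one topic, assuming the parameter values commonly used in TDT evaluations (C_miss = 1, C_FA = 0.1, P_target = 0.02); the thesis's own parameters are not stated in this abstract, and the counts in the example are invented for illustration, not results from the thesis.

```python
def detection_cost(misses, targets, false_alarms, non_targets,
                   c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Return (P_miss, P_fa, normalized C_det) for one topic.

    The default cost parameters are ASSUMED TDT-style values, not taken from the thesis.
    """
    p_miss = misses / targets            # fraction of on-topic reports the system missed
    p_fa = false_alarms / non_targets    # fraction of off-topic reports wrongly accepted
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    # Normalize by the cost of the better trivial system (always reject / always accept).
    c_norm = c_det / min(c_miss * p_target, c_fa * (1 - p_target))
    return p_miss, p_fa, c_norm


# Hypothetical counts for illustration only.
p_miss, p_fa, cost = detection_cost(misses=5, targets=50, false_alarms=20, non_targets=1000)
print(p_miss, p_fa, round(cost, 3))  # 0.1 0.02 0.198
```

A lower normalized cost means better tracking; a value of 1.0 or more means the system does no better than answering the same way for every report.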
Keywords/Search Tags: Vector space model, HowNet, Sememe, Lexical chain, Topic tracking