Research On Comparative Summary For Multi-document Sets

Posted on: 2013-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Ye
Full Text: PDF
GTID: 2298330422474300
Subject: Management Science and Engineering
Abstract/Summary:
With the development of e-commerce and search engines, more and more computer applications need to extract the main information from a text collection so that users can browse and use it quickly.

Comparative Text Mining (CTM) has been a hot topic in text mining research in recent years. Users often need to analyze how groups of people differ across time, place, and culture, which requires analyzing a series of text sets. For example, researchers may need to know the trend of a hot research field over a few years, and government personnel may need to understand how public opinion on an event develops over time. These tasks call for a comparative analysis of data sets that extracts the knowledge users need. Research on CTM has only just begun. Its main goal is to extract the concerns common to all text sets as well as the information unique to each text set. The existing CTM models are CCmix and CCLDA, but they apply only to text sets with high similarity and not to text sets that contain important specific information. In this paper, we propose an extended model, ECCmix, based on these two models. The model supports topic comparison across text sets of varying similarity. We evaluate the model with topic evolution experiments on news events and with a geographical and cultural comparative analysis experiment; the results show that the model is effective.

Because topic model results are displayed as word frequencies, they have poor readability and are inconvenient for users to understand. We propose a comparative summary algorithm for multiple text sets based on the ECCmix topic model, which helps users analyze and compare text sets quickly. The main idea of the algorithm is to design a sentence importance scoring algorithm based on the parameters produced by running the topic model, and then to extract sentences according to a certain strategy to form a summary. Presenting the comparison of multiple text sets in summary form is more intuitive and makes it easier for users to read and comprehend.

Finally, we carry out summary experiments with the proposed summary algorithm. We collect multiple data sets for the experiments and analyze the summary results with respect to machine and manual summaries. We evaluate the summary results using manual scoring and a method based on recall and precision. Experimental results show that this summary system is effective.
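
The abstract does not give the scoring formula or the extraction strategy, so the following is only a minimal sketch of the general idea of scoring sentences with topic model parameters and extracting the best ones. It assumes that a topic is available as a word-to-probability mapping, that a sentence's score is the mean topic probability of its words, and that the top k sentences are kept. The names `score_sentence` and `extract_summary`, the scoring rule, and the toy data are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def score_sentence(sentence: str, topic_word_prob: Dict[str, float]) -> float:
    """Mean topic probability of the sentence's words (assumed scoring rule)."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    # Words the topic does not contain contribute a probability of 0.
    return sum(topic_word_prob.get(w, 0.0) for w in words) / len(words)


def extract_summary(
    sentences: List[str], topic_word_prob: Dict[str, float], k: int = 1
) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], topic_word_prob),
        reverse=True,
    )
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Toy topic and sentences, invented purely for illustration.
topic = {"election": 0.4, "vote": 0.3, "poll": 0.2}
docs = [
    "the election vote was close",
    "weather was mild today",
    "a new poll was published",
]
summary = extract_summary(docs, topic, k=2)
print(summary)
```

In a comparative setting, one such topic distribution could be used for the content common to all text sets and one per text set for its specific content, but how ECCmix parameters actually enter the score is left open by the abstract.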
Keywords/Search Tags: comparative summary, topic model, ECCmix model, sentence scoring, topic importance ordering