
Research On Automatic Text Summarization Based On TongYiCi CiLin

Posted on: 2008-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2178360245491973
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The Internet grew rapidly in the 1990s. The explosive increase in information met people's needs, but it also made it harder for them to find quickly and precisely what they really need. Automatic text summarization became a hot topic and attracted much attention as an effective solution to this problem.
The dissertation reviewed the historical development and research status of automatic text summarization, then described the main models and methods, such as statistics-based, meaning-based, and concept-based approaches. It discussed their merits and demerits and summarized their respective characteristics.
The dissertation described natural language processing (NLP) technology and the development status of corpora. Using the well-known corpus <> annotated by the Institute of Computational Linguistics of Peking University, a dictionary was built and a part-of-speech (POS) transition probability table was computed. The dissertation also introduced word segmentation methods, namely the forward, backward, and bidirectional methods. Overlapping ambiguity strings were disambiguated using mutual information, and an ambiguity information table was built to avoid searching the whole corpus, which improved efficiency. The dissertation further introduced the Markov model, which worked well for POS tagging by exploiting its state-transition properties.
The dissertation extracted the concepts of sentences based on TongYiCi CiLin, constructed a vector space of text concepts, and used similarity theory to calculate a parameter called passage importance, from which the importance of each sentence was obtained.
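As a rough illustration of the concept-vector step, the sketch below maps words to concept codes, builds a concept count vector per sentence, and scores each sentence by its cosine similarity to the whole-text vector. The abstract does not define the passage importance parameter, so cosine similarity is an assumption here, and the `TOY_CILIN` table, its codes, and the English tokens are invented stand-ins for TongYiCi CiLin entries.

```python
import math
from collections import Counter

# Hypothetical word -> concept-code table standing in for TongYiCi CiLin.
# The entries and codes are invented for illustration only.
TOY_CILIN = {
    "car": "C01", "automobile": "C01", "vehicle": "C01",
    "fast": "C02", "quick": "C02",
    "road": "C03", "street": "C03",
    "apple": "C04",
}

def concept_vector(words):
    """Count concept codes for the words that appear in the table."""
    return Counter(TOY_CILIN[w] for w in words if w in TOY_CILIN)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_importance(sentences):
    """Score each tokenized sentence against the whole-text concept vector."""
    vectors = [concept_vector(s) for s in sentences]
    text_vec = Counter()
    for vec in vectors:
        text_vec.update(vec)  # add this sentence's concept counts to the text
    return [cosine(vec, text_vec) for vec in vectors]

sentences = [
    ["the", "car", "is", "fast"],
    ["a", "quick", "automobile", "on", "the", "road"],
    ["an", "apple"],
]
scores = sentence_importance(sentences)
```

Because synonyms such as "car" and "automobile" share a concept code, the first two sentences reinforce each other and score higher than the unrelated third one, which is the effect a thesaurus-based vector space is meant to capture.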
Finally, the Fisher classification method was used to extract the most important sentences to create the summary. The dissertation also attempted to characterize each sentence by its sentence agent, which was extracted based on syntactic dependency, and explored a method based on Rough Set.
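The segmentation and disambiguation step mentioned in the abstract can be sketched as follows: forward and backward maximum matching are run on the same string, and where they disagree (an overlapping ambiguity), the candidate whose two-character words have the higher mutual information is kept. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the dictionary, the counts, the corpus size, and the scoring rule in `segment` are all invented for illustration, with the precomputed count tables playing the role of a lookup table that avoids scanning a corpus.

```python
import math

# Toy dictionary and counts, invented for illustration only.
DICTIONARY = {"结合", "合成", "分子"}
MAX_WORD_LEN = 2
CHAR_COUNTS = {"结": 50, "合": 80, "成": 100, "分": 40, "子": 60}
BIGRAM_COUNTS = {("结", "合"): 30, ("合", "成"): 10, ("分", "子"): 25}
TOTAL = 10000  # hypothetical corpus size in characters

def forward_max_match(text):
    """Greedy longest-match segmentation scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(text):
    """Greedy longest-match segmentation scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_WORD_LEN, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in DICTIONARY:
                words.append(piece)
                j -= size
                break
    return words[::-1]

def mutual_information(a, b):
    """log2 of P(ab) / (P(a) * P(b)) from the toy counts."""
    p_ab = BIGRAM_COUNTS.get((a, b), 0) / TOTAL
    p_a = CHAR_COUNTS.get(a, 0) / TOTAL
    p_b = CHAR_COUNTS.get(b, 0) / TOTAL
    if p_ab == 0 or p_a == 0 or p_b == 0:
        return float("-inf")
    return math.log2(p_ab / (p_a * p_b))

def binding_score(words):
    """Sum of mutual information over the two-character words."""
    return sum(mutual_information(w[0], w[1]) for w in words if len(w) == 2)

def segment(text):
    """Bidirectional matching; disagreements are resolved by binding score."""
    fwd, bwd = forward_max_match(text), backward_max_match(text)
    if fwd == bwd:
        return fwd
    return fwd if binding_score(fwd) >= binding_score(bwd) else bwd

result = segment("结合成分子")
```

With these toy counts, the two scans disagree on "结合成" (forward gives 结合/成, backward gives 结/合成), and the higher mutual information of 结 and 合 selects the forward reading.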
Keywords/Search Tags: Mutual Information, Disambiguation, TongYiCi CiLin, Vector Space Model, Rough Set