Research And Development Of Phrase-Representation Summarization Method For Chinese

Posted on: 2008-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Hu
Full Text: PDF
GTID: 2178360212976038
Subject: Computer software and theory
Abstract/Summary:
Summaries are often used to support fast and accurate judgment when selecting relevant information from information retrieval (IR) results. For such an "indicative" purpose, an "at-a-glance" summary that emphasizes brevity (short length) and simplicity (fewer embedded sentences) has been proposed, together with a phrase-representation summarization method that replaces the important-sentence selection adopted by many summarization systems.

Based on this idea, Fuji Xerox has developed a Japanese phrase-representation summarization system and put it into practice. We assume that a phrase-representation summarization method will also be required for Chinese IR systems, and we therefore decided to develop one. Because the previous method was designed mainly for Japanese, and the linguistic characteristics of Chinese are rather different, we must reconsider the approach starting from the linguistic formalism. A change of linguistic formalism in turn requires modifying the phrase construction algorithm. In this thesis, we briefly review the phrase-representation summarization algorithm for Japanese and then discuss the strategy for modifying it to apply to Chinese.

We have developed a phrase-representation summarization algorithm for Chinese based on the Japanese version. While the algorithm for Japanese creates phrases from a syntactic sub-tree constructed by selecting a core relation and attaching the required relations, the method developed for Chinese applies an appropriate pattern to the important predicate-argument structure selected from all the analysis results of the input sentences. LFG (Lexical-Functional Grammar) is used as the analysis module.

To evaluate the performance of phrase-representation summarization, we carried out a task-based evaluation experiment on information retrieval tasks. We designed tasks similar to actual WWW retrieval scenarios, used a fine-grained scale to judge the relevance of the documents, and introduced a new measure representing the accuracy of sifting. This evaluation method measures performance more accurately than the methods used in related work. The result of the experiment shows that the phrase-represented summarization can sift documents more accurately than existing...
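
The Chinese-side step described above (select the important predicate-argument structure, then apply a pattern to it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the PredArg record, the SUBJ/OBJ role names, the PATTERNS table, and the importance scores, which are assumed to come from an upstream analysis and scoring stage. The LFG analysis itself is not modeled, and English tokens stand in for Chinese ones.

```python
from dataclasses import dataclass


@dataclass
class PredArg:
    """Hypothetical predicate-argument structure from an upstream parser."""
    predicate: str
    args: dict      # role name -> surface text, e.g. {"SUBJ": "..."}
    score: float    # importance score, assumed to be computed elsewhere


# Hypothetical surface patterns, keyed by which core roles are present.
PATTERNS = {
    ("SUBJ", "OBJ"): "{SUBJ} {PRED} {OBJ}",
    ("SUBJ",): "{SUBJ} {PRED}",
    ("OBJ",): "{PRED} {OBJ}",
    (): "{PRED}",
}


def choose_pattern(structure: PredArg) -> str:
    """Pick the pattern that matches the core roles the structure fills."""
    roles = tuple(r for r in ("SUBJ", "OBJ") if r in structure.args)
    return PATTERNS[roles]


def summarize(structures: list) -> str:
    """Select the highest-scored structure and render it as a short phrase."""
    best = max(structures, key=lambda s: s.score)
    return choose_pattern(best).format(PRED=best.predicate, **best.args)


if __name__ == "__main__":
    candidates = [
        PredArg("develops", {"SUBJ": "Fuji Xerox", "OBJ": "a summarization system"}, 0.9),
        PredArg("exists", {"SUBJ": "a method"}, 0.2),
    ]
    print(summarize(candidates))
```

The sketch keeps selection and pattern application as separate functions, mirroring the two steps named in the abstract; how the thesis actually scores structures or defines its patterns is not stated here.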
Keywords/Search Tags: IR, at-a-glance, Phrase-Representation Summarization, LFG predicate-argument structure