
Study On Global Evaluation For Text Segmentation

Posted on: 2006-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X Z Chang
GTID: 2168360155458176 | Subject: Computer system architecture
Abstract/Summary:
This thesis introduces the history and development of text segmentation and surveys some popular methods. By comparison with text categorization, it identifies the problems specific to text segmentation and presents solutions to them. Because statistical pattern recognition methods have been applied to natural language processing in recent years, the Fisher discriminant offers a framework for text feature selection. Inner (within-class) and outer (between-class) distance are introduced to support the methods in this thesis, and combining cluster analysis with the Fisher discriminant yields the evaluation function for text segmentation.

Since the inner and outer distances are derived from text categorization methods, some of the formulas must be corrected before they can be used for text segmentation. In particular, for biased segmentations, this thesis attempts to correct the error with an entropy function. In addition, a probability product is used as a penalty function in text segmentation.

The significance of this thesis lies in employing an evaluation function that is global. Both segmentation itself and estimation of the number of paragraphs require global information, which leads to the conclusion that global information is necessary for text segmentation.

For comparison with TextTiling, all result curves are plotted together with the TextTiling curve. In particular, after adjustment with the probability product function, the P_k value of the proposed method improves substantially, and the accuracy of the estimated number of paragraphs improves as well.

Finally, this thesis summarizes the work and proposes several improvements to text segmentation, along which the methods presented here will be developed further.
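The abstract combines inner and outer distance with the Fisher discriminant into a global evaluation function that scores an entire segmentation at once. The sketch below illustrates that idea under stated assumptions; it is not the thesis's exact formulation. It assumes each sentence is a term-count vector, takes the inner distance as the sum of squared Euclidean distances from sentences to their segment centroid, takes the outer distance as the size-weighted squared distance from each segment centroid to the global centroid, and scores a segmentation by outer / inner (a Fisher-style ratio). The helper names (`fisher_score`, `best_segmentation`, and so on) are hypothetical, and the entropy correction and probability-product penalty mentioned in the abstract are not modeled.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # assumed: one term-count vector per sentence


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_segments(sentences: Sequence[Vector],
                   boundaries: Sequence[int]) -> List[Sequence[Vector]]:
    """Cut the sentence sequence before each sorted boundary index b (0 < b < n)."""
    cuts = [0, *boundaries, len(sentences)]
    return [sentences[s:e] for s, e in zip(cuts, cuts[1:])]


def fisher_score(sentences: Sequence[Vector], boundaries: Sequence[int]) -> float:
    """Hypothetical global criterion: outer (between-segment) scatter divided by
    inner (within-segment) scatter. Larger means tighter, better-separated segments."""
    overall = centroid(sentences)
    inner = 0.0
    outer = 0.0
    for seg in split_segments(sentences, boundaries):
        c = centroid(seg)
        inner += sum(sq_dist(v, c) for v in seg)   # spread inside the segment
        outer += len(seg) * sq_dist(c, overall)    # distance of segment from the whole text
    return math.inf if inner == 0 else outer / inner


def best_segmentation(sentences: Sequence[Vector],
                      k: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively score every cut of the text into k contiguous segments
    and return the highest-scoring boundary tuple with its score."""
    n = len(sentences)
    best = max(itertools.combinations(range(1, n), k - 1),
               key=lambda b: fisher_score(sentences, b))
    return best, fisher_score(sentences, best)


if __name__ == "__main__":
    # Toy document: three sentences about one "topic", then three about another.
    doc = [[3, 0, 0], [2, 1, 0], [3, 1, 0], [0, 1, 3], [0, 0, 2], [1, 0, 3]]
    boundaries, score = best_segmentation(doc, k=2)
    print(boundaries)  # (3,)
```

Because inner plus outer scatter is constant for a given text, maximizing the ratio is equivalent to minimizing the inner scatter. Splitting a segment never increases the inner scatter, so the ratio never decreases as segments are added; that is why this sketch fixes k, and why some penalty on the number of segments is needed when k must be estimated. The exhaustive search also grows combinatorially with text length, so it only suits short toy inputs.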
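The abstract reports its results with the P_k metric. P_k, introduced by Beeferman, Berger and Lafferty, slides a window of width k across the text and counts how often the reference and the hypothesis disagree about whether the two units at the window's ends lie in the same segment; lower values are better. The sketch below assumes segmentations are given as lists of boundary indices (a boundary b means a new segment starts at unit b) and defaults k to half the mean reference segment length, a common convention; the thesis's exact settings are not stated, and the names `p_k` and `segment_ids` are hypothetical.

```python
from typing import List, Optional, Sequence


def segment_ids(n: int, boundaries: Sequence[int]) -> List[int]:
    """Label each of n units with the index of the segment that contains it."""
    cuts = set(boundaries)
    ids, seg = [], 0
    for i in range(n):
        if i in cuts:
            seg += 1  # a new segment starts at unit i
        ids.append(seg)
    return ids


def p_k(reference: Sequence[int], hypothesis: Sequence[int], n: int,
        k: Optional[int] = None) -> float:
    """Fraction of window positions where reference and hypothesis disagree
    on whether units i and i + k belong to the same segment."""
    ref = segment_ids(n, reference)
    hyp = segment_ids(n, hypothesis)
    if k is None:
        # Assumed default: half the mean reference segment length.
        k = max(1, round(n / (len(reference) + 1) / 2))
    windows = n - k
    if windows <= 0:
        raise ValueError("text is too short for window size k")
    errors = sum((ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
                 for i in range(windows))
    return errors / windows


if __name__ == "__main__":
    # 10 units; the true boundary is before unit 5, the guess is before unit 3.
    print(p_k(reference=[5], hypothesis=[3], n=10, k=2))  # 0.5
    print(p_k(reference=[5], hypothesis=[5], n=10))       # 0.0
```

A misplaced boundary is penalized in proportion to how many windows straddle the gap between the true and the guessed position, which is why P_k is more forgiving of near misses than exact boundary matching.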
Keywords/Search Tags: Text segmentation, Fisher linear discriminant, Inner and outer distance, Cluster analysis