The Research Of Chinese Automatic Segmentation Method Based On HowNet Semantic Relevancy Computing

Posted on: 2007-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: G Z Wang
GTID: 2178360185980667
Subject: Computer software and theory
Abstract/Summary:
Chinese automatic word segmentation has long been an important research topic in Chinese information processing and in artificial intelligence more broadly. Within it, disambiguation is among the most important and most difficult problems. Great progress has been made in China and abroad, and many effective segmentation methods have been proposed, but disambiguation precision still falls well short and many difficulties remain.
We have done in-depth research on Chinese segmentation techniques and on the methods and implementation techniques of segmentation systems. We propose a segmentation model based on HowNet semantic relevancy computing that handles ambiguities of every chain length. Building on this research, we designed and implemented the WGZ segmentation system, tested it on a set of examples, and compared our semantic relevancy computing method with other methods. The experiments show that segmentation precision with our method is 97.1% when measured per sentence and 99.4% when measured per word, and that tagging precision is 91.4%.
Our study finds that disambiguation is not only a question of the word itself but also depends on its context: adjacent words, the sentence, the paragraph, and even the chapter all bear on it directly or indirectly. We took these factors fully into account when designing and implementing the system, proposed a model of semantic relevancy computing between words based on HowNet, and handled ambiguities of every chain length. For Chinese corpus tagging, we analyzed earlier rule-based work, proposed a method based on rule PRI, and finally completed both word segmentation and Chinese corpus tagging.
While designing the word segmentation system, we studied in depth the knowledge structure of HowNet and the Knowledge Database Mark-up Language, and came to understand the advantages of HowNet in describing the conceptual knowledge of words.
We segmented the input text using Maximum Matching and Reverse Maximum Matching, and found ambiguous words through a two-way scan. In the key disambiguation step, we resolved each ambiguity by computing the semantic relevancy between two words. We have put forward a solution for ambiguous words of every chain length; alternatively, an ambiguity can be handled by transforming it into one of chain length 1. In the Chinese corpus tagging step, we refined and adjusted the rules while tagging, until the precision of tagging the raw corpus with the rule base met our requirements.
While testing the system, we have compared semantic relevancy computing method that...
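The two-way scan described above can be illustrated with a minimal sketch. This is not the WGZ system's code: the function names, the toy lexicon, and the `max_len` parameter are assumptions made for illustration, and the semantic relevancy step based on HowNet is not shown. The sketch runs forward and reverse maximum matching over the same text and reports the character spans where the two segmentations disagree, which are the candidates that a disambiguation step would then need to resolve.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation, longest lexicon match first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or cand in lexicon:
                words.append(cand)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left segmentation, longest lexicon match first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    words.reverse()
    return words


def _boundaries(words):
    """Character offsets at which each word ends."""
    pos, out = 0, set()
    for w in words:
        pos += len(w)
        out.add(pos)
    return out


def ambiguous_spans(fwd, rev):
    """Return (start, end) character spans where the two results differ."""
    fb, rb = _boundaries(fwd), _boundaries(rev)
    spans, start = [], 0
    for end in sorted(fb & rb):
        inner_f = {b for b in fb if start < b < end}
        inner_r = {b for b in rb if start < b < end}
        if inner_f != inner_r:
            spans.append((start, end))
        start = end
    return spans


# Toy lexicon (hypothetical); a real system would load a full dictionary.
lexicon = {"研究", "研究生", "生命", "起源"}
text = "研究生命起源"
fwd = forward_max_match(text, lexicon)
rev = reverse_max_match(text, lexicon)
print(fwd, rev, ambiguous_spans(fwd, rev))
```

On this toy input the forward pass groups the first three characters as one word while the reverse pass does not, so the first four characters are flagged as an ambiguous span and the final word is left alone.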
Keywords/Search Tags:Chinese Automatic Word Segmentation, semantic relevancy, part of speech tagging, Hownet