The Research Of Chinese Automatic Segmentation Method Based On HowNet Semantic Relevancy Computing

Posted on:2007-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:G Z Wang

Full Text:PDF

GTID:2178360185980667

Subject:Computer software and theory

Abstract/Summary:

Chinese automatic word segmentation has long been an important research topic in Chinese information processing, and in artificial intelligence more broadly. Disambiguation, among the most important and most difficult problems in this area, has seen great progress in China and abroad, and many effective segmentation methods have been proposed. Even so, there is still a large gap in disambiguation precision, and many difficulties remain.

We carried out in-depth research on Chinese segmentation techniques and on the methods and implementation techniques of segmentation systems, proposed a segmentation model based on HowNet semantic relevancy computing, and handled ambiguities of every chain length. On this basis we designed and implemented the WGZ segmentation system, tested it on a number of examples, and compared our semantic relevancy computing method with other methods. The experiments showed that segmentation precision with our method is 97.1% when measured per sentence and 99.4% when measured per word, and that tagging precision is 91.4%.

Our study finds that disambiguation is not only a matter of the word itself but also of its context: adjacent words, the sentence, the paragraph, and even the chapter all bear on it directly or indirectly. We took these factors fully into account in designing and implementing the system, and proposed a HowNet-based model for computing the semantic relevancy of words that handles ambiguities of every chain length. For Chinese corpus tagging, we analyzed earlier rule-based work and proposed a method based on rule PRI, with which we completed the word segmentation and corpus tagging work.

While designing the word segmentation system, we studied the knowledge structure of HowNet and the Knowledge Database Mark-up Language in depth, and came to understand HowNet's strength in describing the conceptual knowledge of words. We segmented the input text using Maximum Matching and Reverse Maximum Matching, and found ambiguous words with a two-way scan. In the key disambiguation step, we resolved each ambiguity by computing the semantic relevancy between two words. We put forward a solution for ambiguous words of every chain length; alternatively, an ambiguity can be handled by first transforming it to chain length 1. In the corpus tagging step, we refined and adjusted the rules while tagging, until the precision of tagging the raw corpus with the rule base met our needs.

While testing the system, we compared the semantic relevancy computing method that...
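
The two-way scan described in the abstract can be illustrated with a minimal sketch. This is not the WGZ system's code: the function names, the toy lexicon, the `max_len` window, and the example sentence are all assumptions made for illustration. The sketch runs forward and reverse maximum matching over the same text and reports the spans where the two segmentations disagree, which are the candidates a disambiguation step (such as semantic relevancy computing) would then have to resolve.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: take the longest lexicon word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left segmentation: take the longest lexicon word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def boundaries(words):
    """Return the set of character offsets at which a segmentation cuts."""
    cuts, pos = {0}, 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def ambiguous_spans(text, lexicon, max_len=4):
    """Return substrings where the forward and reverse segmentations differ."""
    fwd = boundaries(forward_max_match(text, lexicon, max_len))
    bwd = boundaries(reverse_max_match(text, lexicon, max_len))
    shared = sorted(fwd & bwd)
    spans = []
    for start, end in zip(shared, shared[1:]):
        inner_fwd = {c for c in fwd if start < c < end}
        inner_bwd = {c for c in bwd if start < c < end}
        if inner_fwd != inner_bwd:
            spans.append(text[start:end])
    return spans


# Toy lexicon and sentence, chosen only to trigger an overlapping ambiguity.
LEXICON = {"研究", "研究生", "生命", "的", "起源"}
TEXT = "研究生命的起源"

if __name__ == "__main__":
    print(forward_max_match(TEXT, LEXICON))   # ['研究生', '命', '的', '起源']
    print(reverse_max_match(TEXT, LEXICON))   # ['研究', '生命', '的', '起源']
    print(ambiguous_spans(TEXT, LEXICON))     # ['研究生命']
```

Here the two scans agree on "的" and "起源" but split "研究生命" differently, so only that span is flagged. How the chain length of such a span is measured, and how HowNet relevancy is scored, are not specified in the abstract and are left out of the sketch.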

Keywords/Search Tags:

Chinese Automatic Word Segmentation, semantic relevancy, part of speech tagging, HowNet

Related items

1	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
2	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
3	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
4	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
5	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
6	Word Segmentation And Pos Tagging In Chinese
7	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
8	Chinese Word Found Its Part Of Speech Tagging
9	Research Of Chinese Word Sense Disambiguation Based On Hownet
10	Research On Laodian Participle And Part-of-speech Tagging Method