Chinese Word Segment Based On Dictionary And Suffix Array

Posted on: 2007-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: S M Zhang
Full Text: PDF
GTID: 2178360212458814
Subject: Computer application technology
Abstract/Summary:
Chinese word segmentation is a central problem in Chinese information processing. It underlies machine translation, document indexing, intelligent search, and natural language understanding and processing, and it is also key to Chinese text classification. The topic has attracted many computer scientists and linguists. Since the early 1980s, when the concept of "Chinese word segmentation" was put forward, researchers have designed a number of successful segmentation systems. The methods used in these systems fall into three groups: methods based on string matching, methods based on understanding, and methods based on statistics. Many mature segmentation algorithms exist, such as the Maximum Matching Method, the Reverse Maximum Matching Method, the Word-by-Word Traversal Method, and so on.

Two problems in Chinese word segmentation have not yet been completely solved: resolving ambiguity, and recognizing words that are not recorded in the dictionary. To solve them, comprehensive dictionaries must be built and a method for resolving ambiguity must be provided. Most segmentation algorithms today are based on a dictionary. Dictionary-based segmentation has the advantage of high recall and precision, but it processes text slowly, cannot handle new words and proper nouns, and can also produce ambiguous segmentations.

The suffix array was initially introduced as a text indexing structure. It records the lexicographic order of every suffix of a string and, for a string of n characters, occupies 4n bytes (4 bytes per input character). With it, typical string processing problems can be solved efficiently. Early work on suffix arrays was motivated by biological applications such as matching DNA sequences. The data structure was then applied in string matching, text...
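The two ingredients named in the abstract, dictionary-based maximum matching and the suffix array, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `max_len` parameter, the sample dictionary, and the naive sorting-based construction are all assumptions chosen for brevity.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching over a set of known words.

    `dictionary` and `max_len` are illustrative assumptions.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order.

    Sorting whole suffixes is simple but slow; it is used here only to
    show what the structure contains.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of `pattern` by binary search over the suffix array."""
    def first_index(strictly_greater):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + len(pattern)]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    # Suffixes that start with the pattern form one contiguous range.
    return first_index(True) - first_index(False)


if __name__ == "__main__":
    sample_dict = {"研究", "研究生", "生命", "起源"}  # hypothetical dictionary
    print(forward_max_match("研究生命起源", sample_dict))
    sa = build_suffix_array("banana")
    print(sa, count_occurrences("banana", sa, "ana"))
```

With this sample dictionary, greedy matching segments 研究生命起源 as 研究生 / 命 / 起源 rather than 研究 / 生命 / 起源, which shows the kind of ambiguity the abstract describes. The suffix array of "banana" is [5, 3, 1, 0, 4, 2], and the pattern "ana" is counted twice.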
Keywords/Search Tags: Dictionary