
Word Activation Force Model Based Chinese Word Detection

Posted on: 2014-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zhang
Full Text: PDF
GTID: 2248330398472080
Subject: Signal and Information Processing
Abstract/Summary:
Word segmentation is a key step in the automatic processing of Chinese text. However, the commonly used segmentation method, string matching, depends heavily on the completeness and correctness of the word dictionary. In the Internet age, many new words are created and adopted while many old words fall out of use. Traditional manual maintenance of the word dictionary cannot keep up with this rate of change. A more automatic, more computable method of word detection is needed to meet the demands of Chinese information processing.

This thesis uses a word detection method based on Word Activation Force (WAF) to explore how Chinese words are formed, by statistically analyzing large-scale text data. WAF is a statistical model of the activation effect in text data, and it can model the relationships between characters, words, or entities well. Here, text is treated as a sequence of characters, with characters connected by activation relationships. Based on this assumption, a WAF model is built to analyze how words are formed from characters.

The thesis first reviews the state of research in this field and then introduces the WAF model. Next, the WAF algorithm procedures are designed and implemented, including the method for processing big data. The word detection rule experiments follow, yielding statistical rules of word formation. Finally, the thesis concludes with a summary and directions for future work.
Keywords/Search Tags: activation effect, sparse matrix, big data processing, word activation force, word detection, word segmentation
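
The abstract describes text as a character sequence whose characters are linked by activation relationships, but it does not give the formula or the procedure. As a rough illustration only, the sketch below assumes the commonly cited WAF definition, waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab^2, where f_a and f_b are character frequencies, f_ab counts how often a precedes b within a window, and d_ab is their mean distance. The function name `char_waf` and the `window` parameter are hypothetical, and the thesis may define or compute these quantities differently.

```python
from collections import Counter, defaultdict


def char_waf(text, window=2):
    """Return {(a, b): waf} for ordered character pairs.

    Assumed definition (not taken from the thesis):
        waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab ** 2
    where a precedes b by at most `window` positions.
    """
    freq = Counter(text)          # f_a: occurrences of each character
    co_count = Counter()          # f_ab: ordered co-occurrences within the window
    dist_sum = defaultdict(int)   # total forward distance per pair, for the mean d_ab

    for i, a in enumerate(text):
        for d in range(1, window + 1):
            if i + d >= len(text):
                break
            b = text[i + d]
            co_count[(a, b)] += 1
            dist_sum[(a, b)] += d

    waf = {}
    for (a, b), f_ab in co_count.items():
        mean_d = dist_sum[(a, b)] / f_ab
        waf[(a, b)] = (f_ab / freq[a]) * (f_ab / freq[b]) / mean_d ** 2
    return waf


if __name__ == "__main__":
    # Toy corpus; a real run would stream a large corpus and keep the
    # pair statistics in a sparse structure rather than in plain dicts.
    scores = char_waf("北京大学在北京，北京大学很大", window=2)
    for pair, value in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(pair, round(value, 3))
```

Under this assumed definition, adjacent character pairs with a high score in the forward direction are natural candidates for belonging to the same word. The actual detection rules reported in the thesis are not stated in the abstract, so any threshold applied to these scores would be a further assumption.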