Word Activation Force Model Based Chinese Word Detection

Posted on:2014-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Zhang

Full Text:PDF

GTID:2248330398472080

Subject:Signal and Information Processing

Abstract/Summary:

Word segmentation is a key step in automatic Chinese information processing. However, the commonly used segmentation method, string matching, depends heavily on the completeness and correctness of the word dictionary. In the Internet age, many new words are created and adopted, while many old words fall out of use. Traditional manual maintenance of the word dictionary cannot keep pace with this rate of change. A more automatic, more computable approach to word detection is needed to meet the strong demand of Chinese information processing.

This thesis uses a word detection method based on Word Activation Force (WAF) to explore how Chinese words are formed, by statistically analyzing large volumes of text. WAF is a statistical model of the activation effect in text data, and it can model the relationships between characters, words, or entities well. Here, text is treated as a sequence of characters that are connected by activation relationships. Based on this assumption, a WAF model is built to analyze how words are formed from characters. The thesis first reviews the research status of the field and then introduces the WAF model. Next, the WAF algorithm procedures are designed and implemented, including the method for processing big data. A word detection rule experiment follows, which yields a statistical rule of word formation. Finally, the thesis concludes with a summary and directions for future work.
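The abstract does not state the WAF formula, so the following is only a minimal sketch of the character-level idea, not the thesis's implementation. It assumes the commonly cited form waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab^2, where f_a and f_b are character frequencies, f_ab counts how often b follows a within a window, and d_ab is their mean distance. The function name `char_waf`, the `window` parameter, and the dictionary-based sparse storage are illustrative assumptions.

```python
from collections import Counter, defaultdict


def char_waf(texts, window=2):
    """Return {(a, b): force} for ordered character pairs.

    Assumed formula (not given in the abstract):
        waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / mean_distance(a, b) ** 2
    Only pairs that actually co-occur are stored, so the result acts as a
    sparse matrix keyed by (a, b).
    """
    freq = Counter()              # f_a: occurrences of each character
    co_freq = Counter()           # f_ab: times b follows a within the window
    dist_sum = defaultdict(int)   # summed forward distances for each pair

    for text in texts:
        for i, a in enumerate(text):
            freq[a] += 1
            for d in range(1, window + 1):
                if i + d >= len(text):
                    break
                pair = (a, text[i + d])
                co_freq[pair] += 1
                dist_sum[pair] += d

    waf = {}
    for (a, b), f_ab in co_freq.items():
        mean_d = dist_sum[(a, b)] / f_ab
        waf[(a, b)] = (f_ab / freq[a]) * (f_ab / freq[b]) / mean_d ** 2
    return waf


if __name__ == "__main__":
    forces = char_waf(["北京大学", "北京"], window=1)
    # Pairs with a high force are candidates for belonging to the same word.
    for pair, force in sorted(forces.items(), key=lambda kv: -kv[1]):
        print(pair, round(force, 3))
```

Under this assumption, character pairs with a strong force in both directions of evidence (high co-occurrence relative to each character's own frequency, at short distance) are natural candidates for word-internal links, while weak pairs suggest a word boundary. The actual detection rule derived in the thesis may differ.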

Keywords/Search Tags:

activation effect, sparse matrix, big data processing, word activation force, word detection, word segmentation

Related items

1	Research On Long Distance Language Mode Based On Word Activation Force
2	Research On Application Of WAF In Text Processing
3	Research On Speech Recognition Based On Word Activation Force
4	Research On New Word Detection From Microblog Data
5	Research On Event Detection Algorithm For Microblog
6	Research And Implementation Of Chinese Word Segmentation Algorithm
7	Research Of Problems In Spoken Term Detection
8	A Study Of Key Problems In Spoken Term Detection
9	Research On Chinese Word Segmentation Integrating Pinyin And Tone Information
10	Research On Chinese Word Segmentation Based On Text And Audio