
Research On A Language Model Based On Dependency Relationship

Posted on: 2014-01-08  Degree: Master  Type: Thesis
Country: China  Candidate: Y K Xu  Full Text: PDF
GTID: 2248330392961033  Subject: Information and Communication Engineering
Abstract/Summary:
Language input is an important field of natural language processing, with great significance for research on human-computer interaction, information filtering and retrieval, text error correction, and related tasks. This paper focuses on techniques for optimizing modern Chinese input, including related statistical language models, syntactic analysis, and semantic analysis, and proposes a new language model based on dependency relationships. Building on the traditional N-gram model, and considering the importance of the syntactic and semantic information in text, it makes the following contributions.

First, the structure of the dependency tree is adjusted, and visible words are defined on the adjusted tree. A standard dependency tree shows only the dependency relationship between word pairs and ignores the order of the words in the original sentence, so order information is added by distinguishing the left children and right children of each node. This distinction is the basis of visible words, which help the language decoding process take advantage of the structure of the text.

Second, direction information is attached to each dependency relation, and an algorithm for computing dependency probability is provided. With direction information and dependency probability, the relationships between words can be better described and applied in the language decoding process.

Third, similar words are defined and used to smooth the dependency probability. Similar words, i.e. words with similar dependency relationships, can be used to smooth the dependency probability and reduce the impact of data sparseness.

Finally, the new language model is combined with the N-gram model, and the performance of the combined model is verified with a Pinyin input method. The experimental data show that the combined model improves sentence absolute accuracy by 15.72% and character absolute accuracy by 2.8%.
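The idea of keeping word order in a dependency tree by separating left children from right children can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Node` class, the `build_ordered_tree`, `linearize`, and `directed_arcs` helpers, the head-index input format, and the English example sentence are all hypothetical, and the sketch assumes a projective parse (no crossing arcs). The abstract does not give the definition of visible words or the probability algorithm, so neither is attempted here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One word in an ordered dependency tree (hypothetical structure)."""
    index: int                                          # position in the original sentence
    word: str
    left: List["Node"] = field(default_factory=list)    # dependents that precede the head
    right: List["Node"] = field(default_factory=list)   # dependents that follow the head


def build_ordered_tree(words: List[str], heads: List[int]) -> Node:
    """Build the tree from head indices; heads[i] == -1 marks the root."""
    nodes = [Node(i, w) for i, w in enumerate(words)]
    root = None
    # Visiting dependents in sentence order keeps each child list sorted by position.
    for i, h in enumerate(heads):
        if h == -1:
            root = nodes[i]
        elif i < h:
            nodes[h].left.append(nodes[i])
        else:
            nodes[h].right.append(nodes[i])
    if root is None:
        raise ValueError("heads must contain exactly one root marked with -1")
    return root


def linearize(node: Node) -> List[str]:
    """In-order walk: left children, head, right children.

    For a projective tree this recovers the original word order.
    """
    out: List[str] = []
    for child in node.left:
        out.extend(linearize(child))
    out.append(node.word)
    for child in node.right:
        out.extend(linearize(child))
    return out


def directed_arcs(node: Node) -> List[Tuple[str, str, str]]:
    """Collect (head, dependent, direction) triples, direction being 'L' or 'R'."""
    arcs: List[Tuple[str, str, str]] = []
    for direction, children in (("L", node.left), ("R", node.right)):
        for child in children:
            arcs.append((node.word, child.word, direction))
            arcs.extend(directed_arcs(child))
    return arcs


# Hypothetical parse: "like" is the root, "I" and "tea" depend on it,
# and "green" depends on "tea".
words = ["I", "like", "green", "tea"]
heads = [1, -1, 3, 1]
tree = build_ordered_tree(words, heads)
sentence = linearize(tree)
arcs = directed_arcs(tree)
```

Here `linearize(tree)` returns the words in their original order, which an unordered head-to-dependent mapping alone would not guarantee. The triples from `directed_arcs` are the kind of direction-tagged events over which a dependency probability could be estimated, although the estimation method itself is left open here.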
Keywords/Search Tags: Dependency relationship, language model, Pinyin input