
Distributed Representation And Semantic Composition In Chinese Language Processing

Posted on: 2017-03-26    Degree: Master    Type: Thesis
Country: China    Candidate: X Wen    Full Text: PDF
GTID: 2348330491964014    Subject: Computer Science and Technology
Abstract/Summary:
Recently, distributed representations of words and models of semantic composition have led to breakthroughs in natural language processing. Representation learning represents words as dense real-valued vectors and captures both syntactic and semantic similarities between words, while models of semantic composition capture how meaning is composed between or within words. In English, these models have reached the state of the art in language modeling, part-of-speech tagging, and text categorization. However, owing to limited corpus resources and differences between the languages, these methods have not yet outperformed other models on the corresponding Chinese tasks.

Building on existing research on distributed representation and semantic composition in Chinese, we construct a large corpus for training word representations and model the semantic composition of these representations. The major contributions are:

(1) We constructed a large Chinese news corpus to overcome the shortage of Chinese corpus resources. The corpus is 25-9000 times larger than existing corpora, and word representations trained on it reach the state of the art on Chinese word analogy tasks.

(2) We proposed a word split model and a semantic composition model to address the imprecise distributed representations of infrequent and out-of-vocabulary words. The word split model builds the internal structure of a word using statistics from our corpus, while the semantic composition model captures how the semantics within a word are combined. Experiments show that the semantic composition model builds good representations for infrequent and out-of-vocabulary words (a minimal sketch of this idea is given below).

(3) We proposed a convolutional neural network to model semantic composition between words and benchmarked it on several text categorization and sentiment analysis datasets. Experiments show that the convolutional neural network models semantic composition between words well, and that the word representations trained in this work perform well in applied natural language processing (a second sketch below illustrates such a classifier).
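The following is a minimal sketch, not the thesis code, of the compositional idea behind contribution (2): when a word is infrequent or out of vocabulary, its vector is approximated by combining the vectors of its component characters. Here the combination is a simple average, and the names char_vectors and compose_oov are hypothetical placeholders.

    import numpy as np

    # Hypothetical lookup table mapping single Chinese characters to pretrained vectors.
    char_vectors = {
        "电": np.array([0.12, -0.30, 0.55]),
        "脑": np.array([0.08, -0.25, 0.61]),
    }

    def compose_oov(word, char_vectors):
        """Approximate an out-of-vocabulary word vector by averaging its characters' vectors."""
        vecs = [char_vectors[ch] for ch in word if ch in char_vectors]
        if not vecs:
            raise KeyError("no known characters in word: " + word)
        return np.mean(vecs, axis=0)

    print(compose_oov("电脑", char_vectors))  # composed vector for a word absent from the vocabulary

In practice the thesis learns how the sub-word semantics combine rather than averaging uniformly; the averaging here only illustrates the composition step.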
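As a hedged illustration of contribution (3), the sketch below shows a small convolutional text classifier that composes word vectors in a sentence into a class score. It is written in PyTorch for concreteness; the framework, hyperparameters (embedding size 100, filter widths 3/4/5, two classes), and names such as TextCNN are assumptions for this example, not the settings used in the thesis.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextCNN(nn.Module):
        """Convolutional network that composes word vectors in a sentence into a class score."""
        def __init__(self, vocab_size, embed_dim=100, num_classes=2,
                     filter_widths=(3, 4, 5), num_filters=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # One 1-D convolution per filter width, sliding over adjacent words.
            self.convs = nn.ModuleList(
                nn.Conv1d(embed_dim, num_filters, w) for w in filter_widths
            )
            self.fc = nn.Linear(num_filters * len(filter_widths), num_classes)

        def forward(self, token_ids):          # token_ids: (batch, seq_len)
            x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
            x = x.transpose(1, 2)              # (batch, embed_dim, seq_len)
            # Max-pool each feature map over time, then classify the concatenation.
            pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))

    model = TextCNN(vocab_size=50000)
    dummy = torch.randint(0, 50000, (8, 20))   # batch of 8 sentences, 20 tokens each
    print(model(dummy).shape)                   # torch.Size([8, 2])

The convolution filters capture semantic composition over short word windows, which is the role the thesis assigns to its convolutional model; its exact architecture may differ from this sketch.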
Keywords/Search Tags: natural language processing, representation learning, semantic composition, convolutional neural network