
A Feature Space Optimized Algorithm Based On Word Embeddings For Synonym Expansion

Posted on: 2016-11-02    Degree: Master    Type: Thesis
Country: China    Candidate: W T Zhang    Full Text: PDF
GTID: 2298330467492103    Subject: Signal and Information Processing
Abstract/Summary:
In recent years, with the rapid development of the Internet and social networks, text information on the web has been growing exponentially. Communication on social networks causes word meanings to change and become enriched. With the emergence of new words, shifts in word meaning, and non-standard use of language, word correlation computation, and synonym expansion in particular, plays an increasingly important role in information retrieval, natural language processing, text mining, and other fields.

In information retrieval and natural language processing, synonym expansion has long been a basic and critical task, and a variety of methods, including clustering, Apriori, and topic models, have performed well on it. Besides statistical and rule-based approaches, manually constructed synonym thesauri are also an important line of work. With deep learning recently achieving remarkable results in image processing, speech processing, and other areas, many researchers are now applying deep learning to natural language processing.

Building on research into deep learning and word embeddings in natural language processing, this thesis focuses on innovative research and applications in synonym expansion. Considering the era of big data and the characteristics of synonymous words, the main problem studied here is how to expand, from the mass of words in a text collection, words that share the same semantic and syntactic information as given seed words.

To address this problem, the thesis completes the following tasks:

Firstly, research and experiments on word similarity computation, covering one-hot representation, distributional representation, WAF-based representation, and word embeddings.

Secondly, study of neural network language models, combined with deep learning models for natural language processing, and training of word embeddings. The thesis mainly uses the C&W neural network model and the word2vec tool to train embeddings and conduct word similarity experiments.

Thirdly, learning the common semantics of seed words on the basis of word embeddings, performing a feature space transformation, and thereby expanding words that share the meaning of the seed words. The thesis combines word embeddings with state-of-the-art word tagging methods such as POS tagging, NER, and parsing. Negative sampling further improves the accuracy and robustness of the model. Comparison with word2vec and the WordNet synonym dictionary shows that the proposed algorithm performs better on synonym expansion.

Fourthly, applying the model to microblog (Weibo) short-text classification, TREC KBA, and other projects. The method can effectively extend the feature space, add features, and improve classification accuracy. In information retrieval, the method can also be applied to query expansion, effectively improving precision and recall.
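The following is a minimal sketch of the second task, training word embeddings and computing word similarity by cosine distance. The thesis uses the original word2vec tool; the gensim implementation, the toy corpus, and all parameter values below are assumptions made here purely for illustration.

```python
# Minimal sketch: train skip-gram embeddings with negative sampling using the
# gensim implementation of word2vec, then query nearest neighbours by cosine
# similarity. Corpus and hyperparameters are illustrative only.
from gensim.models import Word2Vec

# Toy corpus; in practice this would be a large, word-segmented web-text corpus.
sentences = [
    ["information", "retrieval", "query", "expansion"],
    ["synonym", "expansion", "word", "embedding"],
    ["neural", "network", "language", "model"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimension
    window=5,          # context window size
    sg=1,              # skip-gram architecture
    negative=5,        # negative sampling
    min_count=1,       # keep every word in the toy corpus
)

# Nearest neighbours of a word in the embedding space (cosine similarity).
print(model.wv.most_similar("expansion", topn=5))
```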
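For the third task, the sketch below gives only a simplified approximation of the idea, not the thesis's exact feature space transformation: the centroid of the seed-word embeddings is treated as their common semantics, and candidate words are ranked by cosine similarity to that centroid. The function name and seed words are hypothetical.

```python
# Illustrative approximation of seed-based synonym expansion: average the seed
# embeddings and rank the vocabulary by cosine similarity to that centroid.
import numpy as np

def expand_synonyms(model, seed_words, topn=10):
    """Return candidate expansion words for a set of seed words."""
    seed_vecs = [model.wv[w] for w in seed_words if w in model.wv]
    centroid = np.mean(seed_vecs, axis=0)
    # gensim's most_similar also accepts raw vectors as "positive" inputs.
    return model.wv.most_similar(positive=[centroid], topn=topn)

# Example usage with the model trained above (seed words are hypothetical):
# print(expand_synonyms(model, ["synonym", "expansion"]))
```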
Keywords/Search Tags: synonym expansion, word embedding, neural network, language model, negative sampling, deep learning, word2vec