
An exploration of the word2vec algorithm: Creating a vector representation of a language vocabulary that encodes meaning and usage patterns in the vector space structure

Posted on: 2017-05-29
Degree: M.A
Type: Thesis
University: University of North Texas
Candidate: Le, Thu Anh
Full Text: PDF
GTID: 2468390011989998
Subject: Mathematics
Abstract/Summary:
This thesis is an exploration and exposition of word2vec, a highly efficient shallow neural network algorithm developed by T. Mikolov et al. to create vector representations of a language vocabulary such that information about the meaning and usage of the vocabulary words is encoded in the vector space structure. Chapter 1 introduces natural language processing, vector representations of language vocabularies, and the word2vec algorithm. Chapter 2 reviews the basic mathematical theory of deterministic convex optimization. Chapter 3 provides background on concepts from computer science used in the word2vec algorithm: Huffman trees, neural networks, and binary cross-entropy. Chapter 4 discusses the word2vec algorithm itself in detail, including the continuous bag-of-words and skip-gram models and the hierarchical softmax and negative sampling training methods. Finally, Chapter 5 explores some applications of vector representations: word categorization, analogy completion, and language translation assistance.
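To make the skip-gram and negative-sampling ideas mentioned above concrete, here is a minimal, self-contained sketch of skip-gram training with negative sampling on a toy corpus. This is an illustration of the general technique, not the thesis's own code: the corpus, hyperparameters (window radius 2, embedding dimension 8, 3 negative samples), and the uniform negative-sampling distribution are all simplifying assumptions (Mikolov et al. sample negatives from a smoothed unigram distribution).

```python
import math
import random

random.seed(0)

# Toy corpus; real word2vec models are trained on corpora of billions of words.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension (assumed small here)

# Two embedding tables: input ("center") vectors and output ("context") vectors.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_pair(center, context, lr=0.05, k=3):
    """One skip-gram update with k negative samples.

    Each (center, target) pair is a binary classification problem
    (label 1 for the true context word, 0 for sampled negatives),
    trained with binary cross-entropy via SGD.
    """
    targets = [(context, 1.0)] + [
        # Negatives drawn uniformly for simplicity (an assumption of this sketch).
        (random.randrange(V), 0.0) for _ in range(k)
    ]
    v = W_in[center]
    grad_v = [0.0] * D
    for t, label in targets:
        u = W_out[t]  # aliases the table row, so updates below modify W_out
        score = sigmoid(sum(vi * ui for vi, ui in zip(v, u)))
        g = lr * (label - score)  # gradient of the log-likelihood
        for d in range(D):
            grad_v[d] += g * u[d]
            u[d] += g * v[d]
    for d in range(D):
        v[d] += grad_v[d]

# Slide a context window of radius 2 over the corpus for a few epochs.
for _ in range(200):
    for i, w in enumerate(corpus):
        for j in range(max(0, i - 2), min(len(corpus), i + 3)):
            if j != i:
                train_pair(idx[w], idx[corpus[j]])

print(len(W_in), len(W_in[0]))  # V rows of D-dimensional word vectors
```

After training, the rows of `W_in` are the word vectors; on a realistic corpus their geometry supports the Chapter 5 applications (e.g. analogy completion via vector arithmetic and nearest-neighbor search). Hierarchical softmax, the alternative training method the thesis covers, would replace the negative-sampling loop with a walk down a Huffman tree over the vocabulary.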
Keywords/Search Tags: Vector, Word2vec, Algorithm, Language, Chapter, Vocabulary