Research on Unknown Word Recognition and Word Meaning Discovery Based on Micro-blog Short Text

Posted on: 2019-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Jia
GTID: 2428330593450481  Subject: Mathematics
Abstract/Summary:
Unknown word recognition is of great significance for improving the accuracy of text segmentation and syntactic analysis. With the continuous development of online social platforms, Weibo (micro-blog) has become an important platform for people to share, spread and obtain information. Recognizing unknown words in short micro-blog text has become a research hot spot, but micro-blog text contains a large number of nonstandard terms and internet buzzwords, which makes unknown words harder to recognize. Addressing these characteristics, this thesis proposes algorithms for unknown word recognition and word meaning discovery based on micro-blog short text.

For recognition, the thesis proposes an improved FP-Growth algorithm (POS-FP) that takes into account the influence of part of speech on unknown word recognition. First, the POS-FP algorithm is used to obtain frequent itemsets. Candidate unknown words are then obtained by combining these itemsets with an N-gram model. Finally, improved mutual information, left and right information entropy, context dependence and open-source verification are used to filter and verify the candidates. Compared with traditional methods, this algorithm improves the recognition rate of unknown words in micro-blog short text.

For meaning discovery, the thesis proposes a word sense discovery method based on similarity computation. First, a synonym forest with part-of-speech information (POS-Cilin) is built from micro-blog data. Word2vec is then used to generate word vectors for the unknown words and for all nouns, and the constructed POS-Cilin is used to adjust these word vectors. Finally, similarity computation yields a set of words that expresses the meaning of each unknown word, and experiments verify the effectiveness of the method.
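
The filtering step described in the abstract relies on mutual information and left/right information entropy. The abstract does not give the thesis's improved formulas, so the following is only a minimal sketch of the standard, unimproved versions: pointwise mutual information for a two-token candidate, and the entropy of the tokens seen immediately to its left and right. The function names, the toy corpus and the choice to restrict candidates to adjacent token pairs are all assumptions made for illustration, not details taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def pmi(pair_count, first_count, second_count, total_tokens):
    """Standard pointwise mutual information (base 2) of a two-token candidate.

    Probabilities are estimated by dividing raw counts by the token total,
    which is a common simplification rather than the thesis's improved measure.
    """
    p_xy = pair_count / total_tokens
    p_x = first_count / total_tokens
    p_y = second_count / total_tokens
    return math.log2(p_xy / (p_x * p_y))


def entropy(neighbor_counts):
    """Shannon entropy (base 2) of a neighbor-token frequency table."""
    total = sum(neighbor_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in neighbor_counts.values())


def score_candidates(sentences):
    """Score every adjacent token pair as a candidate unknown word.

    `sentences` is a list of token lists (hypothetical pre-segmented input).
    Returns {pair: (pmi, left_entropy, right_entropy)}.
    """
    unigrams = Counter()
    pairs = Counter()
    left = defaultdict(Counter)   # tokens seen just before each pair
    right = defaultdict(Counter)  # tokens seen just after each pair

    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            pairs[pair] += 1
            if i > 0:
                left[pair][tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[pair][tokens[i + 2]] += 1

    total = sum(unigrams.values())
    return {
        pair: (
            pmi(n, unigrams[pair[0]], unigrams[pair[1]], total),
            entropy(left[pair]),
            entropy(right[pair]),
        )
        for pair, n in pairs.items()
    }


if __name__ == "__main__":
    # Toy corpus: the pair ("a", "b") always occurs together, in varied contexts.
    corpus = [["a", "b", "c"], ["d", "a", "b", "e"], ["f", "a", "b", "g"]]
    for pair, (mi, h_left, h_right) in score_candidates(corpus).items():
        print(pair, round(mi, 3), round(h_left, 3), round(h_right, 3))
```

A candidate with high mutual information and high entropy on both sides is internally cohesive and appears in varied contexts, which is the usual reason these two measures are combined when filtering candidate words. Any thresholds would have to be tuned on real data.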
Keywords/Search Tags: Unknown Word Recognition, FP-Growth Algorithm, Word2vec, POS-Dic-Cilin, Similarity Measure