Font Size: a A A

Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts

Posted on:2020-06-14Degree:MasterType:Thesis
Country:ChinaCandidate:P PengFull Text:PDF
GTID:2428330578452017Subject:Applied Statistics
Abstract/Summary:PDF Full Text Request
In recent years,the rapid development of computational science,especially the computational power of computers has greatly increased,and the application of machine learning in Natural Language Processing(NLP)has become more and more extensive.Natural Language Processing has also been greatly developed under such environment.In the field of Natural Language Processing,converting words into recognizable languages for computer is a fundamental research,so the study of vectorization of word and text is particularly important.Traditional analysis of text,data is usually based on term fre.quency(TF)and term fre.que.ncy-inve.rse document frequency(TF-IDF)as the representation of text.Such methods often only reflect part of the text information,ignoring the intrinsic semantic features of the text.Especially for short text data mining,the frequency of occurrence of keywords is usually low,which poses a huge challenge to the statistical model.Therefore,this paper proposes a Probabilistic Language Model(PLM)for the vectorization of Chinese words.The basic idea is to model and analyze according t,o the order of the words in the text.The proposed model can be used for quantitative analysis of text semantics in short text data mining.We mainly solve two kinds of problems:First,how to conve.rt Chinese words into ve.ctors reasonably,which ensures the similarity of Chinese synonyms in digital spatial features;Second,how to establish appropriate vector space for Chinese text where information is kept.Finally,the rationality of Chinese word vectorization method is verified by Jin Yong's novel characters.On the other hand,based on the actual message text data from the message board of a city's housing management department,we use the BP Neural Net and the Recurrent Neural Net(RNN)algorithm to solve the Probabilistic Language Model.Compared with the traditional text processing methods,the model of this paper has certain advantages for the short text data mining.
Keywords/Search Tags:Natural Language Processing, Probabilistic Language Modeling, Word to Vector, Text to Vector
PDF Full Text Request
Related items