Research On Chinese Word Segmentation And Keywords Extraction

Posted on: 2021-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ding
Full Text: PDF
GTID: 2518306347493054
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the emergence and growth of Microblog, Zhihu, CSDN, and other online platforms, the amount of information on the Internet has grown exponentially. Faced with this volume, people find it increasingly difficult and time-consuming to locate the information they need. This thesis recasts Chinese word segmentation and keyword extraction as sequence labeling problems and combines them with word vector representation techniques. It analyzes both problems theoretically, compares the strengths and weaknesses of existing methods, and proposes improvements and optimizations.

For Chinese word segmentation, this thesis builds on existing work and proposes a method that combines the BiLSTM-CRF model with an Attention mechanism. The Attention mechanism computes the relevance between the inputs and outputs of the BiLSTM model, and a representation of the whole text is obtained by weighting each word according to its importance to the text. The experimental results show that the improved model and the training method used in this thesis handle Chinese word segmentation efficiently and improve its accuracy.

For keyword extraction, a method that combines TextRank and Word2vec is presented. The thesis first gives an overview of keyword extraction methods, introduces the traditional TextRank-based method, and analyzes its advantages and disadvantages. It then introduces Word2vec and describes its basic principles. Finally, it proposes an improved keyword extraction method based on TextRank and the Word2vec model: the similarity matrix is obtained from trained word vectors, the initial weights of the graph nodes are analyzed and optimized, and synonyms are merged in the text preprocessing step, which improves the quality of the extracted keywords.
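
As a rough illustration of the keyword extraction idea summarized above, the following minimal Python sketch runs a TextRank-style iteration over a word co-occurrence graph whose edge weights are cosine similarities between word vectors. It is not the thesis's implementation. The function names (`cosine`, `textrank_keywords`), the window size, the damping factor, the uniform initial node scores, and the toy two-dimensional vectors are all assumptions made for illustration. The abstract does not say how the initial node weights are optimized, so that step is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank_keywords(words, vectors, window=2, damping=0.85, iters=50, top_k=3):
    """Rank words with a TextRank-style iteration.

    words:   list of already segmented (and filtered) tokens
    vectors: dict mapping each token to its word vector
             (hypothetical stand-in for trained Word2vec vectors)
    """
    vocab = sorted(set(words))

    # Build an undirected graph: words co-occurring within `window` positions
    # are linked, and the edge weight is their (non-negative) cosine similarity.
    weight = {w: {} for w in vocab}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            x = words[j]
            if x == w:
                continue
            sim = max(cosine(vectors[w], vectors[x]), 0.0)
            weight[w][x] = sim
            weight[x][w] = sim

    # Uniform initial scores: an assumption, not the thesis's optimized weights.
    score = {w: 1.0 for w in vocab}

    for _ in range(iters):
        new_score = {}
        for w in vocab:
            rank = 0.0
            for x, sim in weight[w].items():
                out_sum = sum(weight[x].values())
                if out_sum > 0:
                    rank += sim / out_sum * score[x]
            new_score[w] = (1 - damping) + damping * rank
        score = new_score

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-score[w], w))[:top_k]


if __name__ == "__main__":
    # Toy tokens and vectors, invented for this example.
    tokens = ["a", "b", "c"]
    toy_vectors = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
    print(textrank_keywords(tokens, toy_vectors, window=1, top_k=2))
```

In a real pipeline the `vectors` dictionary would be filled from a trained word vector model, and the token list would come from a segmenter followed by stop-word filtering and synonym handling; both steps are outside this sketch.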
Keywords/Search Tags: NLP, Keywords Extraction, Word2vec, LSTM, CRF