
Research On The Recognition Model Of Network Buzzwords Based On Compound Features

Posted on: 2022-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: L W Guo
GTID: 2518306554971619
Subject: Management Science and Engineering
Abstract/Summary:
With the continuous improvement of Internet technology and the rapid growth in the number of Internet users, the Internet has gradually become a medium for everyday social communication, opinion sharing, and emotional release, which provides broad space for the emergence and spread of network buzzwords. Word segmentation is one of the most common steps in the analysis of Chinese text data. However, because new buzzwords are popular only for a limited time, they make up only a small part of the whole corpus. If researchers do not fully detect and include them, named entities may be labeled incorrectly. The mining and recognition of buzzwords has therefore become a challenging problem in natural language processing and has attracted close attention from the academic community.

Some research institutions and scholars have actively carried out research on buzzword recognition. Some researchers have used manual judgment as an aid. Although manual judgment is the most accurate way of identification, its labor cost and the dependence of its results on each person's subjective understanding of language remain limiting factors for research in this field. The performance and recognition accuracy of purely statistical methods are also poor. In recent years, training language processing models with deep neural networks has become a hot research topic. Inspired by this, this paper applies deep neural network technology to the discovery and recognition of network buzzwords. State-of-the-art structures such as BILSTM, CNN, CRF, and attention are used to construct the buzzword recognition model, and ablation experiments are carried out on various combinations of these network structures. After the network structure and activation function were optimized and improved, excellent results were obtained.

In this paper, we first preprocess the original microblog corpus, use an N-gram model to extract word strings, and filter the word strings according to statistical factors such as mutual information, left and right information entropy, and word frequency. We then use the candidate dictionary to filter candidate buzzwords and label the original Weibo corpus. In the deep neural network stage, Skip-gram is used to extract feature vectors. At the same time, character vectors, Jieba word segmentation feature vectors, pinyin feature vectors, and glyph feature vectors are analyzed along multiple dimensions and fused into a composite feature vector that serves as the input to the model. Next, Bidirectional Long Short-Term Memory (BILSTM) and Convolutional Neural Network (CNN) structures are used for multi-head feature analysis. An attention mechanism is added on top of the BILSTM and CNN to extract buzzword features, the activation function is optimized according to the CNN and LSTM network structures, and finally the model uses a Conditional Random Field (CRF) to output the optimal labels. The results show that the CNN-BILSTM-CRF model with multiple attention mechanisms proposed in this paper performs better than general combination models, effectively improves the recognition rate of buzzwords, and has broad application and research prospects.
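
The statistical filtering step mentioned in the abstract (N-gram extraction followed by scoring on word frequency, mutual information, and left and right information entropy) can be illustrated with a minimal sketch. The abstract does not give the exact formulas or thresholds used in the thesis, so everything here is an assumption made for illustration: the function name `score_candidates`, the parameters `max_n` and `min_freq`, the use of the minimum pointwise mutual information over all binary splits, and the toy corpus.

```python
import math
from collections import Counter, defaultdict


def score_candidates(corpus, max_n=4, min_freq=2):
    """Score character n-grams as buzzword candidates (illustrative only).

    corpus   : list of strings (already cleaned sentences)
    max_n    : longest n-gram to consider (assumed value, not from the thesis)
    min_freq : minimum frequency to keep a candidate (assumed value)
    """
    ngram_counts = Counter()
    left_neighbors = defaultdict(Counter)   # characters seen just before a gram
    right_neighbors = defaultdict(Counter)  # characters seen just after a gram
    total_chars = 0

    for sentence in corpus:
        total_chars += len(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(sentence) - n + 1):
                gram = sentence[i:i + n]
                ngram_counts[gram] += 1
                if n > 1:
                    if i > 0:
                        left_neighbors[gram][sentence[i - 1]] += 1
                    if i + n < len(sentence):
                        right_neighbors[gram][sentence[i + n]] += 1

    def prob(gram):
        # Simple relative frequency, normalized by corpus length in characters.
        return ngram_counts[gram] / total_chars

    def entropy(counter):
        # Shannon entropy (natural log) of the neighbor distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum((c / total) * math.log(total / c) for c in counter.values())

    results = {}
    for gram, freq in ngram_counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue
        # Internal cohesion: the weakest binary split decides the score.
        pmi = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        results[gram] = {
            "freq": freq,
            "pmi": pmi,
            "left_entropy": entropy(left_neighbors[gram]),
            "right_entropy": entropy(right_neighbors[gram]),
        }
    return results


# Toy corpus (made up for illustration, not data from the thesis).
toy_corpus = ["xaby", "zabw", "qabq"]
for gram, stats in score_candidates(toy_corpus, max_n=3).items():
    print(gram, stats)
```

A string with high mutual information is internally cohesive, and high left and right entropy means it appears in varied contexts, which together suggest a free-standing word. In practice each score would be compared with a tuned threshold before a string is kept as a candidate.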
Keywords/Search Tags: Network buzzword recognition, Deep learning, Compound feature, Neural network