The Research And Implementation Of Named Entity Recognition For Chinese Social Media

Posted on: 2019-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X F Ma
Full Text: PDF
GTID: 2348330563954333
Subject: Software engineering
Abstract/Summary:
Named entity recognition is a key technology in natural language processing, and its results directly affect many other natural language processing tasks. In the age of self-media, the Internet is full of user comments, and how to better supervise and manage these data is an urgent problem. Named entity recognition for Chinese social media is therefore of great significance, and it is the research focus of this thesis. The thesis designs and implements a named entity recognition model based on Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) and applies it successfully to a social media dataset. Compared with the traditional unidirectional (one-way) LSTM+CRF model and with a CRF-only model, it improves the recognition of both nominal mentions and named mentions. The specific work is as follows.

First, the model takes single characters with position embeddings as input. The position of each character within its word is added to the character embedding, so that the embedding carries both the features of the character and the boundary information of the word. Second, the network uses an LSTM with a bidirectional hidden layer. Compared with a unidirectional hidden layer, this structure captures context better and extracts better features from the input sequence. Third, an attention layer is added between the LSTM layer and the CRF layer, which helps the model focus on local features of the input sequence. Fourth, the objective function of the model itself is combined with the objective function of the single-character-with-position embeddings. Through joint training, the embeddings and the recognition model influence each other, which yields better features and alleviates the problems of words that occur too rarely and words that are not in the dictionary.

The experiments use a Sina Weibo corpus, and the results are compared with those of the CRF model and the traditional unidirectional LSTM+CRF model. The experimental results show that the improved network model performs well on both nominal mentions and named mentions. The improved named entity recognition algorithm is also applied to a real application scenario: the interface of the model is encapsulated, and a named entity recognition system is designed and implemented.
Keywords/Search Tags: Named entity recognition, LSTM, attention mechanism, single character with position embedding
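
The first step of the abstract, pairing each character with its position in the word, can be illustrated with a minimal sketch. The abstract does not specify how positions are encoded, so the B/M/E/S tags (begin, middle, end, single), the function names, and the assumption that the sentence has already been segmented into words are all hypothetical choices made for illustration, not the thesis's actual implementation.

```python
def position_tag(index, length):
    """Return an assumed boundary tag for a character at `index` in a word of `length`."""
    if length == 1:
        return "S"  # single-character word
    if index == 0:
        return "B"  # first character of the word
    if index == length - 1:
        return "E"  # last character of the word
    return "M"      # any character in between


def char_position_tokens(words):
    """Turn a pre-segmented sentence into (character, position tag) pairs."""
    return [
        (char, position_tag(i, len(word)))
        for word in words
        for i, char in enumerate(word)
    ]


def build_vocab(tokens):
    """Assign one integer id per distinct (character, position tag) pair.

    Each id would index one row of an embedding table, so the same
    character gets a different vector at different word positions.
    """
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical segmented input: "我 / 在 / 北京大学"
tokens = char_position_tokens(["我", "在", "北京大学"])
print(tokens)
# [('我', 'S'), ('在', 'S'), ('北', 'B'), ('京', 'M'), ('大', 'M'), ('学', 'E')]
```

Because the lookup key is the pair rather than the character alone, "大" as a standalone word and "大" at the start of "大学" map to different embedding rows, which is one way the input can carry word-boundary information into a character-level tagger.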