Research On Agricultural Named Entity Recognition Based On Deep Learning Method

Posted on: 2023-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P F Zhao
GTID: 1523307127978309
Subject: Agricultural Engineering
Abstract/Summary:
Agricultural named entity recognition is a basic task in agricultural natural language processing and a key technical link in building agricultural intelligent question answering systems. A high-quality named entity recognition model can provide effective support for agricultural information extraction and semantic retrieval. At present, research on named entity recognition in the agricultural field still faces the following challenges:

(1) Agricultural entities are named in diverse ways, entity boundaries are fuzzy, and the complex characteristics of the entities themselves affect model performance. Models that take a single sentence as the processing unit produce inconsistent entity tags.
(2) Agricultural named entities have different meanings in different contexts, so polysemy occurs. The vector representation obtained from a Word2vec-based pre-trained model is static and has a single sense, which leads to such entities being tagged incorrectly.
(3) Entities in agricultural corpora are unevenly distributed, and some are rare or unknown. The model cannot learn enough entity feature information, which results in a low recognition rate for such entities.
(4) The strokes and radicals of Chinese characters carry rich semantic information. Existing character-level and word-level named entity recognition methods ignore this information, so the potential features of Chinese characters are insufficiently represented, which affects model performance to a certain extent.

In view of these challenges, this dissertation studies agricultural named entity recognition based on deep learning technology. The main research contents are as follows:

(1) To alleviate the influence of the characteristics of named entities on the model, an Att-BiLSTM-CRF method based on the attention mechanism is proposed. The Word2vec pre-training tool, using the CBOW model, is applied to a large-scale unlabeled corpus to obtain character-level vector representations of agricultural text, which alleviates the impact of incorrect word segmentation on model performance. A document-level attention mechanism is introduced to increase the attention the model pays to the target entity. The document is taken as the model's processing unit, and the similarity between the target entity and related entities is obtained through a cosine alignment function to alleviate the problem of inconsistent entity labeling. To reduce the model's dependence on external input data, an MHA-BiLSTM-CRF method based on the multi-head self-attention mechanism was designed to further improve performance. Multi-head self-attention mines the internal dependencies between words and between characters in sequential text, obtains richer and more comprehensive semantic information, and improves the recognition precision of the model for named entities. The experimental results show that the precision, recall and F1 value of the Att-BiLSTM-CRF method are 92.05%, 91.68% and 91.86% respectively, and that the method can alleviate the problem of inconsistent entity labeling. Compared with the Att-BiLSTM-CRF method, the MHA-BiLSTM-CRF method not only reduces the inconsistency of entity tags but also further improves performance: precision, recall and F1 value improve by 0.44, 0.14 and 0.29 percentage points respectively.

(2) Aiming at the problems of uneven entity distribution and polysemy, a BERT-BiLSTM-CRF method integrating dictionary features is proposed. Built on a bidirectional Transformer encoder, the BERT pre-training tool dynamically obtains the vector expression of the target word by representing the word information, position information and segment information of the target entity as vectors and combining them with context information, which enriches the semantic information of the sequence and addresses polysemy. To address the low recognition rate of rare or unknown entities caused by uneven entity distribution, an external agricultural knowledge base is introduced, and the named entities it contains are used to supplement and improve the corpus. To obtain external dictionary features, two feature extraction methods are designed: an N-gram feature template and BDMM bidirectional maximum matching. The model concatenates the character-level vector representation obtained by BERT with the external dictionary features and uses the result as the initial input of the BiLSTM-CRF layer to obtain the optimal annotation sequence. The experimental results show that the precision, recall and F1 value of the BERT-BiLSTM-CRF method integrating dictionary features are 94.84%, 95.23% and 95.03% respectively. The method improves the recognition of polysemous, rare and unknown entities to some extent: the recognition precision for unknown entities and rare entities is 80.29% and 91.54% respectively.

(3) To alleviate the impact of insufficient representation of the potential features of Chinese characters on model performance, an RS-ALBERT-BiGRU-CRF method that integrates the stroke features and radical features of Chinese characters is proposed. Strokes and radicals can ensure the uniqueness of Chinese characters. To better capture the potential features of Chinese characters, a radical feature extraction model based on CNN and a stroke feature extraction model based on CNN/BiLSTM were designed to obtain the internal potential features of agricultural text through deep learning models. To improve computational efficiency, the RS-ALBERT-BiGRU-CRF model was built, which optimizes and speeds up the original model framework. Finally, the model learns semantic information at multiple fine-grained levels, such as radicals and strokes, and enriches the vector expression of target words. Its precision, recall and F1 value are 95.01%, 95.42% and 95.21% respectively, further improving model performance and generalization ability.

The named entity recognition methods proposed in this study, which are oriented to the agricultural field and based on deep learning technology, can effectively address the problems faced by NER tasks in the agricultural field. They improve the model's ability to express text semantics and the potential features of single characters, optimize the model framework in terms of both recognition precision and computational efficiency, and promote research on NER tasks in the agricultural field.
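
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the F1 value is the usual harmonic mean of precision and recall, F1 = 2PR / (P + R); the abstract does not state the formula, so this is an assumption. The function name `f1_score` and the dictionary layout are illustrative only; the numbers are the ones quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as quoted in the abstract.
reported = {
    "Att-BiLSTM-CRF": (92.05, 91.68, 91.86),
    "BERT-BiLSTM-CRF + dictionary features": (94.84, 95.23, 95.03),
    "RS-ALBERT-BiGRU-CRF": (95.01, 95.42, 95.21),
}

for name, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

Under that assumption, each computed value agrees with the reported one to two decimal places.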
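
The abstract names bidirectional maximum matching (BDMM) as one way of extracting dictionary features but gives no details. The following is a minimal sketch of the generic technique, not the dissertation's implementation: it runs forward and backward maximum matching against a lexicon, keeps the segmentation with fewer tokens (then fewer single-character tokens, preferring the backward result on a tie), and turns matched words into per-character B/I/E/S/O tags. The tie-breaking rule, the tag scheme, the function names and the three-word lexicon are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len):
    """Greedy left-to-right matching, trying the longest window first."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Greedy right-to-left matching, trying the longest window first."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, lexicon):
    """Pick the segmentation with fewer tokens, then fewer single characters."""
    max_len = max((len(word) for word in lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)

    def cost(tokens):
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # Assumed heuristic: prefer the backward result on a tie.
    return fwd if cost(fwd) < cost(bwd) else bwd


def dictionary_tags(tokens, lexicon):
    """One tag per character: B/I/E for multi-character matches, S, or O."""
    tags = []
    for tok in tokens:
        if tok not in lexicon:
            tags.extend(["O"] * len(tok))
        elif len(tok) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(tok) - 2) + ["E"])
    return tags


# Hypothetical lexicon and sentence, for illustration only.
lexicon = {"小麦", "条锈病", "小麦条锈病"}
tokens = bidirectional_max_match("小麦条锈病危害严重", lexicon)
tags = dictionary_tags(tokens, lexicon)
print(tokens)
print(tags)
```

In a pipeline like the one described, tags of this kind would be encoded as vectors and concatenated with the character-level BERT representation before the BiLSTM-CRF layer; how the dissertation encodes them is not stated in the abstract.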
Keywords/Search Tags:Agricultural, Named entity recognition, Deep learning, Attention mechanism, Natural language processing