| In recent years, techniques for extracting domain-specific information from text have matured, and demand for named entity recognition in agriculture has begun to emerge. For natural language processing researchers, named entity recognition is the foundation of many downstream tasks, and a range of agricultural research tasks can build on it; for agricultural researchers, it offers a way to follow the latest progress in the field quickly and to reduce the resources wasted through information asymmetry. Named entity recognition in agriculture is therefore important to researchers in both agriculture and natural language processing. However, no publicly available dataset currently exists for the agricultural domain, which makes it difficult to determine directly which recognition methods are effective. To address these issues, this thesis makes the following contributions. First, a corpus for the agricultural domain was constructed. Most current research on named entity recognition focuses on model innovation in the general domain, and no public agricultural dataset is available for direct study, so this thesis built one. The main steps were: using web crawling to collect agricultural news texts published by the Chinese Academy of Agricultural Sciences over the past 20 years; pre-processing the data; determining the entity types in consultation with experts in the agricultural domain; and following the dataset construction methods used in the general domain to create an agricultural dataset that can be used for named entity recognition and covers a relatively wide range of agricultural knowledge. Second, entity recognition was performed on the constructed agricultural corpus. Current named entity recognition methods mainly use deep learning techniques, but
most of these models have been evaluated only in specific domains, and there is no guarantee that they are equally applicable to agricultural datasets. This thesis proposes a BiLSTM-Att-CRF model: the BiLSTM learns features from the data, the attention mechanism lets the model weight words according to their importance, and the CRF layer produces the final predictions. A comparison with five benchmark models shows that the proposed model performs the agricultural named entity recognition task better. Finally, to further improve recognition performance, this thesis develops a BERT-BiLSTM-CRF model, which can handle words that have multiple meanings in the text. Experiments show that the BERT-BiLSTM-CRF model achieves better overall performance than BiLSTM-Att-CRF and significantly improves recognition of each entity type. To verify that the agricultural dataset constructed in this thesis is well-formed and reliable, a standard Chinese dataset was selected for experiments under the same conditions, and the metrics of the same model on the two datasets were compared. The results show that the agricultural corpus constructed in this thesis is of high quality. |
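
The abstract states that the CRF layer gives the final prediction results. As a minimal sketch of what that step typically involves, and not the thesis's actual implementation, the pure-Python example below runs Viterbi decoding over per-token emission scores (which in such models would come from the BiLSTM and attention layers) and tag-to-tag transition scores. The function name `viterbi_decode`, the entity type `CROP`, and all numeric scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag]       : score of assigning `tag` to token t
    transitions[prev][cur]  : score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []

    # Best score of any path that ends in each tag at position 0.
    score = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes the path score into `cur`.
            best_prev = max(tags, key=lambda prev: score[prev] + transitions[prev][cur])
            new_score[cur] = score[best_prev] + transitions[best_prev][cur] + emissions[t][cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda tag: score[tag])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical BIO tag set with a single illustrative entity type.
tags = ["O", "B-CROP", "I-CROP"]

# All transitions are neutral except O -> I-CROP, which is invalid in BIO
# tagging and therefore heavily penalized (assumed value).
transitions = {prev: {cur: 0.0 for cur in tags} for prev in tags}
transitions["O"]["I-CROP"] = -10.0

# Hypothetical emission scores for a three-token sentence.
emissions = [
    {"O": 1.0, "B-CROP": 0.0, "I-CROP": 0.0},
    {"O": 0.6, "B-CROP": 0.5, "I-CROP": 0.0},
    {"O": 0.0, "B-CROP": 0.0, "I-CROP": 1.0},
]

# Picking the top emission per token independently, without the CRF.
greedy_path = [max(tags, key=lambda tag: token[tag]) for token in emissions]
best_path = viterbi_decode(emissions, transitions, tags)
print(greedy_path)  # ['O', 'O', 'I-CROP']
print(best_path)    # ['O', 'B-CROP', 'I-CROP']
```

In this toy case, choosing the top-scoring tag for each token independently yields an `I-CROP` that directly follows `O`, whereas decoding with transition scores yields a well-formed `B-CROP`, `I-CROP` span. This is the usual motivation for placing a CRF layer on top of the token-level encoder.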