GAN-based Named Entity Recognition For TCM Text

Posted on:2023-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Hao

Full Text:PDF

GTID:2544307031454924

Subject:Information and Communication Engineering

Abstract/Summary:

Texts of traditional Chinese medicine (TCM) are characterized by short sentences, polysemy, and complex, variable sentence patterns. For deep-learning named entity recognition (NER) models, these characteristics cause three problems: sparse semantic features in short-text corpora, polysemy, and a lack of data in the TCM domain. To address these problems, this thesis proposes GAN-NER, a named entity recognition model for TCM based on a generative adversarial network (GAN). The model consists of a BERT-BiLSTM-CRF generator and a CNN discriminator. The generator produces entity class labels, and the discriminator distinguishes and classifies the feature distributions of generated data and real data. The generator's parameters are updated through backpropagation so that it produces more accurate entity labels.

To address polysemy in TCM texts, the BERT pre-trained model is used. Its multi-layer attention mechanism produces deep, dynamic (context-dependent) word vectors, so the same character receives different vectors in different contexts, and these vectors are used to extract the semantic features of sentences. This addresses the polysemy problem in TCM texts. Compared with the baseline model, the overall F1 score improves by 5%.

To address the sparse semantic features of short texts, a feature distribution is constructed for the semantic features of sentence sequences through feature fusion. The semantic features of a sentence are combined with the feature distribution of its labels, so that the sparse matrix becomes a high-dimensional dense matrix once it is integrated with the label feature distribution. This addresses the sparsity of semantic features in short texts.

To address the scarcity of TCM text data, a generative adversarial network is introduced. Through the adversarial game between the generator and the discriminator, the generator can be trained on a small corpus and then generate, for other datasets, entity category labels that follow the same knowledge architecture, which alleviates the scarcity of datasets in the TCM domain. To select generated samples whose feature distributions resemble those of real samples, an active learning algorithm compares the similarity of the two feature distributions, and the generated samples are then ranked and screened.

The most representative TCM classics, "Huangdi Neijing" and "Traditional Chinese Medicine Syndrome", are used as the experimental datasets, and ablation experiments are carried out on the GAN-NER model. The model achieves 90.01% accuracy, 81.33% recall, and an F1 score of 85.28%, which indicates that it recognizes named entities in TCM texts well. 29 figures, 12 tables, 50 references...
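The "mutual game" between the generator and the discriminator described above can be written as a minimax objective. The abstract does not give GAN-NER's loss, so the following assumes the standard conditional GAN form: $x$ is a character sequence, $y$ its gold label sequence (one-hot), $G(x)$ the label distribution produced by the BERT-BiLSTM-CRF generator, and $D(x, y)$ the CNN discriminator's probability that a sentence-label pair is real.

```latex
\min_{G}\,\max_{D}\;
  \mathbb{E}_{(x,y)\sim p_{\text{data}}}\!\left[\log D(x, y)\right]
  + \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log\!\left(1 - D\big(x, G(x)\big)\right)\right]
```

The discriminator is trained to raise both terms, while the generator is trained to lower the second one; its gradient reaches $G$ by backpropagation only if the generator passes a soft (differentiable) label distribution to $D$, which is itself an assumption about the implementation.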
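The feature-fusion step above, which combines a sentence's semantic features with the feature distribution of its labels, might look roughly like the sketch below. It assumes fusion means per-token concatenation of a feature vector with a label probability vector (one-hot for gold labels, softmax output for generated ones); the thesis may fuse features differently, and the function names and tag set are hypothetical.

```python
from typing import List

Vector = List[float]


def one_hot(label_id: int, num_labels: int) -> Vector:
    """Gold (real) labels become one-hot label distributions."""
    vec = [0.0] * num_labels
    vec[label_id] = 1.0
    return vec


def fuse(token_features: List[Vector], label_dists: List[Vector]) -> List[Vector]:
    """Concatenate each token's semantic feature vector with its label
    distribution, giving one dense row per token (seq_len x (d + L))."""
    if len(token_features) != len(label_dists):
        raise ValueError("feature and label sequences differ in length")
    return [feat + dist for feat, dist in zip(token_features, label_dists)]


# Toy input: 2 tokens, 3-dim features, 3 hypothetical labels (O, B-SYM, I-SYM).
feats = [[0.2, -0.1, 0.5], [0.0, 0.3, -0.4]]
real = fuse(feats, [one_hot(1, 3), one_hot(2, 3)])       # gold labels
fake = fuse(feats, [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]])   # generator softmax output
print(real[0])  # [0.2, -0.1, 0.5, 0.0, 1.0, 0.0]
```

Both real and generated pairs end up in the same shape, which is what lets a single discriminator compare them.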
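The screening of generated samples, ranking them by how closely their feature distribution matches that of the real samples, can be sketched as follows. The abstract does not name the similarity measure or the features compared, so this sketch assumes each sample is summarized by its label-frequency vector and compared with the average real distribution by cosine similarity; `label_distribution`, `screen`, and the tag set are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def label_distribution(tags: Sequence[str], label_set: Sequence[str]) -> List[float]:
    """Relative frequency of each label in one tagged sentence."""
    counts = Counter(tags)
    total = sum(counts[label] for label in label_set) or 1
    return [counts[label] / total for label in label_set]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def screen(generated: List[List[str]], real: List[List[str]],
           label_set: Sequence[str], k: int) -> List[int]:
    """Indices of the k generated samples most similar to the real data."""
    real_dists = [label_distribution(t, label_set) for t in real]
    # Reference: average label distribution over all real samples.
    reference = [sum(col) / len(real_dists) for col in zip(*real_dists)]
    scores = [cosine(label_distribution(t, label_set), reference) for t in generated]
    ranked = sorted(range(len(generated)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


labels = ["O", "B-SYM", "I-SYM"]  # hypothetical tag set
real = [["O", "B-SYM", "I-SYM", "O"], ["B-SYM", "I-SYM", "O", "O"]]
generated = [
    ["O", "O", "O", "O"],                  # no entities at all
    ["O", "O", "B-SYM", "I-SYM"],          # same label mix as the real data
    ["B-SYM", "B-SYM", "B-SYM", "B-SYM"],  # implausible tagging
]
print(screen(generated, real, labels, k=2))  # [1, 0]
```

Only the top-ranked samples would be kept as extra training data; a genuine active-learning loop would repeat this selection as the generator improves.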
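The reported accuracy, recall, and F1 values are, in NER, usually computed at the entity level. The abstract does not state the exact scheme, so the sketch below assumes BIO tags and micro-averaged exact-match scoring (a predicted entity counts only if its span and type both match a gold entity); the tag names `SYM` and `HERB` are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # the current entity continues
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag != "O":  # B-X, or a stray I-X treated as the start of a new entity
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[Sequence[str]],
               pred: List[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-SYM", "I-SYM", "O", "B-HERB"]]
pred = [["B-SYM", "I-SYM", "O", "O"]]  # misses the HERB entity
print(entity_prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

F1 is the harmonic mean of precision and recall, so a model with high precision but lower recall, as here, is pulled toward the lower value.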

Keywords/Search Tags:

Chinese medicine text, generative adversarial networks, named entity recognition, BERT, deep learning

Related items

1	Research On Named Entity Recognition Technology For TCM Field
2	Research On Named Entity Recognition Of Biological Pathogens Based On Neural Networks
3	Deep Learning-based Recognition Of Named Entities In Chinese Electronic Medical Records
4	Study On Named Entity Recognition Model Of Cancer Patient Online Questioning Text Based On Transfer Learning
5	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
6	Named Entity Recognition In Medical Field Based On Deep Learning Of Chinese
7	Named Entity Recognition Of Electronic Medical Records Based On Deep Learning
8	Research On Chinese Medical Named Entity Recognition Combined With Active Learning
9	Medical Named Entity Recognition Research Based On Deep Learning
10	Named Entity Recognition In Chinese Medical Text Based On Lattice LSTM