
GAN-based Named Entity Recognition For TCM Text

Posted on: 2023-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Hao
Full Text: PDF
GTID: 2544307031454924
Subject: Information and Communication Engineering
Abstract/Summary:
Traditional Chinese medicine (TCM) texts are characterized by short sentences, polysemy, and complex, variable sentence patterns. For deep learning models performing named entity recognition, these characteristics lead to three problems: sparse semantic features in short-text corpora, polysemy, and a lack of data in the TCM domain. To address these problems, this thesis proposes GAN-NER, a named entity recognition model for TCM based on a generative adversarial network. The model consists of a BERT-BiLSTM-CRF based generator and a CNN based discriminator. The generator produces entity class labels, and the discriminator distinguishes and classifies the feature distributions of generated data and real data. The generator's parameters are then updated through backpropagation so that it generates more accurate entity labels.

To address polysemy in TCM texts, the BERT pre-trained model is used. It applies a multi-layer attention mechanism to generate deep dynamic word vectors, so that the same character can receive different vectors, and the semantic features of a sentence are extracted from these embeddings. In experiments against the benchmark model, the overall performance of the model improves by 5% in F1 value.

To address the sparse semantic features of short texts, a feature distribution is constructed for the semantic features of sentence sequences through feature fusion. Combining the semantic features of sentences with the feature distribution of the labels turns the sparse matrix into a high-dimensional dense matrix.

To address the scarcity of TCM text data samples, a generative adversarial network is introduced. Through the adversarial game between the generator and the discriminator, the generator can be trained on a small corpus to generate entity category labels whose knowledge structure matches that of models trained on other datasets, which addresses the scarcity of datasets in the TCM domain. To select, from the generated samples, those whose feature distribution is similar to that of the real samples, an active learning algorithm compares the similarity of the two feature distributions, and the generated samples are then sorted and screened.

The most representative TCM classics, "Huangdi Neijing" and "Traditional Chinese Medicine Syndrome", are used as the datasets for the experiments, and ablation experiments are conducted with the GAN-NER model. The experimental results are 90.01%, 81.33%, and 85.28% in accuracy, recall, and F1 value respectively, which verifies that the model recognizes named entities well in TCM text. Figures: 29; Tables: 12; References: 50.
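The screening step in the abstract, where generated samples are ranked by how closely their feature distribution matches that of the real samples, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the similarity measure, so cosine similarity to the mean real feature vector is assumed here, and the names `rank_generated_samples`, `real_features`, `generated_features`, and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_generated_samples(real_features, generated_features, top_k):
    """Return indices of the top_k generated samples closest to the real data.

    Assumption: "closeness" is cosine similarity to the centroid of the real
    feature vectors; the thesis may use a different measure.
    """
    centroid = mean_vector(real_features)
    scored = [
        (cosine_similarity(features, centroid), index)
        for index, features in enumerate(generated_features)
    ]
    # Sort by similarity, highest first, then keep the best top_k indices.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:top_k]]


# Toy feature vectors, invented purely for illustration.
real = [[1.0, 0.0], [0.9, 0.1]]
generated = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]
selected = rank_generated_samples(real, generated, top_k=2)
```

In this toy setup, the generated sample pointing in nearly the same direction as the real centroid is ranked first, and the one orthogonal to it is screened out.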
Keywords/Search Tags: Chinese medicine text, generative adversarial networks, named entity recognition, BERT, deep learning