Chinese Named Entity Recognition Based On Feature Fusion

Posted on: 2024-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X D Li
Full Text: PDF
GTID: 2568307124960189
Subject: Electronic information
Abstract/Summary:
Named Entity Recognition (NER) is an essential technique in natural language processing. It aims to extract entities with specific meanings from text. Identifying entities in text offers substantial support for downstream tasks, including knowledge graphs, machine translation, and question-answering systems. This thesis studies Chinese NER by integrating deep learning with features of Chinese text.

To address the sparse distribution of entities in text, the thesis proposes ACS-NER, a model that integrates local features. The model takes single Chinese characters as input, segments the text into sections with a sliding window, and constructs local features in a layer composed of local attention and convolution. The local attention mechanism dynamically assigns a different weight to each item within the window to highlight entity-related content. The convolution operation combines the contextual information inside the window to generate the corresponding local feature encoding. In addition, ACS-NER uses a self-attention mechanism to generate global features of the text, which improves recognition performance on long sentences and increases robustness to inputs of varying lengths. Experiments on the Resume, MSRA, and Weibo datasets demonstrate that ACS-NER can effectively improve the recognition performance of Chinese named entities.

Medical texts are characterized by numerous specialized terms. Because they lack coarse-grained information, entity recognition models based on single Chinese characters often miss named entities in medical texts, which results in a low recall rate. To address this issue, the thesis proposes the MLWFF-NER model, which integrates word features. The model has two key components: word boundary features and similar word features. Word boundary features are generated by statistically analyzing the relative positions of Chinese characters within words. For similar word features, the model matches multiple words to each Chinese character using the cosine similarity of embedding vectors and generates the corresponding vector representations through dynamic weighted averaging. By introducing word boundary features and similar word features into the model at multiple levels, MLWFF-NER is able to recognize more entities. Experiments on the CCKS2019 and CMeEE datasets indicate that the introduced word features effectively improve the recall rate of the model and, in turn, the recognition performance on Chinese medical entities.
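The two word features described for MLWFF-NER can be illustrated with a small sketch. It is not the thesis implementation: the B/M/E/S position scheme, the softmax weighting used for the "dynamic" average, the `top_k` parameter, the function names, and the toy lexicon and two-dimensional embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def boundary_features(lexicon):
    """Per character, the relative frequency of appearing at the
    Begin / Middle / End of a word or as a Single-character word.
    (Assumed scheme; the abstract only says positions are analyzed statistically.)"""
    counts = defaultdict(Counter)
    for word in lexicon:
        if len(word) == 1:
            counts[word]["S"] += 1
            continue
        for i, ch in enumerate(word):
            tag = "B" if i == 0 else "E" if i == len(word) - 1 else "M"
            counts[ch][tag] += 1
    return {
        ch: [c[t] / sum(c.values()) for t in "BMES"]
        for ch, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_word_feature(char_vec, word_vecs, top_k=2):
    """Match the top_k most similar words to a character embedding and
    average their vectors with softmax-of-similarity weights (assumed)."""
    scored = sorted(
        ((cosine(char_vec, vec), word) for word, vec in word_vecs.items()),
        reverse=True,
    )[:top_k]
    best = max(score for score, _ in scored)
    exps = [math.exp(score - best) for score, _ in scored]
    total = sum(exps)
    feature = [0.0] * len(char_vec)
    for e, (_, word) in zip(exps, scored):
        for i, value in enumerate(word_vecs[word]):
            feature[i] += (e / total) * value
    return feature, [word for _, word in scored]


# Toy data, invented for this example only.
lexicon = ["糖尿病", "尿酸", "病"]
bfeats = boundary_features(lexicon)

word_vecs = {"糖尿病": [1.0, 0.0], "尿酸": [0.0, 1.0], "高血压": [1.0, 1.0]}
vec, words = similar_word_feature([1.0, 0.0], word_vecs, top_k=2)
```

In this toy run, the character 尿 appears once at the start of a word and once in the middle, so its boundary vector splits evenly between B and M. In a full model such vectors would be combined with the character representation before tagging; how and at which layers the thesis does this is not specified in the abstract.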
Keywords/Search Tags: Named entity recognition, Deep learning, Feature fusion, Attention mechanism