
Algorithm Research On Text Classification And Named Entity Recognition Based On Deep Text Feature Representation

Posted on: 2021-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: L H Yu
Full Text: PDF
GTID: 2428330611966944
Subject: Computer Science and Technology

Abstract/Summary:
With its powerful feature learning ability, deep learning has been applied effectively and has achieved breakthroughs in the field of natural language processing. Learning good text feature representations is one of the keys to judging the quality of a deep learning text representation algorithm: good representations improve the performance of text classification and recognition. Text classification and named entity recognition are two basic tasks and research hotspots in natural language processing. Based on these two tasks, this paper proposes two deep learning models that extract effective deep text feature representations and improve the performance of text classification and named entity recognition. The research work mainly includes the following two aspects:

1) A Global-Local Mutual Attention model (GLMA) for text classification is proposed. The model extracts global and local features simultaneously and uses a global-local mutual attention mechanism to learn the interaction and mutual effects between them, extracting more effective global and local features. The mechanism includes a local-guided global attention and a global-guided local attention. On the one hand, the local-guided global attention assigns weights to and combines the global features of semantically related word positions to capture their combined semantics. On the other hand, the global-guided local attention automatically assigns larger weights to relevant local features to capture key local semantics. In addition, the weighted-over-time pooling in the model effectively extracts discriminative global-local feature representations. Experimental results on 23 datasets demonstrate that the model extracts more effective global and local feature representations and improves the accuracy of text classification.

2) A Multiple-Level Topic-Aware Representation model (MLTA) for named entity recognition is proposed. The model uses a bi-directional recurrent neural network to extract sequential features and, by introducing a neural topic model, models topic representations at two levels: word-level and corpus-level. The word-level topic representation learns the relationships between words and latent topics, capturing the different semantics a word takes on in different contexts. The corpus-level topic representation extracts corpus-level global information and enables a deeper understanding of the meaning of each word. Experimental results on three named entity recognition datasets demonstrate the effectiveness of the proposed model. In addition, quantitative and qualitative experimental analysis and visualization further verify the effectiveness of the multiple-level topic-aware representation model in identifying named entities that are ambiguous or out of vocabulary.
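The global-local mutual attention idea can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions, not the thesis's implementation: the queries are taken as mean-pooled summaries of the opposite feature stream, the function names are invented, and the feature matrices stand in for BiRNN (global) and CNN (local) outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(global_feats, local_feats):
    """Hypothetical sketch of global-local mutual attention.

    global_feats, local_feats: (seq_len, dim) arrays standing in for
    the global (sequence-level) and local (n-gram-level) features.
    """
    # Local-guided global attention: a pooled local summary scores each
    # global position, so semantically related positions get more weight.
    local_query = local_feats.mean(axis=0)               # (dim,)
    g_weights = softmax(global_feats @ local_query)      # (seq_len,)
    attended_global = g_weights[:, None] * global_feats

    # Global-guided local attention: a pooled global summary re-weights
    # the local features toward the key local semantics.
    global_query = global_feats.mean(axis=0)
    l_weights = softmax(local_feats @ global_query)
    attended_local = l_weights[:, None] * local_feats

    # Weighted-over-time pooling: sum the weighted features over
    # positions instead of taking a hard max.
    return attended_global.sum(axis=0), attended_local.sum(axis=0)
```

The two attended vectors would then be combined and fed to a classifier; the point of the sketch is only that each feature stream's weights are computed from a summary of the other stream.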
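The two-level topic representation can likewise be sketched with fixed arrays standing in for the neural topic model's outputs. Everything here is an assumption for illustration: the function name, the use of a shared topic-embedding matrix, and the plain matrix products replacing the learned topic model.

```python
import numpy as np

def topic_aware_features(token_ids, word_topic, corpus_topic, topic_emb):
    """Hypothetical sketch of multiple-level topic-aware features.

    token_ids:    (T,) integer ids of the tokens in a sentence
    word_topic:   (vocab, K) per-word distributions over K latent topics
                  (learned by a neural topic model in the thesis)
    corpus_topic: (K,) corpus-level topic mixture
    topic_emb:    (K, dim) topic embeddings
    """
    # Word-level: each token's topic distribution mixes the topic
    # embeddings, so the same word can shift with its dominant topics.
    word_level = word_topic[token_ids] @ topic_emb   # (T, dim)
    # Corpus-level: one global vector summarizing corpus-wide topics.
    corpus_level = corpus_topic @ topic_emb          # (dim,)
    return word_level, corpus_level
```

In the model these vectors would be concatenated with the bi-directional RNN states before the NER tagging layer, giving ambiguous or out-of-vocabulary tokens extra topical context.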
Keywords/Search Tags:Text Classification, Named Entity Recognition, Text Feature Representation, Mutual Attention Mechanism, Topic Modeling