
Text Representation And Classification Based On Deep Learning

Posted on: 2020-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 2428330572471241  Subject: Electronic and communication engineering
Abstract/Summary:
Text representation is a key technology in the field of Natural Language Processing (NLP), and its quality often has a crucial impact on modern deep-learning-based NLP systems. Traditional NLP systems are mostly built on feature engineering, requiring expert-defined features and carefully designed extractors; effective features are often difficult to define and complex to implement. The development of deep learning has brought a major technological breakthrough to NLP: methods based on deep neural networks can automatically learn textual features from data, which not only greatly reduces the engineering effort but also yields better classification performance. Model structures have become deeper and more complex, and the performance bottlenecks of text classification tasks such as sentiment analysis and subject classification continue to be broken through. However, although deeper neural networks offer more powerful function-approximation and data-fitting ability, no existing work indicates whether the representation ability of a model is correlated with its semantic robustness. In addition, although pre-trained word vectors can often improve the performance of downstream NLP tasks, existing research on transfer strategies for context-free word vectors is still preliminary. Therefore, this thesis makes an in-depth study of the robustness of the semantic representations of deep models and of transfer learning strategies for word vectors, as follows:

1. Research on the correlation between text semantic representation ability and text classification performance. A definition of semantic robustness is given from the perspectives of information loss and noise redundancy, and a reliable semantic evaluation model, RAcc (Robust Accuracy), is proposed to make up for the inability of traditional classification metrics to evaluate the stability of a model. Based on the RAcc model, this thesis explores in depth the correlation between the representation ability of deep neural network models and their classification performance. The experimental conclusions reveal limitations of existing representation and classification models and provide inspiration for research on NLP problems such as text classification.

2. Transfer learning strategies for word vectors. Pre-training word vectors on a large general corpus and transferring them to downstream classification tasks can often improve system performance. Existing work usually adopts a fine-tuning strategy in which the word vectors are trained jointly with the downstream model so that the pre-trained vectors better fit the downstream task. This thesis points out that this strategy does not always bring the expected performance improvement, yet it greatly increases the training resource overhead. In this regard, the thesis first establishes a theoretical model for this strategy, called "3-signal", and uses it to explain the strategy's theoretical limitations. It then proposes two more efficient transfer learning strategies: Scaling and Lin-trans. Experimental results show that the proposed methods not only bring significant performance improvements on classification tasks but also exhibit stronger semantic robustness under RAcc evaluation.
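The abstract does not give the exact formulation of RAcc, so the following is only a hedged sketch of the general idea of a robustness-aware accuracy: a sample counts as correct only if the classifier is right on both the clean input and a perturbed copy, which is what lets such a metric expose instability that plain accuracy hides. The classifier, the perturbation, and all names below are illustrative assumptions, not the thesis's definitions.

```python
def robust_accuracy(classify, samples, perturb):
    """Fraction of samples classified correctly on BOTH the clean text
    and a perturbed copy (stricter than plain accuracy)."""
    robust = sum(
        classify(text) == label and classify(perturb(text)) == label
        for text, label in samples
    )
    return robust / len(samples)

# Toy demo: a keyword "classifier", and a typo-style perturbation that
# destroys the keyword (i.e., information loss in the input).
classify = lambda text: "pos" if "good" in text else "neg"
perturb = lambda text: text.replace("good", "gud")
samples = [("good movie", "pos"), ("bad movie", "neg")]

clean_acc = sum(classify(t) == l for t, l in samples) / len(samples)
print(clean_acc)                                    # 1.0 -- looks perfect
print(robust_accuracy(classify, samples, perturb))  # 0.5 -- half are fragile
```

The gap between the two numbers is exactly the kind of stability information a traditional accuracy metric cannot report.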
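The abstract names the two proposed strategies, Scaling and Lin-trans, without giving formulas. Under one assumed reading (not confirmed by the thesis) in which the pre-trained embedding matrix E of shape V x d stays frozen and only a small adapter is trained (a per-dimension scale vector for Scaling, a d x d linear map for Lin-trans), the resource-overhead argument against full fine-tuning can be made concrete by counting trainable embedding-side parameters:

```python
# Illustrative sizes only: vocabulary of 50k words, 300-dim embeddings.
V, d = 50_000, 300

trainable = {
    "fine-tuning": V * d,  # every embedding entry is updated jointly
    "Scaling":     d,      # one learned scale per embedding dimension
    "Lin-trans":   d * d,  # one learned linear map shared by all words
}
for name, n in trainable.items():
    print(f"{name:>11}: {n:,} trainable embedding-side parameters")

# Applying each adapter to one frozen embedding row e (length d),
# with both adapters initialised to the identity:
e = [0.5] * d
scale = [1.0] * d
scaled = [s * x for s, x in zip(scale, e)]
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
lintrans = [sum(W[i][j] * e[j] for j in range(d)) for i in range(d)]
```

Under this reading, Scaling and Lin-trans train orders of magnitude fewer parameters than fine-tuning (300 and 90,000 versus 15,000,000 here), which is consistent with the abstract's claim that fine-tuning "greatly increases the training resource overhead" while not always improving performance.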
Keywords/Search Tags:Semantic Representation, Text Classification, Word Embedding, Transfer Learning, Neural Network