Research On Short Text Classification Based Upon Convolution Feature Encoding And Attention Mechanism

Posted on: 2020-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: F H Zhu
GTID: 2428330578479233
Subject: Computer technology
Abstract/Summary:
Short text classification is a fundamental task in natural language processing, in which a classification model automatically assigns a target label to a given sample. However, because short texts lack sufficient word co-occurrence and contextual features, text representations based on statistical methods are sparse. Distributional word embeddings alleviate this problem well, but little research takes the characteristics of short text into account, and classification performance still has room for improvement. This paper combines the linguistic characteristics of short text to analyze the applicability of various classification algorithms, and then optimizes the methods that perform best on short text classification. The details are as follows.

(1) Structure analysis of classification models for short text
Traditional classification based on statistical features and classification based on neural networks each have advantages and disadvantages on the short text classification task. This paper analyzes the applicability of various classification algorithms in light of the characteristics of short texts and assesses their suitability against the experimental results of each algorithm, so as to select a classification algorithm suited to short text and pave the way for the subsequent optimization research. Experimental results show that the classification method based on convolutional neural networks is the most suitable for the short text classification task.

(2) Short text classification method based on convolutional feature adaptation
Short texts have distinctive linguistic characteristics, such as concise wording and multi-perspective information, where perspective information refers to words that carry domain characteristics and can guide classification. Because the importance of each piece of perspective information to a short text differs, directly concatenating all perspective features often leads to weak feature differentiation. Therefore, this paper proposes a classification method based on convolutional feature adaptation. The method enables the network to automatically learn and adjust the feature distribution of each perspective by evaluating the importance of the multi-perspective features, so as to increase attention to important perspective features and weaken the influence of irrelevant information. Experimental results show that this method effectively improves the performance of short text classification.

(3) Short text classification method based on interdependence of semantic units
Fragment information in a short text is often closely related. However, a convolutional neural network ignores the relationships among the fragment features as a whole when extracting the optimal features of a text, which makes it difficult to extract globally optimal features. Therefore, this paper proposes a short text classification method based on the interdependence of semantic units. The method establishes dependencies among the semantic units contained in the short text, so that the network can fully capture the overall features of the text and then extract globally optimal features. Experimental results show that acquiring globally optimal features effectively improves the performance of short text classification.

This paper analyzes the applicability of various classification methods in light of the linguistic characteristics of short texts, and further combines those characteristics with the attention mechanism to optimize the classification method from the perspective of convolution features. On the NLPCC 2017 classification corpus, the F1 scores of the classification methods based on convolutional feature adaptation and on the interdependence of semantic units are 1.70% and 1.97% higher, respectively, than that of the baseline model.
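
The two attention-based ideas in (2) and (3) can be illustrated with a minimal sketch. The abstract does not say how importance is computed or how dependencies between semantic units are built, so everything here is an assumption: the function names are hypothetical, the importance scores are passed in by hand instead of being learned, softmax weighting stands in for the feature adaptation step, and scaled dot-product self-attention stands in for the interdependence step. It is not the thesis's actual model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_perspectives(features, importance_scores):
    """Scale each perspective's feature vector by its normalised importance.

    features: one feature vector per perspective (e.g. per convolution
    filter width). importance_scores: one raw score per perspective; a
    trained network would learn these, here they are given (assumption).
    """
    weights = softmax(importance_scores)
    return [[w * x for x in vec] for w, vec in zip(weights, features)]


def self_attend(units):
    """Scaled dot-product self-attention over semantic-unit vectors.

    Each output vector is a weighted mix of all units, so every unit
    carries information about the others before pooling.
    """
    dim = len(units[0])
    attended = []
    for query in units:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in units
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * key[i] for w, key in zip(weights, units)) for i in range(dim)]
        )
    return attended


def max_pool(units):
    """Max-over-time pooling: keep the largest value in each dimension."""
    return [max(column) for column in zip(*units)]


# Toy inputs (made up for illustration only).
perspectives = [[1.0, 2.0], [3.0, 0.5], [0.2, 0.1]]
adapted = reweight_perspectives(perspectives, [2.0, 0.5, -1.0])

units = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = max_pool(self_attend(units))
```

Reweighting before concatenation lets high-importance perspectives dominate the joint representation, while applying self-attention before pooling means the pooled vector is chosen from features that already reflect the whole text instead of isolated fragments.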
Keywords/Search Tags: Short Text Classification, Convolutional Neural Network, Perspective Information, Semantic Unit