
Research On Classification Of Short Text Sequences With Multi-Views Based On Semantic Representation

Posted on: 2020-09-20 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Sun | Full Text: PDF
GTID: 2428330590996790 | Subject: Software engineering
Abstract/Summary:
As a common data type, sequential data has both local features between adjacent sub-sequences and global features of the overall sequence. Short text can be regarded as a special kind of sequential data constructed from words. Previous representation methods for short text sequences ignored the strong semantic combinations formed by multiple words, resulting in the loss of significant features. Previous short text classification methods also did not take high-order semantic associations between samples into account. In addition, they were based only on either the local or the global features of short text sequences, and a single view can hardly describe the inherent characteristics of the data comprehensively. This thesis focuses on these issues.

Starting from an effective representation of short text sequences, this thesis proposes a semantic representation-based classification algorithm for short text sequences, which uses the high-order semantic associations among short texts to make classification decisions. First, the model obtains semantic clusters of pre-trained word vectors with an improved density clustering algorithm. The clustering results are used to mine potential semantic units in the texts, so that each original text can be represented as a sequence of semantic units. Then, the model employs a convolutional neural network to learn high-level local semantic feature representations of the sequences, which are used to construct a hypergraph. Finally, hypergraph learning classifies the short text sequences by mining high-order semantic associations among samples.

Moreover, to learn more comprehensive feature descriptions of short text sequences and further improve the reliability of the classification model, this thesis proposes a multi-view feature learning algorithm for short text sequences. The algorithm builds an integrated short text sequence modeling method that fuses two deep computation models to extract global features from short text sequences. The local and global features are then fused by deep canonical correlation analysis (DCCA) to obtain a third view, the fusion feature representation. A multi-view hypergraph is then constructed to capture multi-view high-order associations among samples, which are used to complete the classification of short text sequences.

The semantic representation-based classification algorithm was evaluated on five benchmark datasets. Experimental results show that the proposed short text representation method and the mining of high-order associations by the hypergraph effectively improve classification accuracy. The proposed multi-view feature learning algorithm was then used to extract global and fusion features from the short text sequences, and the multi-view hypergraph model was evaluated on the same datasets; the results indicate that multi-view features contribute to improved classification performance. In addition, the proposed sequence classification framework was applied to aero-engine fault detection, with satisfactory results.
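The semantic-unit representation step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: a plain DBSCAN with cosine distance stands in for the unspecified "improved density clustering algorithm", the word vectors are toy 2-D values rather than pre-trained embeddings, and mapping every clustered word to a shared cluster token (`U0`, `U1`, ...) while leaving unclustered words unchanged is our own simplification of how semantic units are formed. The names `dbscan` and `to_semantic_units` are hypothetical.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (assumed stand-in for the thesis's improved variant); -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # Neighbourhood of i, including i itself (distance 0).
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return labels


def to_semantic_units(tokens, word_labels):
    """Hypothetical mapping: clustered words -> shared unit token, other words unchanged."""
    return [f"U{word_labels[t]}" if word_labels.get(t, -1) >= 0 else t for t in tokens]


# Toy 2-D "word vectors" (real pre-trained embeddings would be used in practice).
word_vectors = {
    "good": [1.0, 0.1], "great": [0.95, 0.15], "excellent": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.95, 0.2], "terrible": [-0.9, 0.15],
    "movie": [0.0, 1.0],
}
vocab = list(word_vectors)
word_labels = dict(zip(vocab, dbscan([word_vectors[w] for w in vocab], eps=0.05, min_pts=2)))
print(to_semantic_units("a great movie".split(), word_labels))  # ['a', 'U0', 'movie']
```

With `eps=0.05` and `min_pts=2`, "good", "great" and "excellent" fall into one cluster and "bad", "awful" and "terrible" into another, while "movie" is left as noise; a real system would tune these parameters on the embedding space.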
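The local feature extraction step can be illustrated by the forward pass of a single-layer text CNN: a one-dimensional convolution over the sequence of semantic-unit embeddings, a ReLU, and max-over-time pooling. This sketch rests on assumptions: the abstract does not specify the network architecture, the filter weights below are fixed illustrative values rather than trained parameters, and `conv1d_max_pool` is a hypothetical name.

```python
def conv1d_max_pool(seq, filters, biases):
    """Forward pass of one text-CNN layer: conv over time -> ReLU -> max-over-time pooling.

    seq:     list of d-dimensional embeddings, one per semantic unit
    filters: list of kernels, each a list of `width` rows of d weights
    returns: one pooled feature per filter
    """
    dim = len(seq[0])
    features = []
    for kernel, bias in zip(filters, biases):
        width = len(kernel)
        activations = []
        for start in range(len(seq) - width + 1):
            z = bias + sum(kernel[o][i] * seq[start + o][i]
                           for o in range(width) for i in range(dim))
            activations.append(max(0.0, z))  # ReLU
        # Max-over-time pooling; 0.0 if the sequence is shorter than the kernel.
        features.append(max(activations, default=0.0))
    return features


# Illustrative fixed weights (a trained model would learn these).
units = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # embeddings of a 3-unit sequence
kernels = [[[1.0, 0.0], [0.0, 1.0]],           # width-2 filter
           [[-1.0, -1.0]]]                      # width-1 filter
print(conv1d_max_pool(units, kernels, biases=[0.0, 0.5]))  # [2.0, 0.0]
```

Each filter yields one number per text regardless of its length, which is what makes the pooled vector usable as a fixed-size sample representation for building hyperedges.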
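For the hypergraph stage, a common formulation (assumed here, not confirmed as the thesis's exact design) builds one hyperedge per sample from the sample and its k nearest neighbours in feature space, then propagates labels transductively with the normalized hypergraph operator of Zhou et al. (2006), Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2, iterating F <- alpha * Theta * F + (1 - alpha) * Y. For the multi-view variant, the sketch simply takes the union of the hyperedges built from each view, which is equivalent to concatenating the per-view incidence matrices; this is one standard way to combine views, not necessarily the one used in the thesis. Function names are hypothetical, hyperedge weights are fixed to 1, and toy 2-D features stand in for the local, global and DCCA fusion features.

```python
import math


def knn_hyperedges(features, k):
    """One hyperedge per sample: the sample plus its k nearest neighbours (Euclidean)."""
    edges = []
    for i, x in enumerate(features):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(features) if j != i)
        edges.append({i, *(j for _, j in dists[:k])})
    return edges


def multi_view_hyperedges(views, k):
    """Assumed multi-view scheme: union of per-view hyperedges,
    i.e. concatenating the per-view incidence matrices column-wise."""
    edges = []
    for features in views:
        edges.extend(knn_hyperedges(features, k))
    return edges


def hypergraph_propagate(n, edges, labels, num_classes, alpha=0.9, iters=100):
    """Transductive propagation F <- alpha * Theta @ F + (1 - alpha) * Y with
    Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 and unit hyperedge weights (W = I)."""
    # Vertex degree d(v): number of hyperedges containing v (unit weights).
    dv = [0.0] * n
    for e in edges:
        for v in e:
            dv[v] += 1.0
    # Y: one-hot rows for labelled samples, zero rows for unlabelled ones.
    y = [[0.0] * num_classes for _ in range(n)]
    for v, c in labels.items():
        y[v][c] = 1.0
    f = [row[:] for row in y]
    for _ in range(iters):
        # g = Dv^-1/2 F
        g = [[f[v][c] / math.sqrt(dv[v]) for c in range(num_classes)] for v in range(n)]
        # acc = H De^-1 H^T g: each hyperedge averages its members, then scatters back.
        acc = [[0.0] * num_classes for _ in range(n)]
        for e in edges:
            mean = [sum(g[u][c] for u in e) / len(e) for c in range(num_classes)]
            for v in e:
                for c in range(num_classes):
                    acc[v][c] += mean[c]
        f = [[alpha * acc[v][c] / math.sqrt(dv[v]) + (1 - alpha) * y[v][c]
              for c in range(num_classes)] for v in range(n)]
    # Predicted class = argmax over each row of F.
    return [max(range(num_classes), key=lambda c: row[c]) for row in f]


# Toy example: 6 samples in two well-separated groups; only samples 0 and 3 are labelled.
view_a = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
view_b = [[5, 5], [5, 6], [6, 5], [-5, -5], [-5, -6], [-6, -5]]
seed_labels = {0: 0, 3: 1}

single = hypergraph_propagate(6, knn_hyperedges(view_a, 2), seed_labels, 2)
multi = hypergraph_propagate(6, multi_view_hyperedges([view_a, view_b], 2), seed_labels, 2)
print(single, multi)  # [0, 0, 0, 1, 1, 1] [0, 0, 0, 1, 1, 1]
```

Because every sample lies in its own hyperedge, no vertex degree is zero, and since alpha < 1 the iteration converges. In a fuller treatment, per-view or per-hyperedge weights W would let more reliable views count for more.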
Keywords/Search Tags: Short Text Sequence Classification, Short Text Sequence Representation, Semantic Clustering, Multi-View Feature Learning, Hypergraph