| Text classification, a fundamental task in Natural Language Processing (NLP), plays an important role in applications such as sentiment analysis and question classification. With the rapid development of the Internet, online text has become increasingly heterogeneous, and text classification is becoming correspondingly harder. Current text classification methods suffer from low efficiency and unsatisfactory accuracy. Moreover, different NLP tasks rely on different linguistic features: semantic features are more critical for text classification, whereas syntactic features matter more for dependency parsing. However, most current text classification methods improve performance by mixing and calibrating features without distinguishing feature types or their effects on the task, which makes it difficult to obtain the best results across different tasks. To address these problems, and in light of the practical requirements of text classification, this thesis studies text-based multi-class classification. The main research contents are as follows: (1) To address the low efficiency and unsatisfactory accuracy of existing text classification methods, a multi-label text classification algorithm based on cluster ensembles and semantic similarity annotation is proposed. The Fuzzy C-Means (FCM) clustering algorithm is used to recognize and capture text information, and an optimization algorithm for semantic similarity mapping is built on the Resource Description Framework (RDF) to simplify text information and reduce the computational load. (2) Most deep-learning-based text classification methods do not distinguish feature types or their effects when extracting features, which leads to low recognition accuracy. To address this, a Multi-Dimensional Feature Extraction algorithm based on a Cross-layer Attention mechanism (MDFECA) is proposed to filter out more semantic features. First, a stacked Transformer network filters different types of linguistic features. Then, a new cross-layer attention mechanism uses high-level features to supervise low-level features, so that features at different levels of the text receive attention and the filtering process is refined. In addition, a fine-grained feature extraction module combines a channel attention mechanism, which enhances the representational power of Convolutional Neural Networks (CNN), with a spatial attention mechanism to extract the important word information in a sentence and capture important local features. The proposed MDFECA algorithm extracts more semantic information and can therefore recognize and classify different types of text effectively. Through the above research, the methods proposed in this thesis can extract multi-dimensional text features and apply them to text classification, providing a feasible way to improve classification efficiency. On the one hand, to verify the multi-label text classification algorithm based on cluster ensembles and semantic similarity annotation, this thesis collects data from the Digital Bibliography & Library Project (DBLP) and builds a database. The algorithm is evaluated on computational start-up response time, classification time for the database text, and classification accuracy; the results show that it clearly outperforms the comparison models. On the other hand, the proposed MDFECA algorithm is evaluated on three public datasets for sentiment analysis and event classification: the Internet Movie Database (IMDb), the Stanford Sentiment Treebank (SST), and the Baidu Event Extraction
dataset 1.0 (DuEE 1.0). The results show that it achieves the best performance on all three datasets, indicating that MDFECA can attend to and effectively exploit features of different dimensions to improve classification accuracy and has high practical value for text classification. The results also show that introducing cluster ensembles and semantic annotation into multi-label text classification is feasible and effective, benefits research on text retrieval over network data, and has practical value for improving the reading experience of users. |
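The abstract names Fuzzy C-Means (FCM) as the clustering step of the multi-label algorithm but gives no details. As a minimal sketch of textbook FCM (not the thesis's implementation), the code below alternates the standard centre and membership updates on plain Python lists. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the tolerance, and the toy two-dimensional "document vectors" are all illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook FCM on a list of equal-length float vectors (illustrative sketch).

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [x / total for x in row]
            else:
                row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c)) for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of 2-D text vectors.
docs = [[0.0, 0.1], [0.1, 0.0], [0.06, 0.04], [5.0, 5.1], [5.1, 5.0], [4.95, 5.05]]
centers, memberships = fuzzy_c_means(docs, c=2)
```

Because each membership row sums to 1, a text can belong to several clusters to different degrees, which is why a fuzzy method suits multi-label settings; thresholding the memberships would be one (assumed) way to turn them into label sets.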
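The abstract says an RDF-based semantic-similarity mapping is used to annotate and simplify text, but it does not specify the similarity measure. Purely as an illustration, the sketch below assumes that each text and each candidate label is described by a set of RDF-style (subject, predicate, object) triples, and it uses Jaccard overlap between triple sets as the similarity. Every label whose score passes a threshold is assigned, which produces multi-label output. The triples, labels, threshold, and function names are hypothetical and are not taken from the thesis or from DBLP.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def annotate_by_similarity(doc_triples, label_triples, threshold=0.3):
    """Return every label whose triple set is similar enough to the document's.

    doc_triples:   set of (subject, predicate, object) tuples for one text
    label_triples: dict mapping label -> set of triples describing that label
    Labels are returned from most to least similar.
    """
    scored = [(jaccard(doc_triples, triples), label) for label, triples in label_triples.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for score, label in scored if score >= threshold]


# Hypothetical label descriptions and one document, written as RDF-style triples.
labels = {
    "databases": {
        ("paper", "hasTopic", "query optimization"),
        ("paper", "hasTopic", "indexing"),
        ("paper", "publishedIn", "VLDB"),
    },
    "machine learning": {
        ("paper", "hasTopic", "neural networks"),
        ("paper", "hasTopic", "classification"),
        ("paper", "publishedIn", "NeurIPS"),
    },
}
document = {
    ("paper", "hasTopic", "classification"),
    ("paper", "hasTopic", "neural networks"),
    ("paper", "hasTopic", "indexing"),
}
```

Here the document scores 0.5 against "machine learning" and 0.2 against "databases". With the default threshold only "machine learning" is returned; lowering the threshold to 0.15 also admits "databases", which shows how the threshold trades precision for additional labels.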
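The abstract describes the cross-layer attention only at a high level: high-level features supervise low-level features. One plausible reading, which is assumed here and not taken from the thesis, is scaled dot-product attention where queries come from a higher Transformer layer and keys and values come from a lower one. For brevity, the sketch omits the learned query/key/value projections, multiple heads, and any residual connection. `cross_layer_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_layer_attention(high, low):
    """Attend over lower-layer token vectors, using higher-layer vectors as queries.

    high: query vectors from a higher layer (len_q x d)
    low:  key/value vectors from a lower layer (len_k x d)
    Returns (refined, weights): refined[i] is a weighted sum of low-layer vectors,
    and weights[i][t] is how strongly query i attends to low-layer token t.
    Learned projections are omitted (simplifying assumption).
    """
    d = len(low[0])
    scale = 1.0 / math.sqrt(d)
    refined, weights = [], []
    for q in high:
        # Scaled dot product between the high-level query and every low-level key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in low]
        w = softmax(scores)
        weights.append(w)
        refined.append([sum(w[t] * low[t][j] for t in range(len(low))) for j in range(d)])
    return refined, weights


# Toy example: 3 low-level token vectors and 2 high-level query vectors (d = 2).
low_layer = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
high_layer = [[4.0, 0.0], [0.0, 4.0]]
refined, weights = cross_layer_attention(high_layer, low_layer)
```

Each high-level query concentrates its weight on the low-level token it aligns with, so the higher layer decides which lower-level features are passed upward. A full model would learn that alignment through trainable projections.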
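The fine-grained module is described as combining channel attention, which reweights CNN channels, with spatial attention over word positions, in the spirit of CBAM. The sketch below is a parameter-free simplification under that assumption. Each gate is a sigmoid of average-pooled plus max-pooled statistics, whereas CBAM-style modules pass these statistics through a learned MLP (for channels) and a convolution (for positions). The feature map is treated as tokens × channels, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(fmap):
    """Reweight each channel by sigmoid(avg-pool + max-pool over all tokens).

    fmap: list of token rows, each a list of C channel activations.
    """
    n_tokens, n_channels = len(fmap), len(fmap[0])
    gates = []
    for ch in range(n_channels):
        column = [fmap[t][ch] for t in range(n_tokens)]
        gates.append(sigmoid(sum(column) / n_tokens + max(column)))
    out = [[row[ch] * gates[ch] for ch in range(n_channels)] for row in fmap]
    return out, gates


def spatial_attention(fmap):
    """Reweight each token (word position) by sigmoid(avg + max over its channels)."""
    gates = [sigmoid(sum(row) / len(row) + max(row)) for row in fmap]
    out = [[v * g for v in row] for row, g in zip(fmap, gates)]
    return out, gates


def fine_grained_attention(fmap):
    """Channel attention followed by spatial attention (CBAM-style ordering)."""
    out, channel_gates = channel_attention(fmap)
    out, token_gates = spatial_attention(out)
    return out, channel_gates, token_gates


# Toy feature map: 3 tokens x 3 channels; the middle token has strong activations.
features = [[0.1, 0.2, 0.0], [2.0, 1.5, 3.0], [0.0, 0.1, 0.2]]
refined, channel_gates, token_gates = fine_grained_attention(features)
```

The middle token receives the largest spatial gate, which mirrors the stated goal of emphasising important words. Applying channel gating first follows the CBAM ordering, so the spatial gate is computed from features that have already been reweighted by channel.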