Research On Multi-label Short Text Classification Method Of Grass-roots Association For Science And Technology Based On Hybrid Neural Network

Posted on: 2023-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: N Fang
GTID: 2568306830961359
Subject: Software engineering
Abstract/Summary:
At present, automatic text classification technology is quite mature, but most text classification research in China and abroad is based on open corpora, and there is little applied research on the text characteristics of specific industries. Research that meets the big data analysis needs of the Grass-roots Association for Science and Technology (GAST) is still at an exploratory stage. Based on the text classification requirements of the project "Research on the Operational Effectiveness and Innovative Development Path of National Popular Science Education Base in the New Era" of the Science Popularization Activity Center of the China Association for Science and Technology (CAST), and considering the characteristics of the GAST text corpus (weak structure, strong domain specificity and sparse features), this paper proposes a multi-label short text classification method based on a hybrid neural network. First, a GAST text corpus is constructed by crawling the official websites of the provinces. Pre-trained Word2Vec word vectors are adopted, and the pre-trained vectors of the text and of the tags are computed interactively, so that text and tags are effectively integrated and the text attends to the words related to the tags. Second, a BiLSTM model captures the global contextual features of the text while an Attention mechanism increases the classification weight of keywords; the parallel outputs of the two are fused and used as the input of a CNN model, and this hybrid model can effectively incorporate the tag embedding. Finally, convolution kernels of different sizes are set in the CNN model to extract local features of the text, so that both the local and the global features of the text are captured and important words receive more attention. In the experiments, combinations of the component models in different orders are compared; the best-performing combination, the BiLSTM-Attention-CNN hybrid model, achieves an F1 value 2.30% higher than the other combined models. Further optimizing the preprocessing stage by merging the tag embedding makes the text pay more attention to tag-related words and increases the F1 value by 3.75% over the pre-optimized model, which verifies the effectiveness of the proposed method. In addition, ablation experiments against single models such as BiLSTM, Attention and CNN demonstrate the feasibility of the method. The method can therefore be used to evaluate the goal and task completion of GAST, to analyze and mine the main factors influencing its operation, and to provide big data analysis algorithm support for the scientific management of GAST activities. The thesis has 29 figures, 16 tables and 56 references.
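The abstract does not give the formula used to compute the text and tag vectors "interactively". A minimal sketch, assuming one common label-attention scheme (cosine similarity between each word vector and each tag vector, maximum over tags, then softmax over words), is shown below. The function names, the toy 2-dimensional vectors standing in for Word2Vec embeddings, and the rescaling by sequence length are hypothetical illustrations, not the thesis's exact method.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tag_attention_weights(words: List[Vector], tags: List[Vector]) -> List[float]:
    """Hypothetical tag-aware attention: score each word by its best cosine
    match against any tag vector, then softmax-normalise the scores."""
    scores = [max(cosine(w, t) for t in tags) for w in words]
    return softmax(scores)


def tag_weighted_sequence(words: List[Vector], tags: List[Vector]) -> List[Vector]:
    """Rescale each word vector by (attention weight x sequence length), so the
    average scale factor is 1 and tag-related words are boosted."""
    weights = tag_attention_weights(words, tags)
    n = len(words)
    return [[n * wt * x for x in w] for wt, w in zip(weights, words)]


# Toy "word vectors" and a single tag vector (illustrative values only).
words = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tags = [[1.0, 0.0]]
print(tag_attention_weights(words, tags))
```

In this toy input the first word points in the same direction as the tag and receives the largest weight, the orthogonal second word the smallest; the reweighted sequence keeps its original length, so it could still be fed to a sequence model such as a BiLSTM.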
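The final step of the method, CNN convolution kernels of different sizes extracting local text features, corresponds to the familiar multi-window text CNN. The pure-Python sketch below illustrates that idea under stated assumptions: one filter per kernel height, no padding, ReLU, and max-over-time pooling. The input rows stand in for the fused BiLSTM/Attention outputs, the filter weights are made-up toy values, and the function names are hypothetical; a trained model would learn many filters per height.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = time steps (words), columns = feature dimensions


def conv1d_relu(seq: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one filter spanning len(kernel) consecutive time steps over the
    sequence (no padding) and apply ReLU to each window's response."""
    k = len(kernel)
    activations = []
    for start in range(len(seq) - k + 1):
        total = bias
        for i in range(k):
            total += sum(a * b for a, b in zip(seq[start + i], kernel[i]))
        activations.append(max(0.0, total))
    return activations


def multi_kernel_features(seq: Matrix, kernels: Sequence[Matrix]) -> List[float]:
    """Max-over-time pooling per filter: one local feature per kernel size.
    A filter taller than the sequence yields 0.0 here (an arbitrary choice)."""
    features = []
    for kernel in kernels:
        acts = conv1d_relu(seq, kernel)
        features.append(max(acts) if acts else 0.0)
    return features


# Toy sequence of 4 time steps with 2 features each.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],              # height 2
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # height 3
    [[-1.0, -1.0]],                        # height 1, never fires on this input
]
print(multi_kernel_features(seq, kernels))  # [2.0, 4.0, 0.0]
```

Filters of different heights see windows of two, three or one word(s), which is how different kernel sizes capture local patterns of different lengths; pooling then reduces each filter to a single feature regardless of text length.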
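The abstract reports F1 values but does not state whether they are micro- or macro-averaged, nor how multi-label predictions are thresholded. The sketch below assumes independent per-tag sigmoid outputs with a 0.5 threshold and micro-averaged F1, both common choices for multi-label classification; the function names are hypothetical.

```python
import math
from typing import List, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits: List[float], threshold: float = 0.5) -> Set[int]:
    """Each tag is decided independently, so a text may receive zero, one or
    several tags: keep every index whose sigmoid probability reaches the threshold."""
    return {i for i, z in enumerate(logits) if sigmoid(z) >= threshold}


def micro_f1(gold: List[Set[int]], pred: List[Set[int]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over all texts and tags before computing precision and recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(predict_labels([2.0, -1.0, 0.0]))        # {0, 2}
print(micro_f1([{0, 2}, {1}], [{0}, {1, 2}]))  # about 0.667 (tp=2, fp=1, fn=1)
```

Macro-averaged F1 would instead compute F1 per tag and average the results, which weights rare tags more heavily; which variant underlies the reported 2.30% and 3.75% gains is not stated in the abstract.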
Keywords/Search Tags: Multi-label, Classification of short texts, Tag embedding, BiLSTM, Attention, CNN