Research On Text Classification Based On Interaction Between Text And Label Encoding

Posted on: 2023-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhou
Full Text: PDF
GTID: 2558307058963869
Subject: Software engineering
Abstract/Summary:
Text classification is a classic problem in natural language processing: after a text is converted into a numerical vector, a classification method automatically assigns it to a predefined category. The related techniques are widely used in recommendation systems, information retrieval, data organization, public opinion monitoring, and other Internet applications in which text is one of the main carriers of information. Text representation is the core and most critical step in text classification. Heuristic text representation methods rely mainly on manually constructed vectors, which are sparse. Word vector methods use neural networks to map words into a low-dimensional, dense space. Neural network language models, which can flexibly extract deep semantic and complex contextual features of text, are the focus of current research. However, these models use only the information in the text itself and lack global classification information. There is a gap between the text representation generated by directly modeling the text sequence and the category information that the classification labels are concerned with. This thesis studies how to use label embedding effectively so that the model attends to classification-relevant content at an early stage of text representation, in order to improve classification performance. The work is as follows:

1) Existing methods for interaction between text and label encodings fail to fully represent the semantics of labels. To address this, a text classification method that combines text content encoding with label-guided text encoding is proposed. It obtains a new text representation filtered by the label embeddings and combines it with the content encoding of the text itself, which alleviates the degradation of the model in the current semantic text modeling process. The experimental results show that it is effective to construct a more complete semantic label embedding and to constrain the label embedding in the training phase.

2) In 1), the semantic interaction between texts and labels is single, ignoring the explicit and implicit semantic correlations between them. To address this, a text classification method with bi-directional multi-channel semantic interaction between labels and texts is proposed. It uses shallow interaction channels to capture explicit, specific semantic interaction information, uses deep single and directional interaction channels to capture implicit, abstract semantic interaction information, and designs a gated residual mechanism to obtain the historical information of labels more effectively. The experimental results show that the proposed method is more competitive than the strong baseline methods.

3) To address the incomplete semantic description of labels and the synchronous optimization of labels in 1) and 2), a text classification method based on label semantic constraints and multi-task learning is proposed. A Wikipedia corpus is introduced to make the semantic description of the label embeddings more complete, and the label embeddings are constrained during model training. The experimental results show that it is effective to construct a more complete label embedding and to constrain the label embedding in the training phase.
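The general idea of label-guided text encoding described in the abstract can be illustrated with a minimal sketch. This is not the architecture from the thesis, which the abstract does not specify. It is a toy under stated assumptions: tokens and labels share one embedding space, token relevance is the best cosine match to any label, and a fixed scalar `gate` stands in for a learned gating mechanism. All function names and the 2-dimensional embeddings are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens):
    """Plain content encoding: average of the token embeddings."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def label_guided_encoding(tokens, labels):
    """Weight each token by its best cosine match to any label, then pool."""
    scores = [max(cosine(t, lab) for lab in labels) for t in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    pooled = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


def combine(content_vec, guided_vec, gate=0.5):
    """Assumption: a fixed scalar gate mixes the two encodings."""
    return [gate * g + (1.0 - gate) * c for c, g in zip(content_vec, guided_vec)]


def classify(text_vec, labels):
    """Pick the label whose embedding has the largest dot product with the text."""
    scores = [dot(text_vec, lab) for lab in labels]
    return max(range(len(labels)), key=scores.__getitem__)


# Hypothetical toy embeddings: three tokens, two labels.
tokens = [[1.0, 0.0], [0.1, 0.1], [0.9, 0.2]]
labels = [[1.0, 0.0], [0.0, 1.0]]

guided, weights = label_guided_encoding(tokens, labels)
text_vec = combine(mean_pool(tokens), guided)
predicted = classify(text_vec, labels)
print(predicted, [round(w, 3) for w in weights])
```

In this sketch, the second token matches neither label well, so it receives the smallest attention weight, and the label-guided vector leans toward tokens that carry classification-relevant content. A real model would learn the embeddings, the attention, and the gate jointly.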
Keywords/Search Tags: text classification, label embedding, multi-channel semantic interaction, label semantic constraints