Research On Multi-label Classification Network For Chinese Text And Noisy Labels

Posted on:2024-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Niu

Full Text:PDF

GTID:2568307106468304

Subject:Communication engineering

Abstract/Summary:

In recent years, with the deepening of engineering education accreditation and the reform of curriculum thinking and teaching in China, the cultivation and evaluation of non-technical literacy, such as professional ethics and values, has become a hot research topic in the education field. Teachers usually use questionnaires and similar methods to evaluate the views expressed in students’ textual feedback. However, reading and analyzing these opinion texts manually is time-consuming and laborious, and for online courses with a large number of students, manual methods may not be able to cover the volume of text data. An efficient, automated evaluation method is therefore urgently needed. With the rapid development of natural language processing and deep learning, text classification offers a way to solve this problem, but effective analysis and evaluation of Chinese texts still faces great challenges because of lengthy utterances, flexible and complex structures, and the large amount of noise in labeled datasets. This thesis takes students’ opinions on "engineering sustainability" as the research object and treats the text-based opinion analysis task as a multi-label text classification problem: the computer automatically assigns multiple class labels to each comment text to reflect the different aspects or levels of opinion it involves, thereby evaluating the comprehensiveness and focus of students’ opinions and providing a basis for continuous course improvement. The main contributions are as follows:

1. A Chinese text dataset of students’ opinions on "engineering sustainable development" is constructed. The data were collected through both questionnaires and classroom Q&A. The dataset contains 763 Chinese text samples, and each sample is manually labeled by respondents with 5 class labels. Owing to individual language habits and the subjectivity of manual annotation, the dataset is characterized by flexible and variable syntax, high annotation noise, and lengthy texts in some samples. For use in subsequent classification evaluation, the ground truth of the dataset was obtained by multi-person manual labeling with a voting strategy.

2. A Bert-CT based multi-label text classification model is proposed to address the noisy-label problem. First, a BERT model pre-trained on Chinese Wikipedia is introduced and applied to the multi-label classification task. Second, to address noisy labels in the dataset, BERT is combined with the co-teaching method: two BERT networks are trained simultaneously, and in each mini-batch they pass "clean" data to each other for co-training, which reduces the effect of noisy labels. Experimental results on the constructed Chinese text dataset of students’ opinions on "engineering sustainable development" show that the accuracy of this model can reach 84.7%, which is significantly higher than that of the traditional BERT model, and that the model can effectively handle multi-label classification of Chinese text with noisy annotations.

3. To address the low accuracy of Chinese long-text classification, an MLformer multi-label classification model is proposed. The model is based on the Longformer encoder; by introducing multiple attention mechanisms, it obtains feature representations that contain both the global semantic and the local contextual information of the text, thus avoiding the conflict between computational complexity and classification accuracy. Experimental results on a Chinese long-text dataset of students’ opinions on "engineering sustainable development" show that the MLformer model outperforms the BERT model in accuracy, precision, and F1 score, and that the classification results can provide a strong reference for the analysis and evaluation of students’ opinions.
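
As an illustration of the "clean" data exchange in the second contribution, the following is a minimal sketch of the sample-selection step commonly used in co-teaching, not the thesis's actual implementation. It assumes that each network scores a mini-batch with a per-sample binary cross-entropy loss, that small-loss samples are treated as likely clean, and that a fixed `forget_rate` sets how many samples are dropped. The function names, the loss choice, and `forget_rate` are all hypothetical; the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def bce_per_sample(probs: Sequence[Sequence[float]],
                   labels: Sequence[Sequence[int]],
                   eps: float = 1e-7) -> List[float]:
    """Mean binary cross-entropy over the labels of each sample (assumed loss)."""
    losses = []
    for p_row, y_row in zip(probs, labels):
        total = 0.0
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        losses.append(total / len(y_row))
    return losses


def small_loss_indices(losses: Sequence[float], forget_rate: float) -> List[int]:
    """Indices of the (1 - forget_rate) fraction of samples with the smallest loss."""
    num_keep = max(1, int(round((1.0 - forget_rate) * len(losses))))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:num_keep])


def co_teaching_select(losses_a: Sequence[float],
                       losses_b: Sequence[float],
                       forget_rate: float) -> Tuple[List[int], List[int]]:
    """Return (indices network A trains on, indices network B trains on)."""
    clean_by_a = small_loss_indices(losses_a, forget_rate)
    clean_by_b = small_loss_indices(losses_b, forget_rate)
    # Cross-update: each network learns from the samples its peer considers clean.
    return clean_by_b, clean_by_a


# Hypothetical per-sample losses of the two networks on one mini-batch of 4 texts.
train_a_on, train_b_on = co_teaching_select(
    losses_a=[0.2, 1.5, 0.3, 0.9],
    losses_b=[0.25, 0.4, 1.8, 0.1],
    forget_rate=0.5,
)
```

In a full training loop, each network would then be updated only on the indices selected by its peer, so that an error memorized by one network is less likely to be reinforced by the same network.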

Keywords/Search Tags:

Multi-label text classification, noise labels, Chinese long text

Related items

1	Research On The Essential Technology Of Multi-Label Chinese Text Classification
2	Research And Implementation Of Multi-labels Text Classification Via Deep Learning
3	Research On Feature Extraction Of Multi-label Text Classification
4	Multi-label Text Classification Based On Long Short-Term Memory
5	Research On Extreme Multi-label Text Classification Based On Label Knowledge
6	Identifying Labels From Multi-label Texts Using Deep Learning
7	Multi-label Text Classification Method Based On Hyperbolic Manifold Representation
8	Research On Multi-label Long Text Classification Algorithm Based On Transformer
9	Research On Text Multi-label Classification Algorithm Based On Label Correlation
10	Algorithm Research For Chinese Text Multi-label Classification