Research On Imbalanced Data Classification Method And Its Application In Sentiment Classification Of MOOC Course Comment

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhu
Full Text: PDF
GTID: 2428330620468780
Subject: Computer technology
Abstract/Summary:
With the development of information technology, more and more people choose to take courses on MOOC learning platforms. However, these platforms suffer from a "low completion rate" problem. In response, some studies have pointed out that providing humanized learning support services can encourage learners to keep studying, and that sentiment support services are an important part of learning support services. To help those who provide learning support services observe learners' emotional changes during the learning process, and to provide a research basis for applying sentiment guidance in learning support services, this paper uses text sentiment classification to classify MOOC course comments.

Imbalanced data classification, however, is one of the main challenges in text sentiment classification. Training on imbalanced data skews the classification results heavily toward sentiment classes with many samples and neglects classes with few samples, which greatly reduces classification performance. Studies have pointed out that most MOOC course comments express positive sentiment, so training on such data biases the results strongly toward positive sentiment and neglects negative sentiment, which greatly reduces the recall of negative sentiment. Sentiment guidance applications need to identify students with negative sentiment accurately, so imbalanced data classification is an urgent problem in sentiment classification of MOOC course comments. To address it, this paper takes the sentiment classification of MOOC course comments as its application background and does the following work at the data pre-processing level and the classification algorithm level:

(1) Data pre-processing level. This paper proposes an under-sampling method for imbalanced data based on the attention mechanism. First, the method divides the majority-class samples into n groups, where n is the number of minority-class samples. Second, it introduces the attention mechanism to obtain the total word vector representation of each group. Finally, it inputs the total word vector representation of each group, together with the word vector representations of the minority-class samples, into a CNN (convolutional neural network) for training. The experimental results show that, in classification performance, the method is superior to the under-sampling method for imbalanced data based on centroid space and to the under-sampling method for imbalanced data based on sample weight.

(2) Classification algorithm level. This paper proposes an imbalanced text sentiment classification method that combines the CNN and EWC algorithms. First, the method uses random under-sampling to obtain multiple sets of balanced data. Second, it feeds each set of balanced data to the CNN for training in sequence, introducing the EWC algorithm during training to overcome catastrophic forgetting in the CNN. Finally, it takes the model obtained after training on the last set of balanced data as the final classification model. The experimental results show that, in classification performance, the method is superior to the ensemble learning framework based on under-sampling and multiple classification algorithms, the imbalanced sentiment classification method based on word vector pre-training, and the imbalanced sentiment classification method based on a multi-channel LSTM (long short-term memory) neural network.
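The data-side steps of the two methods can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the attention scoring function, the grouping rule, or the EWC hyperparameters, so the function names (`split_into_groups`, `attention_pool`, `balanced_subsets`, `ewc_penalty`), the dot-product attention with a `query` vector, the round-robin grouping, and the representation of each comment as one fixed-length vector are all assumptions made for illustration. The EWC term uses the commonly cited quadratic form, (lambda / 2) * sum_i F_i * (theta_i - theta_i*)^2, and the CNN itself is omitted.

```python
import math
import random


def split_into_groups(majority, n_groups, seed=0):
    """Shuffle majority-class samples and deal them into n_groups groups.

    Assumption: round-robin grouping after a shuffle; the abstract only
    says the majority class is divided into n groups.
    """
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def attention_pool(vectors, query):
    """Collapse a group of vectors into one softmax-weighted sum.

    Assumption: scores are dot products with a hypothetical query vector.
    """
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, vectors)) / total
        for d in range(len(query))
    ]


def balanced_subsets(majority, minority, n_sets, seed=0):
    """Random under-sampling: each set pairs all minority samples with an
    equally sized random draw (without replacement) from the majority."""
    rng = random.Random(seed)
    return [rng.sample(majority, len(minority)) + minority for _ in range(n_sets)]


def ewc_penalty(params, old_params, fisher, lam):
    """Quadratic EWC term added to the loss when training on the next set."""
    return 0.5 * lam * sum(
        f * (p - o) ** 2 for p, o, f in zip(params, old_params, fisher)
    )


# Toy data: 10 "positive" vectors and 3 "negative" vectors (hypothetical).
majority = [[float(i), 1.0] for i in range(10)]
minority = [[0.0, -1.0], [1.0, -1.0], [2.0, -1.0]]

# Method (1): one attention-pooled vector per group, n = len(minority).
groups = split_into_groups(majority, len(minority))
pooled = [attention_pool(group, query=[0.0, 0.0]) for group in groups]

# Method (2): several balanced sets, to be fed to the classifier in sequence.
subsets = balanced_subsets(majority, minority, n_sets=4)
penalty = ewc_penalty([1.0, 2.0], [0.0, 0.0], [1.0, 0.5], lam=2.0)
```

In this sketch the pooled majority vectors end up equal in number to the minority samples, which is what makes the training set balanced in method (1). In method (2) the penalty would be added to the training loss on each new balanced set, so that parameters with large Fisher values stay close to the values learned on earlier sets.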
Keywords/Search Tags: MOOC course comment, imbalanced text sentiment classification, attention mechanism, CNN, EWC