Research On Aspect Category Detection Based On Data Resampling And Text Generation From Imbalanced Perspective

Posted on:2024-09-09

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Yan

Full Text:PDF

GTID:2568307052991619

Subject:Library and Information Science

Abstract/Summary:

Aspect category detection is a subtask of aspect-based sentiment analysis (ABSA) in natural language processing (NLP), and handling aspect-level category imbalance in text data remains a challenging research topic for this task. Current detection models tend to focus on majority-class features, which makes minority classes difficult to identify and causes the richer sentiment or category information contained in the text to be overlooked. Yet for review texts, identifying minority categories is often the more important part of the category detection task. Traditional imbalanced classification algorithms may fail when the category distribution is heavily skewed, which makes them difficult to apply to real-world imbalanced text category detection. This thesis therefore approaches the aspect category skew in text datasets from the perspective of class imbalance and proposes a dual strategy: the Select-SMOTE (select-synthetic minority oversampling technique) algorithm and a hybrid enhancement algorithm. The specific work is as follows.

(1) Based on data resampling, this thesis proposes an imbalanced aspect category detection model that combines the Select-SMOTE algorithm with LightGBM (light gradient boosting machine). Select-SMOTE addresses the problem that sample generation in the traditional SMOTE algorithm is too random and may produce noise. The algorithm partitions the samples of each aspect category and interpolates between minority-class samples, allowing interpolation only when the two sample points are not both boundary samples. In addition, it rejects inter-class boundary samples so that the boundary between the majority and minority classes is clearer. Finally, the balanced dataset is fed into the LightGBM model, which is tuned through weight adjustment and hyperparameter optimization, to perform category detection on online reviews. The experimental results show that the proposed algorithm outperforms the baseline imbalanced aspect category detection algorithms and generalizes well across multiple datasets.

(2) Based on text generation, this thesis proposes a hybrid enhancement imbalanced aspect category detection model built on BERT (bidirectional encoder representations from transformers). The model measures the degree of imbalance in the dataset by constructing an equilibrium formula and, according to the resulting generation multiple, produces a category-balanced dataset by combining XLNet text generation with noise perturbation. The generated dataset is then fed into the pre-trained BERT model, whose parameters are tuned to improve aspect category detection performance. The experimental results show that the proposed model has a significant accuracy advantage over other deep learning aspect category detection models, and that the hybrid text generation enhancement strategy can effectively address imbalanced classification.

(3) Based on the Select-SMOTE and hybrid enhancement imbalanced learning techniques proposed in this thesis, an aspect category detection system for imbalanced text datasets is designed and implemented. The system has several key features that improve both detection performance and practicality. First, it gives users an intuitive view of the aspect category imbalance in the input text dataset. Second, it automatically balances the aspect categories of imbalanced text datasets and displays a chart of the balanced category distribution for users to view. Finally, it integrates the data balancing operation with the aspect category detection function, which makes the system easy to operate and user-friendly.
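The selective interpolation rule described in (1) can be illustrated with a minimal sketch. The abstract does not define how boundary samples are identified or which boundary samples are rejected, so the code below makes two labeled assumptions: a sample counts as a boundary sample when at least half of its k nearest neighbours carry a different label (borrowed from Borderline-SMOTE), and the rejection step drops majority-class boundary samples. The function names, the binary-class setting, and the numeric feature matrix `X` (for example, vectorized review texts) are hypothetical and are not taken from the thesis.

```python
import numpy as np

def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest other rows."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def boundary_mask(X, y, k=5):
    """ASSUMED rule: a sample is a boundary sample when at least half of
    its k nearest neighbours carry a different label."""
    neighbours = knn_indices(X, k)
    n_other = (y[neighbours] != y[:, None]).sum(axis=1)
    return n_other >= k / 2

def select_smote_sketch(X, y, minority=1, k=5, seed=0):
    """Binary-class sketch of selective SMOTE-style oversampling."""
    rng = np.random.default_rng(seed)
    is_boundary = boundary_mask(X, y, k)

    # ASSUMED rejection step: drop majority-class boundary samples so the
    # region between the two classes is less cluttered.
    keep = ~(is_boundary & (y != minority))
    X_keep, y_keep = X[keep], y[keep]

    min_idx = np.flatnonzero(y == minority)
    X_min, b_min = X[min_idx], is_boundary[min_idx]
    # Number of synthetic samples needed to match the kept majority count.
    n_new = int((y_keep != minority).sum()) - len(min_idx)

    k_min = min(k, len(min_idx) - 1)
    if n_new <= 0 or k_min < 1:
        return X_keep, y_keep

    # Candidate pairs: a minority sample and one of its minority neighbours,
    # allowed only when the two points are NOT both boundary samples.
    nn_min = knn_indices(X_min, k_min)
    pairs = [(i, j) for i in range(len(min_idx)) for j in nn_min[i]
             if not (b_min[i] and b_min[j])]
    if not pairs:
        return X_keep, y_keep

    synth = []
    for p in rng.integers(len(pairs), size=n_new):
        i, j = pairs[p]
        lam = rng.random()  # interpolation coefficient in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))

    X_out = np.vstack([X_keep, np.asarray(synth)])
    y_out = np.concatenate([y_keep, np.full(n_new, minority)])
    return X_out, y_out
```

Calling `select_smote_sketch(X, y, minority=1, k=5)` returns a resampled feature matrix and label vector that could then be passed to a classifier. The pairwise distance matrix is quadratic in the number of samples, so this form is suited only to small illustrative datasets.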

Keywords/Search Tags:

Imbalanced learning, Aspect category detection, Data resampling, Data generation, System applications

Related items

1	Research On Data Resampling Technology For Imbalanced Data Classification
2	A Study Of Ensemble Learning Method For Imbalanced Data Classification And Its Applications
3	Resampling Methods For Imbalanced Data
4	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
5	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
6	Research On The Imbalanced Data Learning
7	Research Of Boosting Classification Algorithm For Imbalanced Data
8	Imbalanced Data Classification Analysis Based On Generative Adversarial Networks And Reinforcement Learning
9	Classification Of Imbalanced Sample Based On Stream Data
10	Research On Imbalanced Data Classification Algorithm Based On Extreme Learning Machine