Research On Multi-label Text Classification Based On Semi-Supervised Learning

Posted on: 2019-04-01  Degree: Master  Type: Thesis
Country: China  Candidate: S Y Xu
GTID: 2428330545974112  Subject: Software engineering
Abstract/Summary:
Modern society is experiencing an information explosion: the amount of data humans produce every day is measured in petabytes (PB). Automatic text processing technology, which helps people handle information efficiently, is therefore of great importance, and text categorization is its foundation. Multi-label text categorization poses three key challenges. First, the category labels of a text are often correlated. Second, the highly abstract and logical nature of natural language makes the semantic, logical, and contextual relationships in a text sample difficult to define. Third, in real classification tasks, the number of labeled text samples is very small relative to the number of unlabeled ones. To address these three challenges, this thesis proposes LRML-KNN, a multi-label k-nearest neighbor algorithm based on a label co-occurrence matrix, and MLPLSA-KNN, a PLSA-based multi-label k-nearest neighbor algorithm. Each algorithm is evaluated experimentally, and the experiments show their validity and advantages in multi-label text classification tasks. The two algorithms are then integrated into a co-training framework, yielding CT-MLTC, a co-training-based multi-label text classification algorithm. Experiments show that this algorithm can effectively use unlabeled data to strengthen the classifier on multi-label text classification problems.
Keywords/Search Tags:Multi-label, Text Classification, Semi-Supervised Learning, Co-training, PLSA, K Nearest Neighbors
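
The abstract does not define how the label co-occurrence matrix is built, so the following is only a minimal illustrative sketch of one common construction, not the thesis's LRML-KNN definition. It assumes each training sample carries a list of integer label indices and estimates the conditional frequency of label j given label i; the function name `label_cooccurrence` and the toy data are hypothetical.

```python
from typing import List


def label_cooccurrence(label_sets: List[List[int]], num_labels: int) -> List[List[float]]:
    """Return a matrix where entry [i][j] estimates P(label j | label i).

    Assumption: this conditional-frequency form is one common choice; the
    thesis may normalize its co-occurrence matrix differently.
    """
    # counts[i][j] = number of samples that carry both label i and label j
    counts = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        unique = set(labels)  # ignore repeated labels within one sample
        for i in unique:
            for j in unique:
                counts[i][j] += 1

    cond = [[0.0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        total = counts[i][i]  # number of samples carrying label i
        if total == 0:
            continue  # label never observed; leave its row at zero
        for j in range(num_labels):
            cond[i][j] = counts[i][j] / total
    return cond


# Toy data (hypothetical): four documents over three labels.
samples = [[0, 1], [0, 1, 2], [0], [2]]
matrix = label_cooccurrence(samples, 3)
```

In this toy data every document with label 1 also has label 0, so `matrix[1][0]` is 1.0, while `matrix[0][1]` is only 2/3. A label-aware k-nearest neighbor classifier could use such asymmetric dependencies to adjust label scores aggregated from a sample's neighbors.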