Multi-Labels Text Classifier With Primary And Secondary Labels

Posted on: 2016-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2308330476954980
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, computer science and the Internet, the amount of available information has grown dramatically. How to organize, manage and use this information has become a major concern for its users. Automatic text classification helps users handle information more efficiently, saves manual work, and makes information easier to use. It is currently a hot topic in information retrieval and data mining, has attracted considerable attention and developed rapidly, and is one of the key technologies of machine learning and natural language processing. However, mainstream classification methods can only assign multiple labels to a document without identifying their priority; that is, they cannot tell users which label is the most important and which labels are less important.
This thesis takes text as its research object and studies automatic text classification. The main work covers the following parts:
1. Proposes a multi-label text classification method with primary and secondary labels (MLTCPSL). The method is built around three characteristics of the labels: the primary and secondary labels are independent, carry different weights, and vary in number. It decomposes the problem into two parts: a multi-class single-label classification that obtains the primary label, and a multi-class multi-label classification that obtains the secondary labels. As a result, both the primary label and the secondary labels can be identified automatically.
2. The experiments revealed that although the overall amount of data is huge, some classes have extremely few samples, and this imbalance lowers classifier accuracy. To handle class imbalance in multi-class classification, this thesis proposes an improvement based on base classifiers and decision thresholds. With this improvement, the accuracy for the primary label approaches 90% and the accuracy for the secondary labels approaches 80%.
3. Designs and implements an automatic update method for MLTCPSL. By making the SVM learn online, the method gains self-adaptive, self-learning ability.
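The two-part decomposition in item 1 can be illustrated with a small Python sketch. It assumes sparse bag-of-words feature dictionaries and already-trained linear scorers; the class name `PrimarySecondaryClassifier`, the toy weights, the secondary threshold of 0.0 and the rule that a document's primary label is excluded from its secondary labels are all hypothetical choices for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Sparse bag-of-words vector: feature -> weight (assumed representation).
Vector = Dict[str, float]


def dot(w: Vector, x: Vector) -> float:
    """Linear score of document x under weight vector w."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


@dataclass
class PrimarySecondaryClassifier:
    # Hypothetical per-label weights for the primary task (multi-class, single-label).
    primary_weights: Dict[str, Vector]
    # Independent per-label weights for the secondary task (multi-class, multi-label).
    secondary_weights: Dict[str, Vector]
    # Assumed cut-off for accepting a secondary label.
    secondary_threshold: float = 0.0

    def predict_primary(self, x: Vector) -> str:
        # Exactly one primary label: the highest-scoring class.
        return max(self.primary_weights,
                   key=lambda label: dot(self.primary_weights[label], x))

    def predict_secondary(self, x: Vector, exclude: str) -> List[str]:
        # Binary relevance: every label scoring above the threshold is kept,
        # so the number of secondary labels varies per document.
        # Excluding the primary label here is an assumption of this sketch.
        return sorted(label for label, w in self.secondary_weights.items()
                      if label != exclude and dot(w, x) > self.secondary_threshold)

    def predict(self, x: Vector) -> Tuple[str, List[str]]:
        primary = self.predict_primary(x)
        return primary, self.predict_secondary(x, exclude=primary)


# Toy example with made-up weights.
clf = PrimarySecondaryClassifier(
    primary_weights={"sports": {"match": 2.0, "goal": 1.0},
                     "finance": {"stock": 2.0, "market": 1.0}},
    secondary_weights={"sports": {"match": 1.0},
                       "finance": {"stock": 1.0, "market": 0.5},
                       "health": {"injury": 1.0}},
)
doc = {"match": 1.0, "goal": 1.0, "injury": 1.0, "market": 1.0}
print(clf.predict(doc))  # prints ('sports', ['finance', 'health'])
```

Keeping the two scorers separate mirrors the abstract's point that primary and secondary labels are independent: each part can be trained and thresholded on its own.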
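The abstract does not spell out how the base-classifier and decision-threshold improvement in item 2 is constructed. One common form of decision-threshold adjustment for imbalanced classes is sketched below in Python, under stated assumptions: each class has a one-vs-rest base classifier that outputs a real-valued score, and instead of the default cut-off of 0 its threshold is moved to the observed validation score that maximizes F1. The function names `f1_at` and `tune_threshold` and the validation data are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[bool]) -> float:
    """F1 of the rule 'predict positive if score >= threshold'."""
    tp = fp = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def tune_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Move one base classifier's decision threshold to the observed score
    that maximizes validation F1 (ties keep the lowest such threshold)."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at(t, scores, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Made-up validation scores for a rare class: its positives score below 0,
# so the default threshold of 0 never predicts them.
val_scores = [-0.9, -0.8, -0.7, -0.2, -0.1, 0.3]
val_labels = [False, False, False, True, True, False]
print(f1_at(0.0, val_scores, val_labels))      # prints 0.0
print(tune_threshold(val_scores, val_labels))  # prints (-0.2, 0.8)
```

In a multi-class setting one such threshold would be tuned per class; the thesis may also combine several base classifiers in ways this sketch does not model.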
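Item 3 does not say which online SVM variant is used. As a stand-in, the Python sketch below uses a Pegasos-style stochastic subgradient update for a linear SVM (hinge loss with L2 regularization, step size 1/(λt)), so that each newly labeled document adjusts the weights immediately without retraining on the whole corpus. The class `OnlineLinearSVM`, the regularization value and the toy document stream are hypothetical, and the sketch omits a bias term and Pegasos's optional projection step.

```python
from typing import Dict

Vector = Dict[str, float]  # sparse bag-of-words features (assumed representation)


class OnlineLinearSVM:
    """Pegasos-style online linear SVM (hypothetical stand-in, no bias term)."""

    def __init__(self, lam: float = 0.1) -> None:
        self.lam = lam                  # L2 regularization strength (lambda)
        self.w: Dict[str, float] = {}   # sparse weight vector
        self.t = 0                      # number of updates seen so far

    def score(self, x: Vector) -> float:
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: Vector) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Vector, y: int) -> None:
        """One stochastic subgradient step on a newly labeled example (y = +1 or -1)."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        violated = y * self.score(x) < 1.0  # hinge loss is active at the current w
        # Regularization part of the subgradient: shrink all existing weights.
        for f in self.w:
            self.w[f] *= 1.0 - eta * self.lam
        # Loss part: move toward the example if its margin was below 1.
        if violated:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + eta * y * v


# Toy stream of labeled documents arriving one at a time.
model = OnlineLinearSVM(lam=0.1)
goal_doc, stock_doc = {"goal": 1.0}, {"stock": 1.0}
for _ in range(10):
    model.update(goal_doc, +1)
    model.update(stock_doc, -1)
print(model.predict(goal_doc), model.predict(stock_doc))  # prints 1 -1
```

Each update costs time proportional to the number of stored weights, which keeps the model cheap to refresh as new documents are labeled.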
Keywords/Search Tags: Text Categorization, Multi-Labels, Primary Labels, Secondary Labels, MLTCPSL