
A Knowledge Based Approach For Tackling Mislabeled Multi-class Big Social Data

Posted on: 2015-07-16, Degree: Master, Type: Thesis
Country: China, Candidate: Y Liu, Full Text: PDF
GTID: 2298330452466874, Subject: Computer Science and Technology
Abstract/Summary:
The performance of classification tasks in machine learning depends heavily on data quality, yet in real-world data label noise inevitably exists because of data entry errors, transmission errors, and the subjectivity of taggers. Different methods have been proposed to deal with label imperfection, including robust algorithms that avoid overfitting, filtering mechanisms that remove noise, and correction mechanisms that revise noisy labels. However, each of these algorithms has some disadvantages, and they perform poorly when the noise level is high. In this paper, we propose an approach based on a knowledge graph to detect and correct label errors in training data. Furthermore, we apply our method to big medical Q&A data and correct the noisy labels in the dataset. Experiments show that our knowledge-graph-based approach can be effective in improving classification performance and data quality. The results also show that our approach can work at a relatively high noise level and can be applied to other data mining tasks that demand deep understanding.
Keywords/Search Tags: knowledge graph, label correction, noise correction, data quality, classification