
Research On Networked Data Classification Based On Active Learning

Posted on: 2016-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: H H Xu
Full Text: PDF
GTID: 2308330464453269
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of information technology has produced a variety of complex networks, such as social networks and biological networks, and the wide application of these networks has generated large amounts of networked data. Classification of networked data has recently become an important problem in machine learning and data mining and has attracted wide research attention. Unlike traditional data classification, it must consider not only how to use the features of the instances themselves but also how to exploit the links between them. Classification of networked data is therefore an important and pressing research problem.

This thesis studies active learning techniques and networked data classification in depth. It proposes a framework of active learning for networked data classification, together with several specific active learning approaches. The main work is as follows:

(1) On the problem of sampling strategy in active learning, the thesis analyzes existing measures of the uncertainty, representativeness, and diversity criteria and explores the effect of each specific measure on networked data classification. This provides a theoretical basis for the subsequent work.

(2) Different sampling strategies measure the value of an instance from different points of view, and they do not contribute equally to instance selection. To exploit the links between networked data effectively, the thesis proposes an adaptive active learning method for networked data classification that integrates several sampling strategy criteria. The method dynamically adjusts the weights of the criteria to estimate the information content of each instance, and then selects the most valuable instances for labeling.

(3) Batch mode active learning selects multiple instances for labeling at a time. To take the characteristics of networked data fully into account, an instance correlation matrix is built by measuring instance uncertainty, representativeness, and diversity. Based on this correlation matrix, the thesis proposes a batch mode active learning method for networked data classification based on an optimal instance subset. The method selects an optimal subset of instances to be annotated by an oracle, so that the classifier achieves better classification performance.

Experiments on several networked datasets are used to verify the proposed methods, and analysis of the results shows the effectiveness of the proposed approaches.
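To make items (2) and (3) more concrete, the following is a minimal Python sketch of scoring unlabeled nodes by a weighted sum of uncertainty, representativeness, and diversity, then picking a batch. It is an illustration only, not the thesis's method: the abstract does not define the individual measures, the weight-adjustment rule, or the correlation-matrix optimization. Here uncertainty is assumed to be normalized prediction entropy, representativeness is assumed to be normalized node degree, diversity is assumed to be the fraction of a node's neighbours that are still unlabeled, the weights are fixed inputs, and the batch is a plain top-k selection. The function names `combined_scores` and `select_batch` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def combined_scores(probs, adjacency, labeled, weights):
    """Score each unlabeled node by a weighted sum of three criteria.

    probs:     dict node -> list of class probabilities (current classifier)
    adjacency: dict node -> set of neighbouring nodes (the network links)
    labeled:   set of nodes that already have labels
    weights:   (w_unc, w_rep, w_div); fixed here, whereas the thesis adapts them
    """
    w_unc, w_rep, w_div = weights
    max_degree = max((len(n) for n in adjacency.values()), default=0) or 1
    scores = {}
    for node, p in probs.items():
        if node in labeled:
            continue
        nbrs = adjacency.get(node, set())
        # Assumed uncertainty measure: entropy scaled to [0, 1].
        unc = entropy(p) / math.log(len(p)) if len(p) > 1 else 0.0
        # Assumed representativeness measure: degree relative to the maximum.
        rep = len(nbrs) / max_degree
        # Assumed diversity measure: share of neighbours not yet labeled.
        div = 1.0 - len(nbrs & labeled) / len(nbrs) if nbrs else 1.0
        scores[node] = w_unc * unc + w_rep * rep + w_div * div
    return scores


def select_batch(scores, k):
    """Return the k highest-scoring nodes (ties broken by node name)."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy network: four nodes, node "c" is already labeled.
probs = {"a": [0.5, 0.5], "b": [0.9, 0.1], "c": [1.0, 0.0], "d": [0.5, 0.5]}
adjacency = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
labeled = {"c"}

scores = combined_scores(probs, adjacency, labeled, weights=(1.0, 1.0, 1.0))
batch = select_batch(scores, k=2)
```

In this toy network, node "d" is as uncertain as node "a" but its only neighbour is already labeled, so its diversity term is zero and it ranks below "b". A simple top-k ranking like this ignores redundancy among the selected instances, which is the issue the optimal-subset formulation in item (3) is meant to address.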
Keywords/Search Tags:Active Learning, Networked data, Batch Mode, Correlation link, Sampling Strategy