Research On The Text Classification Based On The Semi-supervised Learning

Posted on: 2015-09-21 | Degree: Master | Type: Thesis
Country: China | Candidate: S Dong
GTID: 2348330518470409 | Subject: Computer application technology
Abstract/Summary:
The development of information technology and network technology has led to massive growth in the amount of available information, and helping people make full use of this information is a daunting task. Text information is widely used in daily life, in areas such as information retrieval, text databases, digital libraries, spam recognition, information filtering and microblog topic mining. This thesis is set in the context of Chinese text classification.

The thesis first analyzes the current state of development of Chinese text classification and identifies several problems in the area of semi-supervised learning. To solve these problems, an improved semi-supervised learning algorithm is proposed. The improved algorithm not only remedies defects of existing supervised learning algorithms but also exploits the good performance of semi-supervised learning when only a small number of training samples is available. It reduces human involvement while maintaining classification accuracy.

The improved algorithm has three main characteristics. First, it automatically builds the optimal LDA topic model with the help of the density-based clustering algorithm OPTICS. Second, it treats the LDA topic model as a clustering algorithm and uses it to label unlabeled samples automatically. Third, it proposes a weighted voting classification method within the Tri-Training framework, which improves the accuracy of the Tri-Training classification process. The thesis designs and implements the improved semi-supervised learning algorithm and then tests and analyzes its performance. The results show that the improved algorithm performs well in practical applications.
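The abstract does not say how the voting weights are computed. The following Python is a minimal sketch that assumes each classifier's weight is its accuracy on a small labeled validation set. It illustrates two pieces of Tri-Training: pseudo-labeling the unlabeled samples on which the other two classifiers agree, and a weighted (rather than plain majority) vote for the final prediction. The function names `accuracy`, `agreed_labels` and `weighted_vote` and the toy threshold classifiers are hypothetical. This is not the thesis's implementation, and it omits the error-rate check that the standard Tri-Training algorithm applies before accepting pseudo-labels.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# Any callable that maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def accuracy(clf: Classifier, xs: List[Sequence[float]], ys: List[Hashable]) -> float:
    """Fraction of labeled samples that clf predicts correctly (assumed vote weight)."""
    correct = sum(1 for x, y in zip(xs, ys) if clf(x) == y)
    return correct / len(xs)


def agreed_labels(clf_j: Classifier, clf_k: Classifier,
                  unlabeled: List[Sequence[float]]) -> List[Tuple[Sequence[float], Hashable]]:
    """Tri-Training step: pseudo-label samples on which the two *other* classifiers agree.

    The full algorithm also checks an error-rate condition first; omitted in this sketch.
    """
    return [(x, clf_j(x)) for x in unlabeled if clf_j(x) == clf_k(x)]


def weighted_vote(classifiers: Sequence[Classifier], weights: Sequence[float],
                  x: Sequence[float]) -> Hashable:
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical threshold classifiers standing in for three trained text classifiers.
    def h1(x):
        return "sports" if x[0] > 0.5 else "finance"

    def h2(x):
        return "sports" if x[0] > 0.7 else "finance"

    def h3(x):
        return "sports" if x[0] > 0.2 else "finance"

    val_x = [[0.1], [0.4], [0.6], [0.9]]
    val_y = ["finance", "finance", "sports", "sports"]
    weights = [accuracy(h, val_x, val_y) for h in (h1, h2, h3)]
    print(weights)                                        # [1.0, 0.75, 0.75]
    print(agreed_labels(h2, h3, [[0.1], [0.5], [0.8]]))   # [([0.1], 'finance'), ([0.8], 'sports')]
    print(weighted_vote((h1, h2, h3), weights, [0.6]))    # sports
```

Weighting lets a more reliable classifier override the others. For example, with weights `[0.5, 2.0, 0.5]`, the vote at `[0.6]` becomes `"finance"` even though two of the three classifiers predict `"sports"`. A plain majority vote cannot express this.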
Keywords/Search Tags: LDA, semi-supervised learning, Tri-Training, text classification