Knowledge Based Supervised Learning

Posted on:2009-07-01

Degree:Master

Type:Thesis

Country:China

Candidate:C L Zhang

Full Text:PDF

GTID:2178360275970259

Subject:Computer application technology

Abstract/Summary:

This thesis studies the problem of learning from knowledge-based data. Traditional machine learning algorithms rely on high-quality labeled data to build models and predict unlabeled data. However, labeling data is well known to be time-consuming and costly, and it has become a bottleneck for the development of machine learning. Web page and text classification are important applications of machine learning, and classifying web pages efficiently requires a large number of labeled documents. In this thesis, we observe a trend on the current web: with the development of web services and applications, more and more public data is available on the Internet. This data contains not only flat text but also extra information such as labels and structure. Since anyone can easily obtain such web information, we are interested in using it to supervise the machine learning process, especially text classification, and thereby reduce the need for labeled data and documents.

To this end, this thesis addresses the problem from two aspects: first, we design a knowledge data acquisition algorithm; second, we design a knowledge-supervised learning algorithm.

For knowledge data acquisition, we study how to label web page data automatically so that it becomes knowledge. Our idea is to take an existing large taxonomy and classify web pages into it. The difficulty is that there are too many candidate classes, which makes traditional machine learning and text classification algorithms perform poorly. Moreover, the large scale demands high efficiency, which rules out complicated algorithms or the incorporation of much extra information. This thesis observes that the Naive Bayes classifier is fast, efficient, and easy to implement, all of which are valuable properties for this problem. Although the Naive Bayes classifier performs very poorly in the presence of a large number of classes, this thesis analyzes its characteristics in depth and identifies two severe problems that greatly hurt its performance. By fixing these two problems, the thesis significantly improves its performance and makes it able to provide reliable knowledge.

For knowledge-supervised learning, this thesis studies how to use categorized knowledge data in place of training data while still reaching good performance. The difficulty is that knowledge covers a large number of semantic topics, whereas test data is usually shallow and covers only a few topics. To overcome this obstacle, this thesis designs a two-stage risk minimization algorithm. In the first stage, the algorithm generates related knowledge data for the test data. In the second stage, the knowledge and the test data exchange information with each other, which mines the useful information underlying the knowledge in depth. The entire algorithm is designed under the risk minimization framework. In the experiments, the algorithm performs very well: it not only significantly improves on the baseline but also achieves performance comparable to learning with labeled documents.
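
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a standard multinomial Naive Bayes text classifier with Laplace smoothing. It is not the improved classifier developed in the thesis: the abstract does not say what the two problems or their fixes are, so none are reproduced here. The function names `train_naive_bayes` and `classify`, the smoothing parameter `alpha`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs  : list of (tokens, label) pairs (hypothetical input format)
    alpha : Laplace smoothing constant (assumed, not from the thesis)
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_docs.items()}

    log_likelihood = {}
    log_unseen = {}  # smoothed log-probability of an in-vocabulary word unseen in class c
    for c in class_docs:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((n + alpha) / denom) for w, n in word_counts[c].items()
        }
        log_unseen[c] = math.log(alpha / denom)

    return {
        "vocab": vocab,
        "log_prior": log_prior,
        "log_likelihood": log_likelihood,
        "log_unseen": log_unseen,
    }


def classify(model, tokens):
    """Return the class with the highest posterior log-score."""
    best_label, best_score = None, float("-inf")
    for c, prior in model["log_prior"].items():
        score = prior
        for w in tokens:
            if w in model["vocab"]:  # words outside the vocabulary are ignored
                score += model["log_likelihood"][c].get(w, model["log_unseen"][c])
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy usage with made-up documents.
training_docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "win"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["market", "trade", "bank"], "finance"),
]
model = train_naive_bayes(training_docs)
print(classify(model, ["team", "win"]))
```

Training is a single counting pass over the documents and classification is one sum of log-probabilities per class, which is the speed and simplicity the abstract points to. Classification cost grows linearly with the number of classes, so a very large taxonomy makes both efficiency and accuracy harder to maintain.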

Keywords/Search Tags:

Knowledge, machine learning, text classification, Naive Bayes Classification, risk minimization framework

Related items

1	Research On Text Classification Algorithms Based On Machine Learning
2	Research On Text Classification Algorithms Based On Machine Learning
3	Research On Text Classification Algorithm Based On Naive Bayes Method
4	Research On Text Classification Method Based On Machine Learning
5	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
6	Completing News Classification By Related Machine Learning Algorithms
7	Research On Key Problems In Text Classification Research Based On Deep Learning
8	Text Categorization Based On Naive Bayes Method
9	Incremental Learning Of Naive Bayes Chinese Classification System
10	Text Classification Algorithm Research Based On Naive Bayes