The Development And Application Of Covering Algorithm Based On Constructive Learning

Posted on: 2011-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Duan
GTID: 1118360305972950
Subject: Computer application technology
Abstract/Summary:
Machine learning is the study of acquiring knowledge and rules from known data in order to build predictive models for unseen problems. It simulates human learning behavior and can improve itself through continuous learning. After years of research, many outstanding learning methods, such as support vector machines, decision trees and neural networks, have been proposed and applied in numerous areas. Chinese scholars have done substantial research on covering-based learning methods, among which the covering algorithm based on constructive learning, proposed by Zhang Ling and Zhang Bo, is a good representative.

The covering algorithm constructs neural networks from the samples' own characteristics and overcomes some general drawbacks of traditional neural networks, such as slow learning and a network structure that is hard to determine. It is straightforward, handles multi-category classification and large-scale data effectively, and performs well in many real applications. Much research has been done to improve the method and apply it to problems in different domains. However, current work focuses on single-instance, single-label problems and cannot solve some newer learning problems. This dissertation extends the covering algorithm in the following ways:

(1) It studies the covering algorithm comprehensively and applies it to real classification problems. This dissertation surveys the basic learning model of the covering algorithm and recent progress in its theory and application. It applies the covering algorithm to text categorization and spam filtering, and proposes different strategies for the specifics of each task. In text categorization, dimension regulation is introduced so that the different text categories are evenly represented in the feature vector, which improves precision.
In spam filtering, extra information from each email is combined with the body text to create a compound feature that improves accuracy. The dissertation also discusses how to minimize the risk of filtering out legitimate emails.

(2) It analyzes the kernel covering algorithm and extends it to a fuzzy kernel covering algorithm. The support vector machine maps samples to a high-dimensional space to construct an optimal separating hyperplane and achieves excellent performance. The kernel covering algorithm uses a kernel function and improves accuracy effectively, but some drawbacks remain. This dissertation analyzes how the proximity principle used to classify rejection points affects the classifier. The fuzzy kernel covering algorithm (FKCA) is proposed to improve classifier performance. Its main improvements are a changed radius selection and the introduction of a membership function. The physical interpretation of the membership function is also discussed. Several reduction methods are introduced that improve the classifier by keeping the number of covers down. Experiments show that these methods are effective.

(3) It studies a multi-label covering algorithm. In classic machine learning, each sample belongs to a single category, i.e., has one label. In the real world, however, a sample can belong to multiple categories, for instance in text categorization and scene classification. This dissertation studies the decomposition of the sample set and improvements to the algorithm, and explores applying the covering algorithm to multi-label learning. Experiments show that the multi-label covering algorithm performs on par with other multi-label learning algorithms and has lower time and space costs. Because labeling multi-label training data takes more effort, much of the data in the training set is generally unlabeled.
To overcome this weakness, semi-supervised learning is adopted to improve accuracy, and it works well.

(4) It discusses how to extend the covering algorithm to multi-instance learning. Multi-instance learning differs from traditional supervised learning, unsupervised learning and reinforcement learning. It originated in the prediction of molecular activity and is regarded as a fourth learning framework. The learning object is a bag consisting of multiple instances. The labels of bags are known, while the labels of instances are unknown, and a bag's label is determined by its instances. Multi-instance learning is even harder than supervised learning with noise. This dissertation explores applying the covering algorithm to multi-instance learning and proposes several algorithms with comparable performance. It also discusses how to combine the covering algorithm with other methods to solve multi-instance multi-label learning and presents an initial solution.

The innovations of this dissertation are as follows:

(1) The covering algorithm is applied to text categorization and spam filtering, with different strategies used to improve overall performance.

(2) A new method of determining a cover's radius is presented; a new membership function for rejection samples is introduced, along with its physical interpretation; the kernel covering algorithm is extended to a fuzzy kernel covering algorithm; and several reduction methods are presented.

(3) The covering algorithm is extended to multi-label learning and a new algorithm is proposed. The performance of MICA is on par with other well-known work, and a multi-label covering algorithm based on semi-supervised learning is proposed.

(4) The covering algorithm is extended to multi-instance learning and several methods are proposed. An initial solution to the multi-instance multi-label learning problem is also provided.
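
To make the idea of covers and rejection points more concrete, the following is a minimal illustrative sketch and not the dissertation's implementation. It assumes plain Euclidean distance (no kernel and no projection of the samples), a greedy choice of centers, and a radius set halfway between the farthest enclosed same-class sample and the nearest sample of another class. The function names `build_covers` and `predict` are hypothetical. A point that falls outside every cover (a rejection point) is assigned to the cover whose boundary is nearest, which is one simple reading of the proximity principle mentioned in the abstract.

```python
import math


def build_covers(samples, labels):
    """Greedily build spherical covers, each holding samples of one class.

    Returns a list of (center, radius, label) tuples.
    Assumption: Euclidean distance and a midpoint radius rule.
    """
    n = len(samples)
    covers = []
    uncovered = set(range(n))
    while uncovered:
        i = min(uncovered)  # deterministic choice of the next center
        center, label = samples[i], labels[i]
        # Distance to the nearest sample of a different class.
        d_other = min(
            (math.dist(center, samples[j]) for j in range(n) if labels[j] != label),
            default=math.inf,
        )
        # Farthest same-class sample that is still closer than d_other.
        d_same = max(
            (
                math.dist(center, samples[j])
                for j in range(n)
                if labels[j] == label and math.dist(center, samples[j]) < d_other
            ),
            default=0.0,
        )
        # Midpoint radius; fall back to d_same when there is no other class.
        radius = d_same if math.isinf(d_other) else (d_same + d_other) / 2
        covers.append((center, radius, label))
        # Remove the same-class samples enclosed by the new cover.
        uncovered -= {
            j
            for j in uncovered
            if labels[j] == label and math.dist(center, samples[j]) <= radius
        }
    return covers


def predict(covers, x):
    """Return the label of the cover whose boundary is nearest to x.

    The score is negative when x lies inside a cover, so enclosed points
    are handled by the same rule as rejection points.
    """
    center, radius, label = min(covers, key=lambda c: math.dist(x, c[0]) - c[1])
    return label


samples = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
covers = build_covers(samples, labels)
print(len(covers), predict(covers, (0.5, 0.5)), predict(covers, (10, 10)))
```

The number of covers is decided by the data rather than fixed in advance, which reflects the constructive character of the approach described above. The kernel, fuzzy membership, multi-label and multi-instance variants discussed in the abstract are not represented in this sketch.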
Keywords/Search Tags: machine learning, covering algorithm, fuzzy classification, multi-label learning, multi-instance learning