Several Typical Machine Learning Methods And Their Applications

Posted on: 2011-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: G T Zhou
Full Text: PDF
GTID: 2178360302499947
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, more and more data are being collected in real-world application fields. How to make use of these data to improve everyday life is a problem that everyone faces. As an active research area in artificial intelligence, machine learning offers a promising way to analyze data and learn knowledge from them. Existing research has also shown that many application problems can be solved effectively and efficiently using machine learning approaches. This thesis studies several typical machine learning techniques and their applications. Specifically, we focus on the following four topics:

1. Class-imbalance learning and cost-sensitive learning and their application to the cross-selling problem. Cross-selling is regarded as one of the most promising strategies for making profits. The central issue in real-world cross-selling applications is the identification of potential cross-selling customers. However, customer prediction performance suffers because class imbalance and cost sensitivity arise simultaneously in the data sets collected from this domain. To address this problem, we propose a method that combines the ideas of class-imbalance learning and cost-sensitive learning. The method is a three-stage process. In the first stage, it generates a number of balanced training data sets by combining under-sampling and over-sampling techniques. In the second stage, a base learner is trained on each of these data sets. In the third stage, the final decision-making model is obtained using a voting scheme based on an optimal threshold. The effectiveness of our method is validated on the cross-selling data set provided by the PAKDD 2007 competition, where an AUC value of 0.6037 is achieved.

2. Semi-supervised learning and its application to fingerprint image segmentation. Fingerprint segmentation is one of the key preprocessing steps in an automated fingerprint identification system (AFIS). Effective segmentation of the fingerprint from the background can speed up subsequent processing and improve recognition accuracy. Traditional segmentation algorithms usually require a number of expensive labeled fingerprints to train their models, but leave the abundant unlabeled fingerprints unused. To exploit labeled and unlabeled data together, we propose a semi-supervised fingerprint segmentation method based on a co-training style algorithm. Using the pixel-level features Coherence, Mean and Variance, our method employs Label Box and SVM as two base learners and co-trains the final segmentation model. Experiments performed on the FVC 2002 databases show that our method can effectively and robustly exploit unlabeled data for segmentation.

3. Distance metric learning and its application to content-based image retrieval (CBIR). The distance or similarity measure is one of the keys to a high-performance CBIR system. Metric learning has proved capable of learning an appropriate distance metric. However, most existing approaches are not applicable to CBIR because they work in an off-line manner, while the few on-line metric learning methods cannot make use of the abundant unlabeled images in the database. This work proposes an on-line metric learning method for CBIR that incorporates the idea of Qsim. A Mahalanobis distance metric is learned from the user's query together with the subsequent feedback. By incorporating the idea of Qsim, our method can effectively exploit unlabeled images. We formulate the task as a series of convex optimization problems that can be solved efficiently in an on-line manner with closed-form solutions. Experiments performed on the COREL image database show the efficacy of our method.

4. Relevance feature mapping and its application to CBIR. The performance of a CBIR system relies mainly on the accuracy of its ranking results. This work presents a ranking method using relevance feature mapping, where each relevance feature measures the relevance of an image to some profile underlying the image database. The method is a two-stage process. In the off-line modeling stage, it constructs a collection of models that map all images in the database to the relevance feature space. In the on-line retrieval stage, it assigns a weight to every relevance feature based on the query image, and then ranks the images in the database according to their weighted average feature values. The method also incorporates relevance feedback, which modifies the ranking by reweighting the features according to the feedback. We show that the power of the proposed method derives from the relevance features. Experiments performed on the COREL image database validate the efficacy and efficiency of our method.
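
The on-line retrieval stage of the relevance feature mapping method can be illustrated with a minimal sketch. It shows only the ranking step stated in the abstract: images are ordered by the weighted average of their relevance feature values. The abstract does not say how the weights are derived from the query image, so the rule used here (the query's own relevance feature values serve as weights) is an assumption, as are the function names `weighted_average_scores` and `rank_images` and the toy data. This is not the thesis's implementation.

```python
from typing import Dict, List, Sequence


def weighted_average_scores(
    features: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Score each image by the weighted average of its relevance features."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        image_id: sum(w * f for w, f in zip(weights, feats)) / total
        for image_id, feats in features.items()
    }


def rank_images(
    features: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> List[str]:
    """Return image ids ordered from most to least relevant."""
    scores = weighted_average_scores(features, weights)
    # Highest weighted-average relevance first; ties are broken by image id.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Toy relevance-feature table (hypothetical): three images, three features.
database = {
    "img_a": [0.9, 0.1, 0.4],
    "img_b": [0.2, 0.8, 0.5],
    "img_c": [0.6, 0.6, 0.1],
}

# Assumption: the query's own relevance feature values are used as weights.
query_features = [1.0, 0.0, 0.5]

ranking = rank_images(database, query_features)
print(ranking)
```

Under the same assumption, relevance feedback would amount to calling `rank_images` again with an updated weight vector. The update rule itself is not given in the abstract.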
Keywords/Search Tags: machine learning, cross-selling, fingerprint image segmentation, content-based image retrieval, relevance feature mapping