A theory of multitask learning for learning from disparate data sources

Posted on:2004-10-18

Degree:Ph.D.

Type:Dissertation

University:Cornell University

Candidate:Schuller, Rebecca Ann

GTID:1468390011962651

Subject:Computer Science

Abstract/Summary:

Many endeavors require the integration of data from multiple data sources. One major obstacle to such undertakings is that different sources may vary considerably in the way they choose to represent their data, even if their data collections are otherwise perfectly compatible. In practice, this problem is usually solved by manually constructing translations between these data representations, although there have been some recent attempts to supplement this with automated algorithms based on machine learning methods.

This work addresses the problem of making classification predictions based on data from multiple sources, without constructing explicit translations between them. We view this problem as a special case of the multitask learning problem: both intuition and much empirical work indicate that learning can be improved by attacking multiple related tasks simultaneously. However, thus far, no theoretical work has been able to support this claim, and no concrete definition has been proposed for what it means for two learning tasks to be “related.”

In this work, we introduce a general notion of relatedness between tasks, provide the standard sort of information complexity bound for such tasks, and give general conditions under which this bound is an improvement over standard single-task learning results.

Finally, we apply these results to the problem of learning from disparate data sources. We give a decision tree learning algorithm for this problem for a particular type of data source disparity and demonstrate its empirical success on real data sets.

Keywords/Search Tags:

Data, Sources, Problem

Related items

1	The Analysis Method For Localizing The Sources Of Electroencephalogram Based On The Data Of Spatio-Temporal
2	Mining Disparate Sources for Question Answering
3	Study On Data Sources Discovery And Selection On Deep Web
4	The Research Of Information Evaluation Based On Sources Dependence
5	News Sources And News Framing
6	Hierarchical models for combining nonexchangeable sources of survival and functional data
7	The Design And Implement Of Tax-checking Based On Information Fusion Of Multi-sources Data
8	XML-based Data Sources Research And Application
9	Technique de parcellisation et de localisation des sources cerebrales a partir des signaux MEG
10	Integrating Deep Web data sources