Massive search for detecting group differences

Posted on:2002-07-03

Degree:Ph.D

Type:Dissertation

University:University of California, Irvine

Candidate:Bay, Stephen Dongjun

GTID:1468390011490415

Subject:Computer Science

Abstract/Summary:

Comparing objects is a natural method for understanding their properties, especially when one object is well known and serves as a reference. With the availability of large databases of information, many analysts want to compare various groups in their data to understand the differences between them. For example, an admissions officer at UCI may be interested in comparing student applicants who accept UCI's admission offer to those who decline. A demographer may be interested in comparing the decennial Census databases to track how the Los Angeles-Long Beach population has been changing over the past few decades.

Because electronic data collection is easy, many data sets are very large, with many variables and examples, making automated computer analysis mandatory. However, a straightforward approach where the computer considers every combination of measurement variables as a potential difference is infeasible because the number of candidates grows exponentially and quickly outstrips the processing power of modern computers. The huge number of candidates raises three major research questions. First, how do we deal with the computational cost of searching for differences in this extremely large space of candidates? Second, how do we keep false positives (errors) from accumulating during the search and dominating the results? Finally, when there is a substantial number of differences between the groups, how can the results be presented so that human analysts can easily understand them?

In my dissertation, I address these questions and develop a computer tool that finds differences between groups from observational multivariate data. I demonstrate that this tool can analyze data in an exploratory manner, and I then show how it can serve as an important component in other novel knowledge discovery algorithms, such as multivariate discretization of continuous variables and characterizing classification models.
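To make the scale of the first two questions concrete, here is a minimal Python sketch. It is not the tool developed in the dissertation. It assumes, purely for illustration, that a candidate difference is a conjunction of attribute-value pairs in which each variable is either left out or fixed to one of its values, and that every candidate is tested at the same significance level. The function names are hypothetical, and the Bonferroni correction shown is a standard textbook remedy used here as a stand-in, not necessarily the approach the dissertation takes.

```python
def count_conjunctions(num_vars: int, values_per_var: int) -> int:
    """Count non-empty attribute-value conjunctions.

    Illustrative assumption: each variable is either absent from the
    conjunction or fixed to one of its values, giving (v + 1) ** d
    combinations, minus the single empty conjunction.
    """
    return (values_per_var + 1) ** num_vars - 1


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spurious findings if no real difference exists
    and every candidate is tested at level alpha."""
    return num_tests * alpha


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha (standard Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    # Assumed toy setting: every variable takes 3 distinct values.
    for num_vars in (5, 10, 20):
        n = count_conjunctions(num_vars, values_per_var=3)
        print(
            f"{num_vars:>2} variables: {n:,} candidates, "
            f"about {expected_false_positives(n):,.0f} expected false positives "
            f"at alpha=0.05, Bonferroni threshold {bonferroni_threshold(n):.2e}"
        )
```

Even in this toy setting, doubling the number of variables squares the size of the candidate space (roughly), and an uncorrected test would report thousands of spurious differences, which is why both the search cost and the error control have to be addressed together.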

Keywords/Search Tags:

Comparing, Variables

Related items

1	Optimization of pre-processing variables for hyperspectral analysis of focal plane array Fourier transform infrared images
2	The Implementation And Design Of Web Crawler For Price Comparing Shopping Platform
3	Research And Application Of Techniques On Digital Comparing Inspection Of Point Cloud And CAD Model
4	Orthogonal Transformation Operation Theorem Of The Spatial Universal Rotating Magnetic Field Based On Independent Variables
5	Learning And Application Of Bayesian Networks With Hidden Variables
6	Research And Application Of Simulated Annealing Algorithm For Structural Optimization
7	Motivational variables that influence the attendance of science trainees at research seminars
8	Research And Application Of Tolerance Granular Computing Model In Article Comparing
9	Control of multi-finger pressing: Studied with mechanical and hypothetical control variables
10	Research On Algorithm Of Comparing Bio-sequences Similarity