Model-based variable clustering with application to neurophysiology

Posted on: 2006-04-28
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Wang, Liuxia
GTID: 2458390008476269
Subject: Statistics
Abstract/Summary:
Cluster analysis places objects into groups on the basis of properties of the objects obtained from observed data. It is a primitive technique in that no assumptions are made about the number of groups or the group structure. Interest in clustering has increased recently with the emergence of several new areas of application, such as genetics, data mining, and finance.

Traditional cluster analysis is used to classify observations. Variable clustering, on the other hand, classifies variables according to their association across repeated observations. Three features make variable clustering a challenging statistical problem. First, the association used in variable clustering is typically measured by correlation, and even for scalar data it is non-trivial to estimate a correlation matrix from limited data. Second, for vector-valued variables, the association between two sets of variables may be described by a correlation matrix, but it is difficult to define a suitable scalar summary of that matrix. Third, for functional variables the problem becomes much more challenging because of the high dimensionality.

Our interest in variable clustering comes from neurophysiology, where each variable is the recorded activity of a particular neuron and the goal is to describe the formation or dissolution of neuronal groupings under differing circumstances. To this end, we develop a variable clustering approach inspired by a phenomenon in neurophysiology known as trial-to-trial variation, which accounts for a large portion of the correlations observed in neurophysiological data. Rather than modeling the correlation matrix directly, we treat trial-to-trial variation as a latent variable that explains the variation across replicates. The rationale is that highly correlated variables share similar variation from replicate to replicate: their observed values tend to go up or down together. We therefore group together variables with similar variation across replicates.

The thesis first develops a scalar variable clustering model for normally distributed data and discusses its implementation using Markov chain Monte Carlo (MCMC). A series of simulation studies shows that this model can greatly outperform more generic approaches to variable clustering. We also show how the methodology can be used to discover interesting activity patterns among simultaneously recorded motor cortical neurons. The thesis then presents several generalizations: vector-valued, functional, and Poisson variable clustering models. Simulations are conducted to examine their performance, and these models are also applied to neuronal data.
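To illustrate how a shared latent trial-to-trial effect produces correlation between variables, the following Python sketch simulates a deliberately simplified, hypothetical model; it is not the thesis's model or its MCMC procedure. It assumes that on each trial every group g draws one shared latent effect z ~ N(0, tau^2), and that each variable in group g records y = z + eps with independent noise eps ~ N(0, sigma^2). Under these assumptions Var(y_j) = tau^2 + sigma^2, Cov(y_j, y_k) = tau^2 when j and k belong to the same group and 0 otherwise, so the within-group correlation is tau^2 / (tau^2 + sigma^2). The names simulate, pearson, correlation_matrix and threshold_clusters are hypothetical. threshold_clusters is only a generic correlation-threshold baseline, not the model-based clustering the thesis develops.

```python
import math
import random


def simulate(groups, n_trials, tau, sigma, seed=0):
    """Simulate trials under a hypothetical shared-latent-effect model.

    groups[j] is the (assumed known) group label of variable j. On each trial,
    every group draws one latent effect z ~ N(0, tau^2), and variable j records
    z[groups[j]] plus independent noise eps ~ N(0, sigma^2).
    """
    rng = random.Random(seed)
    labels = sorted(set(groups))
    data = []
    for _ in range(n_trials):
        z = {g: rng.gauss(0.0, tau) for g in labels}  # shared trial-to-trial effect per group
        data.append([z[g] + rng.gauss(0.0, sigma) for g in groups])
    return data  # rows are trials, columns are variables


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """p x p matrix of sample correlations between the columns of data."""
    cols = list(zip(*data))
    return [[pearson(cj, ck) for ck in cols] for cj in cols]


def threshold_clusters(corr, cutoff):
    """Generic baseline: connected components of the graph linking variables
    whose sample correlation exceeds cutoff."""
    p = len(corr)
    seen = [False] * p
    clusters = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:  # depth-first search over above-cutoff links
            j = stack.pop()
            comp.append(j)
            for k in range(p):
                if not seen[k] and corr[j][k] > cutoff:
                    seen[k] = True
                    stack.append(k)
        clusters.append(sorted(comp))
    return clusters


# Usage: two groups of three variables, tau = sigma = 1
# (theoretical within-group correlation 0.5, between-group correlation 0).
groups = [0, 0, 0, 1, 1, 1]
data = simulate(groups, n_trials=2000, tau=1.0, sigma=1.0, seed=1)
corr = correlation_matrix(data)
clusters = threshold_clusters(corr, cutoff=0.25)
print(clusters)
```

With tau = sigma = 1 and 2000 trials, a cutoff of 0.25 separates the two simulated groups comfortably. With far fewer trials or weaker latent effects, the sample correlations become noisy and thresholding becomes unreliable, which reflects the estimation difficulty the abstract describes for limited data.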
Keywords/Search Tags:Variable clustering, Model, Data, Neurophysiology