
Matrix Normal Factor Model And Matrix Data Logistic Regression Model

Posted on: 2020-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Zhang
GTID: 1360330620952329
Subject: Machine learning and bioinformatics
Abstract/Summary:
With the advance of science and the rapid progress of technology, matrix-variate data have become ubiquitous. This thesis focuses mainly on two types of matrix data. The first type is continuous, i.e., each element is a real number. Continuous matrix data have powerful representational ability. First, weighted network data can be regarded as continuous matrix data; for example, an element of the matrix may represent the volume of data exchanged between two routers or the strength of the hyperlink between two webpages. Second, the design matrix in traditional multivariate statistical analysis can be regarded as continuous matrix data with independent rows. Finally, general images, videos, bioinformatics data, and so on can also be regarded as matrix data.

The second type is binary discrete matrix data, i.e., each element is either 0 or 1, which is commonly used to represent network data. A network is a set of individuals connected to each other through various relationships, where the individuals are called nodes and the link between two nodes is called an edge. For example, in computer science, routers can be regarded as the nodes of the Internet and the data exchange between two nodes as an edge; likewise, webpages on the World Wide Web (WWW) are nodes, and a hyperlink between two webpages is an edge. In practice, power networks, transportation networks, social networks, bioinformatics networks, and financial trade networks are all classical examples of network data. The matrix representation of network data is called the adjacency matrix, whose element is 1 if there is an edge between the corresponding pair of nodes and 0 otherwise. In short, the powerful representational ability of matrix data allows it to be applied widely across fields, and it deserves further research.

In the second chapter of this thesis, statistical inference for high-dimensional continuous matrix data with only one observation is considered. We propose the matrix-variate normal factor model (MVNFA) by incorporating factor effects into the separable covariance structure of the matrix normal model. For the MVNFA model, we first prove the identification conditions, then derive the estimation equations together with a simplified version of them, and an iterative algorithm for parameter estimation. Next, the consistency and asymptotic normality of the parameter estimators are derived. Finally, the theoretical results are illustrated by simulation studies, and the practical value of MVNFA is demonstrated by an analysis of real data.

In the third chapter, the classification of nodes in binary discrete matrix data is considered. Three sources of information are available: the label of each individual, the corresponding predictors, and the network structure among the individuals. The goal of this research is to incorporate the network structure into the traditional classification problem, in which the labels are the responses. To this end, we propose the network-based logistic regression (NLR) model, which takes the network structure into consideration. The NLR model assumes that whether two nodes are connected is influenced both by their class labels and by the similarity of their predictors. Furthermore, the attributes of each node are used to predict its label through the classical logistic regression (LR) model. Four scenarios are used to investigate link formation in the network under the NLR model, and the impact of the network structure on classification is determined by deriving the asymptotic properties of the prediction rule under different levels of network sparsity. Finally, simulation studies are conducted to demonstrate the finite-sample performance of the proposed method, and a real Sina Weibo data set is analyzed for illustration.
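As background for the second chapter: a matrix normal variable X ~ MN(M, U, V) of size p × q, with row covariance U and column covariance V, satisfies vec(X) ~ N(vec(M), V ⊗ U), and this Kronecker product is what "separable covariance structure" refers to. The abstract does not give the exact MVNFA parameterization, so the sketch below simply assumes factor-type row and column covariances, U = Λr Λr' + Ψr and V = Λc Λc' + Ψc, as one illustrative way of putting factor effects into that structure. It is a minimal NumPy sketch for simulating a single observation, not the estimation procedure of the thesis; the helper names `factor_covariance` and `sample_matrix_normal` and all dimensions are hypothetical.

```python
import numpy as np

def factor_covariance(loadings, uniquenesses):
    """Assumed factor-type covariance: Lambda @ Lambda.T + diag(psi)."""
    return loadings @ loadings.T + np.diag(uniquenesses)

def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw one X ~ MN(mean, row_cov, col_cov).

    With row_cov = L_U L_U^T and col_cov = L_V L_V^T (Cholesky factors) and Z
    having i.i.d. N(0, 1) entries, X = mean + L_U Z L_V^T satisfies
    vec(X) ~ N(vec(mean), col_cov ⊗ row_cov), where vec stacks columns.
    """
    L_u = np.linalg.cholesky(row_cov)
    L_v = np.linalg.cholesky(col_cov)
    z = rng.standard_normal(mean.shape)
    return mean + L_u @ z @ L_v.T

# Usage with hypothetical sizes: p x q data, k_row row factors, k_col column factors.
rng = np.random.default_rng(0)
p, q, k_row, k_col = 6, 4, 2, 1
U = factor_covariance(rng.standard_normal((p, k_row)), np.full(p, 0.5))
V = factor_covariance(rng.standard_normal((q, k_col)), np.full(q, 0.5))
X = sample_matrix_normal(np.zeros((p, q)), U, V, rng)
full_cov = np.kron(V, U)  # (p*q) x (p*q) covariance of vec(X)
print(X.shape, full_cov.shape)  # (6, 4) (24, 24)
```

The separable form is economical: V ⊗ U is a (pq) × (pq) matrix, yet it is determined by one p × p and one q × q matrix, and a factor form reduces the number of free parameters further, which is presumably part of why such structure helps when only one high-dimensional observation is available.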
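The abstract describes the NLR model only qualitatively. The following Python sketch shows one hypothetical way to make it concrete and how a network can sharpen a logistic-regression prediction. It assumes (i) labels follow P(y_i = 1 | x_i) = sigmoid(x_i' gamma); (ii) for an undirected pair i < j, logit P(a_ij = 1) = theta[y_i, y_j] + beta * s(x_i, x_j), with a symmetric 2 × 2 matrix theta and the similarity s(x_i, x_j) = -||x_i - x_j||; and (iii) edges are conditionally independent given labels and predictors. Under these assumptions, the posterior log-odds of y_i, given the adjacency matrix, all predictors and the other nodes' labels, equals the LR log-odds plus one log-likelihood-ratio term per other node. All function names and parameter values are illustrative, and this rule treats the other labels as known, which the prediction rule studied in the thesis need not do.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def edge_logit(theta, beta, y_i, y_j, x_i, x_j):
    """Assumed edge model: label-pair effect plus a predictor-similarity effect."""
    similarity = -np.linalg.norm(x_i - x_j)  # hypothetical similarity measure
    return theta[y_i, y_j] + beta * similarity

def simulate(n, gamma, theta, beta, rng):
    """Draw predictors, LR labels and an undirected adjacency matrix."""
    x = rng.standard_normal((n, gamma.size))
    y = (rng.random(n) < sigmoid(x @ gamma)).astype(int)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            p = sigmoid(edge_logit(theta, beta, y[i], y[j], x[i], x[j]))
            A[i, j] = A[j, i] = int(rng.random() < p)
    return x, y, A

def posterior_log_odds(i, x, y, A, gamma, theta, beta):
    """log P(y_i=1 | rest) - log P(y_i=0 | rest), treating other labels as known.

    Assumes theta is symmetric, so edge (i, j) uses theta[y_i, y_j].
    """
    log_odds = x[i] @ gamma  # classical logistic-regression part
    for j in range(len(y)):
        if j == i:
            continue
        p1 = sigmoid(edge_logit(theta, beta, 1, y[j], x[i], x[j]))
        p0 = sigmoid(edge_logit(theta, beta, 0, y[j], x[i], x[j]))
        if A[i, j]:
            log_odds += np.log(p1) - np.log(p0)  # observed edge
        else:
            log_odds += np.log1p(-p1) - np.log1p(-p0)  # observed non-edge
    return log_odds

# Usage with hypothetical parameter values.
rng = np.random.default_rng(0)
gamma = np.array([1.0, -0.5])
theta = np.array([[-1.0, -3.0], [-3.0, -1.0]])  # same-label pairs link more often
beta = 0.5
x, y, A = simulate(200, gamma, theta, beta, rng)
pred = np.array([posterior_log_odds(i, x, y, A, gamma, theta, beta) > 0
                 for i in range(len(y))])
print("network-based rule accuracy:", (pred == y).mean())
print("LR-only rule accuracy:", ((x @ gamma > 0) == y).mean())
```

With theta favoring same-label links, the network terms typically outweigh the LR term; setting theta to a constant matrix and beta = 0 makes the network uninformative, and the rule reduces to plain logistic regression.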
Keywords/Search Tags: High-Dimensional Matrix Data, Network Structure, Matrix-variate Normal Distribution, Factor Model, Classification, Logistic Regression