Font Size: a A A

On querying large scale information networks

Posted on:2013-05-05Degree:Ph.DType:Thesis
University:University of Illinois at Urbana-ChampaignCandidate:Zhao, PeixiangFull Text:PDF
GTID:2458390008489794Subject:Computer Science
Abstract/Summary:
Social and technical information systems usually consist of a large number of interacting physical, conceptual, and human/societal entities. Such individual entities are interconnected to form large and sophisticated networks, which, without loss of generality, are often refereed to as information networks. Examples of information networks include the Web, highway or urban transportation networks, research collaboration and publication networks, biological networks and social networks. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure.;Theoretically, information networks can be modeled and manipulated as large scale graphs, which have gradually become the first-class citizens in the data management and mining fields. However, it is extremely inefficient to process such graph-structured data in any existing data models or computational frameworks. Real world information networks are massive, whose sheer size may simply overwhelm a direct application of any conventional graph algorithms designed and implemented for small or medium-sized memory-resident graphs. In the mean time, information networks are not static but rapidly changing all the time. The massive and dynamic nature of information networks has posed special challenges to effective query processing especially in scenarios where real-time responses are desirable.;In this thesis, we will consider a series of queries of practical value arising in real world network scenarios, and explore the effective and potentially scalable querying solutions for large scale information networks. All such queries have been found fundamental and critically important at the core of many advanced network applications. First of all, PRank is proposed to answer the structural similarity query: "which entities are (structurally) similar to a query entity" in an information network. Second, SPath is proposed as a high performance graph indexing mechanism to address general subgraph queries on large scale information networks. Third, GraphCube is designed as a new warehousing model that supports OLAP queries on large multidimensional information networks. Last, but not the least, gSketch is devised as a new sketch method that combines well-studied synopsis structures with a sketch partitioning technique in order to estimate and optimize the responses to basic queries on rapidly changing information networks. Our experimental studies demonstrate that our querying methods are highly efficient an scalable, and have achieved satisfactory performance for the fundamental queries on large scale information networks.;We should admit that the queries examined in the thesis are merely the tip of the iceberg. The marriage of information network analysis and query processing technology will bring many exciting opportunities for future study, which are briefed in the end of the thesis.
Keywords/Search Tags:Information, Large, Query
Related items