On querying large scale information networks

Posted on:2013-05-05

Degree:Ph.D

Type:Thesis

University:University of Illinois at Urbana-Champaign

Candidate:Zhao, Peixiang

Full Text:PDF

GTID:2458390008489794

Subject:Computer Science

Abstract/Summary:

Social and technical information systems usually consist of a large number of interacting physical, conceptual, and human/societal entities. Such individual entities are interconnected to form large and sophisticated networks, which, without loss of generality, are often refereed to as information networks. Examples of information networks include the Web, highway or urban transportation networks, research collaboration and publication networks, biological networks and social networks. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure.;Theoretically, information networks can be modeled and manipulated as large scale graphs, which have gradually become the first-class citizens in the data management and mining fields. However, it is extremely inefficient to process such graph-structured data in any existing data models or computational frameworks. Real world information networks are massive, whose sheer size may simply overwhelm a direct application of any conventional graph algorithms designed and implemented for small or medium-sized memory-resident graphs. In the mean time, information networks are not static but rapidly changing all the time. The massive and dynamic nature of information networks has posed special challenges to effective query processing especially in scenarios where real-time responses are desirable.;In this thesis, we will consider a series of queries of practical value arising in real world network scenarios, and explore the effective and potentially scalable querying solutions for large scale information networks. All such queries have been found fundamental and critically important at the core of many advanced network applications. First of all, PRank is proposed to answer the structural similarity query: "which entities are (structurally) similar to a query entity" in an information network. Second, SPath is proposed as a high performance graph indexing mechanism to address general subgraph queries on large scale information networks. Third, GraphCube is designed as a new warehousing model that supports OLAP queries on large multidimensional information networks. Last, but not the least, gSketch is devised as a new sketch method that combines well-studied synopsis structures with a sketch partitioning technique in order to estimate and optimize the responses to basic queries on rapidly changing information networks. Our experimental studies demonstrate that our querying methods are highly efficient an scalable, and have achieved satisfactory performance for the fundamental queries on large scale information networks.;We should admit that the queries examined in the thesis are merely the tip of the iceberg. The marriage of information network analysis and query processing technology will bring many exciting opportunities for future study, which are briefed in the end of the thesis.

Keywords/Search Tags:

Information, Large, Query

Related items

1	On querying large scale information networks
2	Research On Key Techniques Of Reachability Query For Large-scale Digraphs
3	Research On Query Processing Technologies Over Large Scale Knowledge Graphs
4	Studies On Reachability Query Technologies For Large Graph Data
5	Information Retrieval System Based On Document Query
6	Research Of Graph Query On Large Graph Database
7	Query and Mining in Large Graph Databases
8	Efficient reachability query evaluation in large spatiotemporal contact networks
9	Research Reachability Query Method For Large-Scale Graphs
10	A Study Of Query By Humming System With Large Music Database