Community-based Information Discovery Algorithm Over Domain Data Graph

Posted on:2016-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:J M Wang

Full Text:PDF

GTID:2308330461477073

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, network data of all kinds keeps growing, and the structure of the domain data graph composed of related network data is becoming increasingly complicated. End users find it difficult to locate useful information when information retrieval over a domain graph returns a large number of search results.

Currently, the main improvements focus on understanding and analyzing user queries and search results (e.g., query expansion, relevance feedback). Their effect is limited because the domain data themselves receive little attention. In fact, not only are user queries ambiguous and search results diverse, but domain data also exhibit complicated relationships among entities, rich semantics, uncertainty, and diversity.

Information discovery is a technique between information retrieval and knowledge discovery. It requires more preprocessing than information retrieval, such as information integration, information extraction and indexing, and information clustering. A community is a relevant sub-graph with high cohesion and low coupling, and its scale is much smaller than that of the traditional domain data graph. Community discovery on the domain data graph thus serves as additional preprocessing for information discovery.

The main contributions of this paper are as follows. First, the theories related to information discovery and communities are introduced, and several typical existing community discovery algorithms are analyzed and compared. Second, a community-based information discovery model is proposed. Third, the structure-topic-based community discovery (STBCD) algorithm and the community-based information discovery (CBSTAR) algorithm are designed on the basis of this model. The STBCD algorithm uses both structural and attribute information to partition the domain graph into a set of communities. The CBSTAR algorithm first searches for the top k communities related to the information discovery keywords; only these communities are then loaded into main memory, and candidate results are searched for within them alone. When sorting the discovery results, the candidate results are first merged, and a more reasonable sorting strategy is then applied, which considers not only the relevance between communities and keywords but also the influence of nodes that contain the keywords and of nodes that do not.

This paper combines the above methods to design a community-based information discovery prototype system over the domain data graph and uses the DBLP dataset to evaluate the effectiveness and efficiency of the prototype. P@K is used as the evaluation metric, and the experiments with weaker results are compared and analyzed. The final experimental results show that the proposed algorithm improves efficiency while effectiveness remains acceptable.
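
The two-phase search and the P@K metric described above can be illustrated with a minimal Python sketch. It is not the thesis's CBSTAR implementation: the function names, the data layout (a dictionary from community id to per-node term sets), and the plain keyword-overlap score are all assumptions made for illustration, and the influence-based sorting strategy is not modeled.

```python
from collections import Counter


def top_k_communities(communities, keywords, k):
    """Rank communities by keyword overlap and return the ids of the best k.

    `communities` maps a community id to {node_id: set of terms}.
    The overlap count is an assumed stand-in for the relevance measure.
    """
    scores = Counter()
    for cid, nodes in communities.items():
        for terms in nodes.values():
            scores[cid] += len(terms & keywords)
    ranked = sorted(communities, key=lambda cid: (-scores[cid], cid))
    # Keep only communities that match at least one keyword.
    return [cid for cid in ranked[:k] if scores[cid] > 0]


def search_in_communities(communities, selected, keywords):
    """Collect candidate nodes only from the selected communities."""
    candidates = []
    for cid in selected:
        for node, terms in communities[cid].items():
            hits = len(terms & keywords)
            if hits:
                candidates.append((hits, node))
    # Merge candidates from all selected communities into one ranked list.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in candidates]


def precision_at_k(ranked, relevant, k):
    """P@K: fraction of the first k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k
```

In this sketch, only the communities returned by `top_k_communities` are ever scanned, which mirrors the idea of loading a small part of the graph into memory; the relevance judgments passed to `precision_at_k` would have to come from a labeled ground truth.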

Keywords/Search Tags:

Domain Data Graph, Information Discovery, Community

Related items

1	Research On Local Overlapping Community Discovery Algorithm Based On Directed Graph
2	Research Of Copyright Content Propagation Based On Pirates Community Discovery In Peer To Peer Networks
3	Web Community Cores And Complete Communities Discovery Strategy
4	The Research Of Frequent Community Search Algorithms In Temporal Graphs
5	Research On The Key Problems Of Web Community Discovery Based On Multiple Features
6	Discovery Method Of Micro-blog Community Based On H-Index
7	Multi-dimensional Community Discovery And Influence Analysis Oriented On Mobile Data
8	Design And Implementation Of Social Network Community Discovery Algorithm Based On Spark
9	Research On The Algorithm Of Community Discovery And Key User Mining Based On Big Data
10	Community Discovery And Tracking Methods Based On Core Members