
Community-based Information Discovery Algorithm Over Domain Data Graph

Posted on: 2016-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: J M Wang
Full Text: PDF
GTID: 2308330461477073
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, the volume of network data of all kinds keeps growing, and the structure of the domain data graph composed of related network data is becoming more and more complicated. It is difficult for end users to find useful information when information retrieval over a domain graph returns a large number of search results.

Currently, the main improvements focus on understanding and analyzing user queries and search results (e.g. query expansion, relevance feedback). Their effect is limited because the domain data themselves receive little attention. In fact, not only are user queries ambiguous and search results diverse, but the domain data also exhibit complicated relationships among entities, rich semantics, uncertainty, and diversity.

Information discovery is a technique that lies between information retrieval and knowledge discovery. It requires more preprocessing than information retrieval, such as information integration, information extraction and indexing, and information clustering. A community is a relevant sub-graph with high cohesion and low coupling, and its scale is much smaller than that of the traditional domain data graph. Community discovery on the domain data graph carries out additional preprocessing work for information discovery.

The main contributions of this paper are as follows. First, the theories related to information discovery and communities are introduced, and several typical existing community discovery algorithms are analyzed and compared. Second, a community-based information discovery model is proposed. Third, a structure-topic-based community discovery (STBCD) algorithm and a community-based information discovery (CBSTAR) algorithm are designed on top of this model. The STBCD algorithm uses both structural and attribute information to partition the domain graph into a set of communities. The CBSTAR algorithm first searches for the top-k communities related to the information discovery keywords; only these communities are loaded into main memory, and CBSTAR searches for candidate results only within them. When sorting the discovery results, the candidate results are first merged, and then a more reasonable sorting strategy is applied that considers not only the relevance between communities and keywords but also the influence of both the nodes that contain the keywords and the nodes that do not.

This paper combines the above methods to design a community-based information discovery prototype system over a domain data graph, and uses the DBLP dataset to evaluate the effectiveness and efficiency of the prototype. P@K is used as the evaluation metric, and the cases where the experiments fall short are compared and analyzed. The final experimental results show that the proposed algorithm improves efficiency while its effectiveness remains acceptable.
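The two ideas that the abstract describes most concretely, restricting the search to the top-k keyword-related communities and scoring results with P@K, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the representation of a community as a set of topic terms, and the use of plain keyword overlap as the relevance score are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def top_k_communities(
    community_terms: Dict[str, Set[str]], keywords: Set[str], k: int
) -> List[str]:
    """Return the ids of the k communities that best match the keywords.

    Assumption: relevance is the number of shared terms. The thesis's
    actual community-keyword relevance measure is not given in the abstract.
    """
    scored = [(len(terms & keywords), cid) for cid, terms in community_terms.items()]
    # Discard communities that match no keyword at all.
    scored = [(score, cid) for score, cid in scored if score > 0]
    # Highest score first; ties broken by community id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:k]]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """P@K: the fraction of the first k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    # Dividing by k (not by len(ranked)) penalizes result lists shorter than k.
    return hits / k


# Hypothetical toy data: three communities described by topic terms.
communities = {
    "c1": {"graph", "community"},
    "c2": {"database", "index"},
    "c3": {"graph", "search", "keyword"},
}
selected = top_k_communities(communities, {"graph", "keyword"}, k=2)
score = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=2)
```

In this toy setup only the communities in `selected` would be loaded and searched, which is where the efficiency gain described in the abstract would come from. The trade-off is that a relevant result located in a community outside the top k is never seen.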
Keywords/Search Tags: Domain Data Graph, Information Discovery, Community