An Information Retrieval Graph Model Based On Term Importance

Posted on: 2016-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Hong
Full Text: PDF
GTID: 2308330470964019
Subject: Computer Science and Technology
Abstract/Summary:
With the development of mobile Internet technology, ubiquitous search has become an important part of daily life. As the amount of information on the Internet has grown explosively over the past ten years, finding the information resources most relevant to a user's need (the query) has become a major problem for current search engines, and solving it requires an effective information retrieval model. Finding more effective retrieval models is therefore a long-term challenge in information retrieval research.

In information retrieval modeling, determining the importance of a document's index terms is a central task. Retrieval models that use a bag-of-words document representation mostly rely on the term independence assumption and compute term importance as a function of TF and IDF, without considering the relationships between terms. In this paper, we use a graph-of-word document representation to capture the dependencies between terms. From this graph, we compute the importance of each term in the document with a Markov chain method, and we propose a novel graph-based information retrieval model, TI-IDF. The main contributions of this paper are:

(1) A new graph-of-word representation of the document. Using the document's clauses, we build a graph-of-word (a weighted undirected graph) for each document. In the graph, vertices represent the index terms, an undirected edge between two terms indicates that they occur in the same sentence of the document, and the edge weight is the number of sentences in which they co-occur.

(2) A measure of term importance. From the graph-of-word, we obtain the term co-occurrence matrix and the transition probability matrix, and use a Markov chain method to compute the importance of each term in the document.

(3) An information retrieval graph model based on term importance: TI-IDF. We use term importance (TI) in place of the traditional term frequency (TF) at indexing time. Combining it with the TF×IDF term weighting scheme and existing TF normalization methods, we determine the term weighting scheme of our retrieval graph model through comparative experiments.

We conducted experiments on a standard dataset. The results show that, compared with traditional retrieval models, our graph retrieval model TI-IDF is more robust: its search results are consistently superior to those of BM25, and in most cases better than those of BM25 extension models, TW-IDF, and other models.
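
The graph construction and term-importance steps described in (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions that the abstract does not specify: sentences are split on ".", "!" and "?"; terms are lowercase alphabetic tokens with no stop-word removal or stemming; the transition matrix is the row-normalized co-occurrence matrix; and the stationary distribution is approximated by power iteration with a PageRank-style damping factor, added here only so the iteration converges on any graph. The function names build_graph_of_word and term_importance are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_graph_of_word(text):
    """Return {(u, v): weight} with u < v.

    The weight of an edge is the number of sentences in which both
    terms occur. Sentence splitting and tokenization are simplifying
    assumptions, not the thesis's preprocessing.
    """
    weights = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Each distinct term counts once per sentence.
        terms = sorted(set(re.findall(r"[a-z]+", sentence)))
        for u, v in combinations(terms, 2):
            weights[(u, v)] += 1
    return dict(weights)


def term_importance(weights, iterations=100, damping=0.85):
    """Approximate the stationary distribution of a random walk on the graph.

    Transition probability from u to v is weight(u, v) / weighted_degree(u).
    The damping factor is an assumption of this sketch (see lead-in).
    Terms that never co-occur with another term are not in the graph.
    """
    neighbors = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    terms = sorted(neighbors)
    if not terms:
        return {}
    n = len(terms)
    degree = {t: sum(neighbors[t].values()) for t in terms}
    score = {t: 1.0 / n for t in terms}
    for _ in range(iterations):
        new_score = {t: (1.0 - damping) / n for t in terms}
        for u in terms:
            for v, w in neighbors[u].items():
                new_score[v] += damping * score[u] * w / degree[u]
        score = new_score
    return score


text = "Graph models rank terms. Graph models use edges. Terms matter."
graph = build_graph_of_word(text)
importance = term_importance(graph)
for term, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {value:.4f}")
```

In a TI-IDF-style scheme, the resulting importance value would take the place of the raw term frequency before IDF weighting and normalization are applied; the exact weighting formula is determined experimentally in the thesis and is not reproduced here.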
Keywords/Search Tags: Term Weighting Schemes, Retrieval Model, Graph-of-word, Term Importance, TI-IDF