
Research On Document Representation Model Based On Query And Content

Posted on: 2011-06-25  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhou
GTID: 2178360308977370  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the amount of online information is growing exponentially. As the gap between the huge volume of digital information and people's ability to obtain it becomes increasingly pronounced, finding relevant information quickly and accurately has become a research hot spot in the information field. In information retrieval, the quality of the document representation model is one of the key factors affecting retrieval performance. According to comprehensive information theory, epistemological information is the trinity of syntactic, semantic and pragmatic information. Current mainstream document representation models rely mainly on syntactic and semantic information and lack pragmatic information, which has become a bottleneck for improving retrieval performance. The thesis begins with an overview of the classic information retrieval models and how they represent documents, covering work both in China and abroad, and then discusses comprehensive information theory and epistemological information. It next introduces the current applications of pragmatic information in query expansion, ranking algorithms and document representation, with emphasis on the document organization method based on query sets. The thesis analyses the defects of this method and, to address them, introduces the concept of a Stability Criterion for Query Sample Space. It then proposes a document representation model based on users' query behavior and document content, which integrates the pragmatic information from users' implicit feedback with the semantic and syntactic information of the documents to dynamically adjust the term weights in the index database, thereby improving recall and precision in information retrieval.
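The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea under assumptions of our own: content-side (syntactic/semantic) weights are taken as cosine-normalized TF-IDF, pragmatic weights come from a hypothetical click log of `(query_terms, clicked_doc_id)` pairs, and the two are blended linearly with a hypothetical parameter `alpha`. The function names `content_weights`, `click_weights` and `combined_weights` are illustrative, not the thesis's.

```python
import math
from collections import Counter, defaultdict


def content_weights(docs):
    """Content-side weights: cosine-normalized TF-IDF for each document.

    docs maps a document id to its list of (already tokenized) terms.
    Uses a smoothed IDF, log(1 + N/df), so terms found in every document keep some weight.
    """
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        raw = {t: c * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        weights[doc_id] = {t: v / norm for t, v in raw.items()}
    return weights


def click_weights(click_log):
    """Pragmatic-side weights from implicit feedback (hypothetical log format).

    Each click credits every distinct query term to the clicked document;
    counts are scaled to [0, 1] by the document's most-credited term.
    """
    counts = defaultdict(Counter)
    for query_terms, doc_id in click_log:
        counts[doc_id].update(set(query_terms))
    weights = {}
    for doc_id, c in counts.items():
        top = max(c.values())
        weights[doc_id] = {t: v / top for t, v in c.items()}
    return weights


def combined_weights(docs, click_log, alpha=0.7):
    """Blend the two views: alpha * content + (1 - alpha) * pragmatic.

    Assumption of this sketch: query terms that led to clicks join a document's
    representation even when they do not occur in its text.
    """
    content = content_weights(docs)
    pragmatic = click_weights(click_log)
    blended = {}
    for doc_id in docs:
        c = content[doc_id]
        p = pragmatic.get(doc_id, {})
        blended[doc_id] = {
            t: alpha * c.get(t, 0.0) + (1 - alpha) * p.get(t, 0.0)
            for t in set(c) | set(p)
        }
    return blended


if __name__ == "__main__":
    docs = {"d1": ["lucene", "index", "search"], "d2": ["campus", "news", "search"]}
    log = [(["campus", "news"], "d2"), (["news"], "d2"), (["lucene"], "d1")]
    for doc_id, w in combined_weights(docs, log).items():
        print(doc_id, sorted(w.items(), key=lambda kv: -kv[1]))
```

In the toy example, `news` ends up above `campus` in `d2` because users clicked `d2` twice after typing `news`, although both terms carry the same content weight. With `alpha=1.0` the sketch reduces to pure content weighting; recomputing the blend as the log grows is one way to read "dynamically adjust".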
Experimental results show that the new model expresses the topic information of documents well and significantly improves retrieval accuracy. The thesis also proposes a document representation model based on co-occurrence queries and co-occurrence content, which aims to mine deeper information from co-occurring words; it presents a method for extracting co-occurring words and a formal description of the new model. Finally, a site search engine for the news network of **university is developed on the Lucene architecture; it tracks user profiles in real time and dynamically adjusts retrieval results according to shifts in the collective profile.
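The abstract does not say how co-occurring words are extracted. As one hedged illustration, the sketch below treats each query in a log (or each sentence of a document) as a bag of terms, counts how many bags contain each pair, keeps pairs seen at least `min_count` times, and ranks them by pointwise mutual information (PMI). PMI and the threshold are our substitutions, not the thesis's stated method, and `cooccurrence_pairs` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(units, min_count=2):
    """Extract co-occurring term pairs and rank them by PMI (hypothetical method).

    Each unit is a bag of terms: one query from a search log (co-occurrence query)
    or one sentence/window of a document (co-occurrence content).
    Returns {(term_a, term_b): pmi}, highest PMI first, for pairs that appear
    together in at least min_count units.
    """
    n = len(units)
    term_df = Counter()
    pair_df = Counter()
    for unit in units:
        terms = sorted(set(unit))  # dedupe and fix pair order (a < b)
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    # PMI = log(P(a,b) / (P(a) * P(b))) with probabilities estimated over units
    scored = {
        pair: math.log(count * n / (term_df[pair[0]] * term_df[pair[1]]))
        for pair, count in pair_df.items()
        if count >= min_count
    }
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    queries = [
        ["campus", "news"], ["campus", "news", "today"], ["library", "hours"],
        ["campus", "library"], ["news", "today"],
    ]
    print(cooccurrence_pairs(queries))
```

On the five sample queries only `("news", "today")` and `("campus", "news")` survive the threshold, in that order: both pairs occur twice, but `today` is rarer overall than `campus`, so its pairing with `news` is the more informative one.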
Keywords/Search Tags: Information Retrieval, Document Representation Model, User Query Log, Implicit User Feedback