Research On Information Need Domain Of Information Retrieval

Posted on: 2013-09-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wang
Full Text: PDF
GTID: 1228330398496404
Subject: Computer application technology
Abstract/Summary:
Information retrieval, as a means of access to information, is an important part of information processing and a focal research area within it. Information retrieval involves three important aspects: specification of an information need, document description, and the retrieval model. Among them, the specification of the information need is essential: good search results are possible only when the information need is properly understood and expressed. At present, information retrieval is essentially implemented as a keyword matching process, and the user’s query is assumed to be an accurate description of the user’s information need. In reality, a user’s query often cannot describe the underlying information need precisely. This unavoidably leads to unsatisfactory retrieval results.

In order to improve the description of the query, relevance feedback is commonly used. This process tries to determine a set of related terms from the (pseudo-)relevant documents to enhance the user’s original query. Experiments have shown that the process is effective. However, we observe that the selection of these terms is usually performed using heuristics, and that it usually assumes the user’s information need has an accurate description. Relevance feedback attempts to use feedback to seek that accurate description. In practice, it is usually impossible to arrive at an accurate description of the information need. The expanded query is only our best guess of the information need, and it remains inaccurate.

In this dissertation, we take a different approach. We assume that an information need is a semantic range. At the beginning, the only description of the information need is the original query.
When we obtain some feedback documents (user relevance feedback or pseudo-relevance feedback), we can build a better description of the information need. This description does not try to be an accurate description of the information need; instead, it frames a range for it. The feedback information provides a lower bound R_lower and an upper bound R_upper: the lower bound corresponds to what a relevant document should contain (e.g. common terms shared by all the relevant documents), and the upper bound corresponds to what a relevant document may contain (e.g. all terms that appear in the relevant documents). The information need can then be bounded by the domain I = (R_lower, R_upper). The dissertation derives the lower and upper bounds, obtaining the two boundaries of the domain, and establishes the information need domain model I = (R_lower, R_upper).

The information need domain has the following characteristics.
(1) The lower bound of the information need domain expresses the core of the information need that the user focuses on.
(2) The upper bound of the information need domain contains the extended content of the information need and represents its breadth.
(3) The information need domain loosely frames a range for the user’s information need.

The dissertation uses two mechanisms to establish the information need domain: feedback from documents the user has judged relevant, and pseudo-relevance feedback. In the former case, a set of relevant documents identified by the user is used to derive a description of R_lower and R_upper. In the latter case, the top n documents from the initial retrieval results are assumed to be relevant. This method has the advantage of being automatic, but it may include irrelevant feedback documents, so the resulting information need domain is only an approximation.

Based on the information need domain, the dissertation analyses the document similarity calculation method and establishes a similarity model.
The dissertation trains and analyzes the model through a series of experiments on standard TREC test corpora. The new similarity model based on the information need domain is compared with three classic models: the pseudo-relevance feedback language model Mixfb_kl_dir, the pseudo-relevance feedback tf_idf model Fb_tf_idf, and the pseudo-relevance feedback probability model Fb_okapi. The experimental results show that the similarity model based on the information need domain improves retrieval performance.

Traditional methods often attempt to establish an accurate description of the information need. In contrast, we establish a loose description of the information need, using a domain to frame a range for it. In summary, the main contributions of the research work are:
(1) We propose the concept of the information need domain for IR and provide a method to determine the information need domain.
(2) We propose a mathematical model of the information need domain based on fuzzy sets.
(3) We propose a similarity model based on the information need domain.

The main significance of the research work is to establish and improve the theoretical basis of the information need and, on this basis, to establish an appropriate similarity model and improve information retrieval performance. The information need domain provides a new research idea, contributes new theories and methods to the field of information retrieval, and improves information retrieval performance in practical applications.
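
The lower and upper bounds described in the abstract can be illustrated with a minimal sketch. It follows only the abstract's own examples (lower bound as the terms shared by all feedback documents, upper bound as all terms appearing in them) and is not the dissertation's actual derivation, which is said to rest on fuzzy sets. The function names, the whitespace tokenizer, and the sample documents are hypothetical and chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer (assumed placeholder for real text processing)."""
    return set(text.lower().split())


def information_need_domain(feedback_docs: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) term sets for a collection of feedback documents.

    lower: terms shared by every feedback document (what a relevant document should contain)
    upper: terms appearing in any feedback document (what a relevant document may contain)
    """
    term_sets = [tokenize(doc) for doc in feedback_docs]
    if not term_sets:
        return set(), set()
    lower = set.intersection(*term_sets)
    upper = set.union(*term_sets)
    return lower, upper


def pseudo_feedback_domain(ranked_docs: List[str], n: int) -> Tuple[Set[str], Set[str]]:
    """Pseudo-relevance variant: assume the top n ranked documents are relevant."""
    return information_need_domain(ranked_docs[:n])


# Hypothetical feedback documents, for illustration only.
docs = [
    "fuzzy set retrieval model",
    "retrieval model with relevance feedback",
    "language model for retrieval",
]
lower, upper = information_need_domain(docs)
print(sorted(lower))  # ['model', 'retrieval']
print(len(upper))     # 9
```

Because the lower bound is an intersection, a single irrelevant document in the pseudo-relevance case can shrink it sharply, which is one way to read the abstract's remark that the automatically derived domain is only an approximation.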
Keywords/Search Tags: information retrieval, information need domain, lower bound, upper bound, document similarity