
A Study of the Role of Modifiers in Text Information Retrieval

Posted on: 2005-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: H N Ma  Full Text: PDF
GTID: 2208360122497125  Subject: Systems Engineering
Abstract/Summary:
With the arrival of the Internet era, the volume of information is growing exponentially, leading to an information explosion. Yet when people retrieve documents, they often fail to obtain the information that actually matches their need and instead receive a large amount of irrelevant material. Improving the effectiveness and quality of information retrieval (IR) systems has therefore become an important problem.

The objective of this paper is to investigate the importance of modifier words to document information retrieval, a factor that is often ignored but may influence the effectiveness of an IR system. On this basis, a modified vector space model (MVSM) is developed, and experiments on English documents are carried out to show the importance of modifier words.

The results of the research can be summarized as follows:

(1) In traditional keyword-based IR systems, such as the Boolean IR model, queries and documents are represented by many separate words or terms, some of which are nouns and verbs and some of which are adjectives and adverbs. The MVSM is designed and implemented on the basis of the traditional vector space model (VSM). The main difference between the two is that the new model combines a modifier (an adjective in this paper) with its corresponding headword (a noun in this paper) into an integrated keyword (combined term), which helps to some extent to pin down the intended meaning of a polysemous word. In addition, expanding the modifier and the headword with their synonyms and recombining them can retrieve other useful documents that could not be found originally because the query's keywords are rare.

(2) Experiments to verify the importance of modifier words were carried out on a benchmark corpus (TREC) using the MVSM, with 150 queries as test input. Comparing the results of the MVSM with those of the traditional VSM shows a remarkable difference, indicating the great importance of modifier words.

(3) The retrieval performance of an IR system is typically expressed in terms of two quantities: precision and recall. The result charts (in Excel format) show visibly that both the precision and the recall of the MVSM increased. The experimental results show that the importance of modifier words cannot be ignored in document information retrieval.
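
The combined-term idea in (1) and the two measures in (3) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the underscore joining convention, the raw term-frequency weighting, the hand-assigned "ADJ"/"NOUN" tags, and the toy query and documents are all assumptions made for illustration, and the synonym expansion step is left out.

```python
import math
from collections import Counter

def combined_terms(tagged):
    """Merge each adjective with the noun that directly follows it.

    `tagged` is a list of (word, tag) pairs. The tags are assumed to come
    from some part-of-speech tagger that is not shown here.
    """
    terms = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if tag == "ADJ" and nxt is not None and nxt[1] == "NOUN":
            terms.append(f"{word}_{nxt[0]}")  # hypothetical joining convention
            i += 2
        else:
            terms.append(word)
            i += 1
    return terms

def plain_terms(tagged):
    """Traditional VSM view: every word is a separate term."""
    return [word for word, _ in tagged]

def cosine(a, b):
    """Cosine similarity between two term lists, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy data, tagged by hand for the example.
query = [("hard", "ADJ"), ("disk", "NOUN")]
doc1 = [("hard", "ADJ"), ("disk", "NOUN"), ("failure", "NOUN")]
doc2 = [("hard", "ADJ"), ("problem", "NOUN"), ("disk", "NOUN")]

# Separate words cannot tell the two documents apart.
plain_scores = [cosine(plain_terms(query), plain_terms(d)) for d in (doc1, doc2)]
# Combined terms score only the document where "hard" modifies "disk".
combined_scores = [cosine(combined_terms(query), combined_terms(d)) for d in (doc1, doc2)]
```

In this toy case the word-level vectors give both documents the same score, while the combined-term vectors give a nonzero score only to the document in which "hard" actually modifies "disk". That is the kind of disambiguation the abstract attributes to combining modifier and headword.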
Keywords/Search Tags: Document Information Retrieval, Modifier Words, VSM, Precision, Recall