
Comparative Research On Chinese Documents Retrieval Based On Important Sentences And Based On Author's Abstract

Posted on: 2009-07-28  Degree: Master  Type: Thesis
Country: China  Candidate: C Q Xu  Full Text: PDF
GTID: 2178360272488285  Subject: Information Science
Abstract/Summary:
This paper compares and analyzes the content similarity between the author's abstract and important sentences, and compares retrieval performance based on the author's abstract with retrieval performance based on important sentences.

First, the paper introduces the function of abstracts and the advantages of abstract-based retrieval, points out that this approach depends on automatic abstracting techniques, and analyzes the necessity and feasibility of retrieval based on important sentences. It then describes in detail the principle and procedure for generating important sentences. First, 2064 papers in Animal Husbandry and Veterinary science were downloaded as the test set and put through a series of text preprocessing steps. Second, a professional vocabulary for Animal Husbandry and Veterinary science was built, using "Agriculture dict" as the base table together with the unknown words identified by the Carmm system. Third, the dynamic link library CarmmLib.dll was used to segment the test set into words, and the weights of words and sentences were calculated. Finally, a number of the highest-weighted sentences were selected as the important sentence group.

In the evaluation stage, the paper applies content similarity based on the vector space model to the comparison between important sentences and the author's abstract. Similarity values are computed with the cosine formula, and 0.3, 0.5 and 0.7 are used as thresholds to obtain the evaluation result, whose implications are then analyzed. In addition, the paper evaluates the results of 50 retrieval requests run against both the author's abstracts and the important sentences. All 50 requests are real user requests from the Reference Department of the Nanjing Agricultural University Library. The Boolean model is used for retrieval, and recall and precision are used to evaluate the results. The retrieval results are analyzed in several ways, including summary statistics, an R/P/F histogram, and an R/P balance histogram.
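The similarity comparison described above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and already-segmented token lists; the thesis itself segments Chinese text with CarmmLib.dll and uses its own word weights, neither of which is reproduced here. The function names and the "greater than or equal to" reading of the thresholds are illustrative assumptions, not details taken from the thesis.

```python
import math
from collections import Counter

# Thresholds reported in the abstract.
THRESHOLDS = (0.3, 0.5, 0.7)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors.

    Assumption: raw term counts serve as vector components; the thesis
    may weight terms differently.
    """
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def share_above_thresholds(similarities, thresholds=THRESHOLDS):
    """Fraction of documents whose similarity reaches each threshold.

    Assumption: a document counts when similarity >= threshold.
    """
    n = len(similarities)
    return {
        t: (sum(s >= t for s in similarities) / n if n else 0.0)
        for t in thresholds
    }


if __name__ == "__main__":
    # Hypothetical token lists standing in for one abstract and one
    # important-sentence group.
    abstract = ["cattle", "disease", "vaccine", "trial"]
    sentences = ["cattle", "vaccine", "dose", "trial", "trial"]
    print(cosine_similarity(abstract, sentences))
```

Computing one similarity value per paper and passing the list to `share_above_thresholds` would give the kind of per-threshold summary the evaluation describes.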
From the comparison, this paper draws the following conclusion: the retrieval performance of the approach based on the author's abstract and that of the approach based on important sentences do not differ much. Retrieval based on important sentences is a little better, especially in recall.

The main research results can be summarized in three aspects. First, in the stage of generating important sentences, the paper analyzes the structural characteristics of veterinary papers and identifies the pattern by which important sections are distributed. Second, the paper introduces the vector space model into the comparison between the author's abstract and important sentences, using cosine similarity to measure how similar they are. Third, the paper carries out a comparative study of the author's abstract and important sentences, and concludes that retrieval based on important sentences performs a little better than retrieval based on the author's abstract.
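The recall and precision comparison behind this conclusion can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a Boolean query is a plain AND over query terms (the abstract does not say which operators were used), and it takes the "F" in the R/P/F histogram to be the balanced F-measure. All names and data are hypothetical.

```python
def boolean_and_search(docs, query_terms):
    """Return ids of documents that contain every query term.

    `docs` maps a document id to the set of terms in its surrogate text
    (author's abstract or important-sentence group).
    Assumption: queries are pure conjunctions.
    """
    return {
        doc_id
        for doc_id, terms in docs.items()
        if all(term in terms for term in query_terms)
    }


def precision_recall_f1(retrieved, relevant):
    """Precision, recall and balanced F-measure for one retrieval request."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical surrogates for three documents.
    by_abstract = {
        1: {"cattle", "disease"},
        2: {"cattle", "feed"},
        3: {"pig", "disease"},
    }
    retrieved = boolean_and_search(by_abstract, ["cattle", "disease"])
    print(precision_recall_f1(retrieved, relevant={1, 3}))
```

Running the same request against an index built from abstracts and against one built from important sentences, then comparing the two result tuples, mirrors the per-request comparison described in the abstract.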
Keywords/Search Tags: important sentences, abstract, documents retrieval, retrieval evaluation