
Research On And Implementation Of Chinese Structured Information Retrieval

Posted on: 2002-09-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Zhang
GTID: 1488300914457354
Subject: Computer software and theory
Abstract/Summary:
The central themes of this dissertation are Chinese information retrieval and structured information retrieval. Within these themes, four aspects are investigated: similarity calculation between document and query, query expansion, query translation, and structured information retrieval. The significant research contributions of the dissertation are:
(1) An argument that word-based indexing must be the method employed in a Chinese information retrieval system. A new method, PM-based weight calculation of term pairs, is presented systematically to compute the association between terms.
(2) An investigation of how the proximity and mutual information of term pairs affect retrieval performance in a Chinese information retrieval system. The experimental results lead to the conclusion that the proximity of term pairs improves retrieval performance more than their mutual information does when the latter cannot be calculated precisely.
(3) A query expansion method based on an association matrix built from local information. The process is as follows. First, association values between terms are calculated in the top-ranked documents retrieved by the original query, borrowing the main idea of automatic thesaurus construction based on the second-order association hypothesis. Second, the top-ranked terms are obtained by ranking their association values. Finally, the query is expanded by adding the top-ranked terms and their weights to the original query.
(4) A query translation method based on the mutual information between terms. The method provides a new way to select the translations of a term and indirectly preserves the phrase information in the query through the term's association list. A target-language query is finally constructed in which each term has its own weight.
(5) Methods for improving the retrieval performance of a traditional information retrieval system by using the structure information in XML documents. By introducing a document structure index database, an element index database, and an attribute index database, structured retrieval over XML documents is achieved; the Chinese structured information retrieval system CSIR is designed, and some of its important parts are implemented.
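The three-step query expansion process in contribution (3) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not give the association formula, so this sketch assumes that second-order association can be approximated by the cosine similarity between the co-occurrence vectors of two terms, computed only over the top-ranked documents. The function names (`cooccurrence`, `cosine`, `expand_query`), the parameter `k`, and the fixed weight of 1.0 for original query terms are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence(docs):
    """First-order association: number of documents in which two terms co-occur."""
    co = defaultdict(Counter)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query, top_docs, k=2):
    """Return {term: weight} for the original query plus up to k expansion terms.

    query    -- list of original query terms (already segmented into words)
    top_docs -- top-ranked documents for the original query, each a list of terms
    """
    co = cooccurrence(top_docs)

    # Step 1: association value between each candidate term and the query
    # (assumed here: sum of cosine similarities of co-occurrence vectors).
    scores = {}
    for cand in co:
        if cand in query:
            continue
        scores[cand] = sum(cosine(co[cand], co[q]) for q in query if q in co)

    # Step 2: rank candidates by association value (ties broken alphabetically).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    # Step 3: add the top-ranked terms and their weights to the original query.
    expanded = {term: 1.0 for term in query}  # assumed weight for original terms
    for term, score in ranked:
        if score > 0:
            expanded[term] = score
    return expanded


if __name__ == "__main__":
    docs = [
        ["xml", "retrieval", "index"],
        ["xml", "retrieval", "structure"],
        ["weather", "rain"],
    ]
    print(expand_query(["xml"], docs, k=2))
```

Because the score compares co-occurrence vectors rather than raw co-occurrence counts, a term is favoured when it appears in the same contexts as the query terms, which is the intuition behind second-order association. A real system would also need Chinese word segmentation before this step, in line with the word-based indexing argued for in contribution (1).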
Keywords/Search Tags: information retrieval, structured retrieval, association matrix, query expansion, Chinese information processing