Research On Shallow-Semantic Model For The Information Retrieval

Posted on: 2008-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H N Ma
Full Text: PDF
GTID: 1118360242467516
Subject: Management Science and Engineering
Abstract/Summary:
Statistical information retrieval models such as the Boolean model and the vector space model are currently in general use, but they do not handle polysemy discrimination, synonym expansion, concept hierarchies, or contextual semantic relationships well. To improve the performance of information retrieval systems, this thesis proposes two shallow-semantic information retrieval models: a shallow-semantic vector space model and an ontology-based shallow-semantic model. An information retrieval system is built on each model, and experiments compare the proposed models quantitatively with the traditional models. The former model is verified on an English corpus and the latter on a Chinese corpus. The main research contents and achievements can be summarized as follows:
1. Related works on the shallow-semantic vector space model
(1) Shallow-semantic vector space model. To improve the traditional vector space model, the shallow-semantic vector space model is proposed. The main difference from the traditional model is that the new model combines a modifier (an adjective in this thesis) with its corresponding headword (a noun in this thesis) into a single keyword (a combined term), which pins down the intended sense of a polysemous word. In addition, expanding the modifier and the headword with their synonyms and recombining them can retrieve further useful documents that would otherwise be missed because the query contains few keywords.
(2) A fuzzy synonym thesaurus. To carry out the query expansion of the shallow-semantic vector space model, a fuzzy synonym thesaurus is built on the semantic lexicon WordNet. The fuzzy synonym thesaurus is used to expand the query vector.
At present, this thesaurus has been applied in a collaborative project with the Japanese JUSTSYSTEM Corporation. In addition, the Japanese partner has developed a natural language processing tool named NLPs.
(3) Information retrieval experiments. Experiments to verify the shallow-semantic vector space model were carried out on the English TREC benchmark corpora, with 150 queries chosen for testing. The experimental results support the value of the new model.
(4) System evaluation. The retrieval performance of a system is typically expressed in terms of two quantities: precision and recall. Both the precision and the recall of the shallow-semantic vector space model are found to increase visibly.
2. Related works on the ontology-based shallow-semantic model
(1) Domain ontology. After analyzing the complaint records of a certain city's mobile communication corporation from 2002 to 2005, an ontology for the domain of mobile communication complaint services is built with the Protege tool. While building the ontology, two methods are proposed: a top-down method combined with an outspread approach for extracting concepts, and a method based on the idea of the Apriori algorithm for mining concept relationships. These methods supplement ontology construction that would otherwise rely on manual work alone.
(2) Ontology-based shallow-semantic model. Based on the domain ontology, the ontology-based shallow-semantic model is proposed. The main difference from traditional keyword-based methods is that inheritance between super-concepts and sub-concepts, together with query expansion, increases the recall of the information retrieval system, while constraints on subjects and objects increase its precision as well.
(3) An application instance of information retrieval. An information retrieval system is developed based on the new retrieval model.
In addition, the system is applied to the domain of mobile communication complaint services. Some representative queries are selected. Comparing the results of the new model with those of the traditional one shows the advantage of the ontology-based shallow-semantic model.
(4) Personalized processing. By means of interactive check boxes, users can further filter the retrieval results according to their own interests, and thereby obtain clearer and more satisfactory results.
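The combined-term expansion described for the shallow-semantic vector space model (pairing an adjective with its noun, then expanding both through a fuzzy synonym thesaurus and recombining them) might be sketched roughly as follows. This is only an illustration under stated assumptions: the thesaurus entries, the membership degrees, the function names, and the use of min() to combine two degrees are all hypothetical and are not taken from the thesis, whose thesaurus is derived from WordNet.

```python
from itertools import product

# Hypothetical toy thesaurus: word -> {synonym: membership degree in (0, 1]}.
# The entries and degrees are invented for illustration only.
FUZZY_SYNONYMS = {
    "fast": {"quick": 0.9, "rapid": 0.8},
    "car": {"automobile": 0.95, "auto": 0.7},
}


def expand_word(word):
    """Return the word itself (degree 1.0) together with its fuzzy synonyms."""
    candidates = {word: 1.0}
    candidates.update(FUZZY_SYNONYMS.get(word, {}))
    return candidates


def expand_combined_term(modifier, headword):
    """Expand an (adjective, noun) pair into weighted combined terms.

    Assumption: the weight of a recombined term is the minimum of the two
    membership degrees (a common fuzzy AND); the thesis may weight differently.
    """
    modifiers = expand_word(modifier)
    headwords = expand_word(headword)
    return {
        f"{m} {h}": min(wm, wh)
        for (m, wm), (h, wh) in product(modifiers.items(), headwords.items())
    }


expanded = expand_combined_term("fast", "car")
# Print the expanded query terms, highest weight first.
for term, weight in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight}")
```

Each recombined term would then act as one dimension of the expanded query vector, with its weight scaling that term's contribution to the similarity score.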
Keywords/Search Tags:Information Retrieval, Semantic, Vector Space Model, Ontology, Synonym Thesaurus