Research On Information Retrieval Technology

Posted on:2008-03-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S M Wang

GTID:1118360215998555

Subject:Computer applications

Abstract/Summary:

With the spread and rapid development of the Internet, the amount of online information has grown enormously, and organizing and processing it has become a challenge. Research on text classification and information retrieval helps people efficiently find the information they truly need within this ever-growing volume. This dissertation discusses three aspects of information retrieval technology.

The first part addresses text classification. We (1) propose the notion of a semantic category and construct a dictionary with a graph structure, along with an algorithm that operates on this structure. Used as heuristic knowledge for text classification, the dictionary improves the system's ability to simulate inference and to process open corpora; (2) propose an algorithm that imitates human reading behavior. On the one hand, the algorithm is based on the observation that the content of a document can be told from its title, so the weight of the title is increased when the feature vector is processed. On the other hand, a weight parameter vector ω is designed to simulate human skimming and skipping when calculating the center of a document cluster, and the weight of features that occur in more positive examples than negative ones is increased. Experiments show that the algorithm greatly improves the performance of a text classification system.

The second part addresses problems concerning Web pages. (1) We present a key technique for weighting the index in information retrieval. Because search engines are designed to find the Web pages a user needs, we exploit the features of pages written in HTML to weight the index. Experiments demonstrate that precision improves over the traditional tf-idf method when recall is low. (2) We introduce a new concept, the "Topic Keywords Set" (TKS). In online information retrieval the objects searched are Web pages, which are often small and present just one subject. Using the TKS together with an exploration of the relationships between words, the result list is re-sorted by calculating the distance between the user's query and the TKS. (3) Query expansion is an effective way to improve retrieval quality, and the selection of expansion words is its crucial and difficult step. By analyzing word co-occurrence, we propose a new method for evaluating word relevance. With this method, the selected expansion words are relevant to the whole query, capable of representing the theme of the query, and effective in improving performance, as the experiments confirm.

Finally, content-based multimedia information retrieval is discussed, on the basis of several descriptors defined in the MPEG-7 standard. We (1) propose a method that uses the MPEG-7 dominant color descriptor to extract key frames from scenes, together with an experiment; (2) present an experiment in key frame retrieval that takes advantage of the different search areas of the dominant color descriptor and the homogeneous texture descriptor; (3) apply these two results to the material base of a "CG (Computer Graphics) production project management system".
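
The abstract does not give the formula used to evaluate word relevance, so the following is only a minimal sketch of co-occurrence-based query expansion in which a candidate word is scored against the whole query rather than against a single query term. The Dice coefficient, the averaging over query terms, and the names cooccurrence_counts, relevance, and expand_query are all assumptions made for illustration, not the dissertation's method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per term and per unordered term pair, the documents containing them."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_df, pair_df


def relevance(candidate, query_terms, term_df, pair_df):
    """Average Dice coefficient between the candidate and every query term.

    Averaging over all query terms is an assumed way of favoring words that
    relate to the whole query instead of to one term only.
    """
    if not query_terms:
        return 0.0
    total = 0.0
    for q in query_terms:
        both = pair_df[frozenset((candidate, q))]
        denom = term_df[candidate] + term_df[q]
        total += 2.0 * both / denom if denom else 0.0
    return total / len(query_terms)


def expand_query(query_terms, docs, k=2):
    """Return the k candidate words with the highest relevance to the query."""
    term_df, pair_df = cooccurrence_counts(docs)
    candidates = set(term_df) - set(query_terms)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(
        candidates,
        key=lambda t: (-relevance(t, query_terms, term_df, pair_df), t),
    )
    return ranked[:k]


# Toy corpus: each document is a list of already tokenized words.
docs = [
    ["image", "retrieval", "color", "texture"],
    ["image", "retrieval", "color"],
    ["text", "retrieval", "index"],
    ["image", "compression", "codec"],
]
print(expand_query(["image", "retrieval"], docs, k=2))
```

On this toy corpus, "color" and "texture" co-occur with both query terms and therefore rank above words such as "compression" or "index", which co-occur with only one of them.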

Keywords/Search Tags:

Information Retrieval, Text Categorization, Words' Relevance, Search Engine, MPEG-7, Content-based Image Retrieval, Query Expansion, Machine Learning

Related items

1	Research On Full-text Information Retrieval Technology For WeChat Content
2	A Restricted Domain Text Retrieval System
3	Based Relevance Feedback Image Retrieval Techniques And Realization
4	Information Retrieval System Based On Document Query
5	Research Of Image Retrieval System Based On MPEG-7
6	The Research And Realization Of Text Content Relevance Based On Ontology
7	The Research About The Education Resources Search Engine Based On The Content
8	Research Of Search Engine Key Technique And Optimize Performance
9	Research And Application On Expansion Term Ranking Model For Query Understanding
10	Research On Query Expansion Technique Of Retrieval System In Biomedical Field