On improving information retrieval performance from structured, semi-structured and un-structured information sources

Posted on:2006-05-24

Degree:Ph.D

Type:Dissertation

University:University of Louisiana at Lafayette

Candidate:Shah, Biren N

GTID:1458390008452113

Subject:Computer Science

Abstract/Summary:

The field of unstructured data retrieval for simple data types such as text and structured data retrieval in relational data models for transactional processing has already been well researched and commercially developed. However, more complex data types and models such as XML (as semi-structured data), data warehouses (as structured data), images (as unstructured data), etc. pose additional research challenges. The goal of this work is to address such information retrieval performance issues and challenges.

As XML is an evolving semi-structured data representation format, techniques for indexing and retrieval of XML data are drawing increasing attention. We have proposed a memory-efficient index structure and an efficient algorithm for incremental indexing of XML document collections. The experimental results show that our proposed index structure outperforms earlier schemes in terms of indexing time and storage requirements.

Given the growth in size of image collections over the last few years, Content-Based Image Retrieval (CBIR) systems are required to effectively and efficiently access images using information contained in them. Perception-based image retrieval, on the other hand, plays an important role in overcoming some of the semantic problems associated with CBIR. We have proposed a method that uses the concept of Inverse Image Frequency for perception-based color image quantization to improve traditional quantization schemes. Additionally, a cluster-based approach for efficient CBIR that uses a similarity-preserving space transformation method is proposed. Our results show that it offers superior response time with sufficiently high retrieval accuracy.

Lastly, for improving online analytical processing, our focus has been on the more challenging and evolving multidimensional data model. Earlier work does not completely address performance issues, such as query response time and view maintenance time, in data warehouses. We propose a hybrid approach for the selection of views that combines the improved response time of the static approach and the automated tuning capability of the dynamic approach. Experimental results show that the hybrid approach outperforms both the static and the dynamic approaches to view selection.

For future work, we suggest the integration of our results in these different areas and the evaluation of their applicability to real-life multimodal systems applications.

Keywords/Search Tags:

Retrieval, Data, Structured, Information, Performance, XML, Results

Related items

1	Ranked search over structured and semi-structured data
2	Performance Evaluation And Prediction Analysis Of Information Retrieval Systems
3	Research On The Effect Of Retrieval Results Presentation Mode On Retrieval Results
4	Research Of Information Retrieval Based Semi-Structured Data
5	Improvement of the results' relevance of a web information retrieval system using automatic query expansion
6	Research On And Implementation Of Chinese Structured Information Retrieval
7	Structure And Semi-structured Information Retrieval Related Technologies
8	Advancing information retrieval through databases, fusion and information extraction
9	Research On Personalized Meta Search Results Merging In Information Retrieval
10	Research On Information Retrieval Based On Language Model And Reranking For Retrieval Results