
On improving information retrieval performance from structured, semi-structured and un-structured information sources

Posted on: 2006-05-24
Degree: Ph.D
Type: Dissertation
University: University of Louisiana at Lafayette
Candidate: Shah, Biren N
Full Text: PDF
GTID: 1458390008452113
Subject: Computer Science
Abstract/Summary:
The field of unstructured data retrieval for simple data types such as text and structured data retrieval in relational data models for transactional processing has already been well researched and commercially developed. However, more complex data types and models such as XML (as semi-structured data), data warehouses (as structured data), images (as unstructured data), etc. pose additional research challenges. The goal of this work is to address such information retrieval performance issues and challenges.

As XML is an evolving semi-structured data representation format, techniques for indexing and retrieval of XML data are drawing increasing attention. We have proposed a memory-efficient index structure and an efficient algorithm for incremental indexing of XML document collections. The experimental results show that our proposed index structure outperforms earlier schemes in terms of indexing time and storage requirements.

Given the growth in size of image collections over the last few years, Content-Based Image Retrieval (CBIR) systems are required to effectively and efficiently access images using information contained in them. Perception-based image retrieval, on the other hand, plays an important role in overcoming some of the semantic problems associated with CBIR. We have proposed a method that uses the concept of Inverse Image Frequency for perception-based color image quantization to improve traditional quantization schemes. Additionally, a cluster-based approach for efficient CBIR that uses a similarity-preserving space transformation method is proposed. Our results show that it offers superior response time with sufficiently high retrieval accuracy.

Lastly, for improving online analytical processing, our focus has been on the more challenging and evolving multidimensional data model. Earlier work does not completely address performance issues, such as query response time and view maintenance time, in data warehouses. We propose a hybrid approach for the selection of views that combines the improved response time of the static approach and the automated tuning capability of the dynamic approach. Experimental results show that the hybrid approach outperforms both the static and the dynamic approaches to view selection.

For future work, we suggest the integration of our results in these different areas and the evaluation of their applicability to real-life multimodal systems applications.
Keywords/Search Tags: Retrieval, Data, Structured, Information, Performance, XML, Results