Semantic search and information retrieval techniques for text repositories

Posted on:2013-12-10

Degree:Ph.D

Type:Dissertation

University:University of Arkansas at Little Rock

Candidate:Singh, Lisham Lekhendro

GTID:1458390008473872

Subject:Engineering

Abstract/Summary:

Most organizations keep records of events as electronic documents. Repositories of such documents, collected over time, are used for various purposes, including critical decision making. These documents are generally used to gather information about a specific event or to support trend or predictive analysis. This could be done manually, one document at a time, provided the repository is small and there is no time limit on the task. In most practical scenarios, however, the repository is huge and the available time is limited. Such documents typically have multiple fields, each carrying certain information, which may complicate the task. Moreover, most domain-specific data contain symbols and terminology whose definitions are given in external dictionaries or ontologies, which complicates the task further. Powerful and efficient ways of finding relevant information are therefore required to use these repositories effectively. Finding relevant information in text data is studied under various information retrieval and text mining techniques.

This dissertation studies information retrieval and text mining for large text data and proposes new text mining solutions for different types of datasets. The new solutions range from traditional text mining to newer semantic search paradigms.

Solutions under traditional text mining focus on providing seamless and flexible search techniques for different datasets, where a dataset is a collection of documents that share a single set of metadata. Numerous related works are investigated, and new methods are introduced to achieve seamless information retrieval both within a given dataset and across multiple datasets.

Solutions under the newer semantic approach concentrate on the faceted paradigm, which allows users to explore a large data space in a flexible manner. Two key tasks in developing a faceted application are identifying facets and constructing their hierarchies. After exploring related research, especially on how these two tasks are performed, this dissertation proposes new algorithms for them on structured datasets, which consist of narrative and non-narrative fields. Techniques for designing faceted retrieval for completely unstructured data are also explored.

This dissertation is an important step towards semantic search for large text datasets. Methods for identifying facets and constructing their hierarchies for the non-narrative and narrative fields of a structured dataset are important because such datasets are complex. Faceted search has already become the de facto standard for e-commerce applications, and applying it to such structured datasets would not only provide semantic search over their non-narrative fields but also exploit the semantics of their narrative text.
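The faceted paradigm described above can be illustrated with a small sketch. It assumes a structured dataset whose non-narrative fields hold short categorical values, used here directly as facets, and whose narrative field holds free text searched by keyword. The `Record` type, the `faceted_search` and `facet_counts` functions, the field names, and the sample incident reports are all hypothetical. This is not the dissertation's algorithm, and it deliberately skips the harder tasks the dissertation targets, namely identifying facets and constructing their hierarchies.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical structured record: non-narrative fields hold short
    # categorical values (used here directly as facets); the narrative
    # field holds free text.
    fields: dict[str, str]
    narrative: str


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def faceted_search(records, query_terms=(), selections=None):
    """Return records whose narrative contains every query term and whose
    non-narrative fields match every selected facet value."""
    selections = selections or {}
    terms = {t.lower() for t in query_terms}
    hits = []
    for rec in records:
        # Keyword filter on the narrative field.
        if not terms <= tokens(rec.narrative):
            continue
        # Facet filter on the non-narrative fields.
        if all(rec.fields.get(name) == value for name, value in selections.items()):
            hits.append(rec)
    return hits


def facet_counts(records, facet_names):
    """Count, per facet, how many records carry each value; a faceted
    interface shows these counts next to each selectable value."""
    counts = {name: Counter() for name in facet_names}
    for rec in records:
        for name in facet_names:
            value = rec.fields.get(name)
            if value is not None:
                counts[name][value] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data.
    records = [
        Record({"phase": "landing", "weather": "rain"},
               "Aircraft skidded on the wet runway during landing."),
        Record({"phase": "takeoff", "weather": "clear"},
               "Bird strike reported shortly after takeoff."),
        Record({"phase": "landing", "weather": "clear"},
               "Hard landing; runway lights inoperative."),
    ]
    hits = faceted_search(records, ["runway"], {"phase": "landing"})
    print(len(hits))                        # 2
    print(facet_counts(hits, ["weather"]))  # {'weather': Counter({'rain': 1, 'clear': 1})}
```

Each keyword or facet selection narrows the result set, and the counts recomputed over the remaining records guide the next refinement. That loop is what lets a user explore a large collection without knowing in advance exactly what to ask for.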

Keywords/Search Tags:

Text, Semantic search, Information retrieval, Structured datasets, Techniques, Documents, Fields, Non-narrative

Related items

1	Semantic Based Information Retrieval From Semi-structured Documents
2	Research On Kernel Techniques In Intelligent Search Engines And Its Implementation
3	Research Of Information Retrieval Based Semi-Structured Data
4	Research On Techniques Of Text Retrieval Model Based On Semantic Analysis
5	Areas Of The Theme-based Web Information Retrieval Techniques
6	Research On Entity Retrieval Based On Terms And Categories Information
7	The Research And Application Of Enterprises Documents Search Engines Based On Lucene
8	Text Search Techniques And Optimization Strategies On Hybrid Data
9	Semantic Retrieval Of Semi-Structured Text Based On Ontology Concept
10	Structure And Semi-structured Information Retrieval Related Technologies