Semantic search and information retrieval techniques for text repositories

Posted on: 2013-12-10
Degree: Ph.D
Type: Dissertation
University: University of Arkansas at Little Rock
Candidate: Singh, Lisham Lekhendro
Full Text: PDF
GTID: 1458390008473872
Subject: Engineering
Abstract/Summary:
Most organizations keep details of events as electronic documents. Repositories of such documents, collected over time, are often used for various purposes, including critical decision making. These documents are generally used to garner information related to a specific event, or for trend or predictive analysis. This could be done manually, one document at a time, provided the repository is small and there is no time limit for the task. However, in most practical scenarios, the repository is huge and the available time is limited. Such documents typically have multiple fields, each carrying certain information, which may complicate the task. Moreover, most domain-specific data contain symbols and terminology whose definitions are given in external dictionaries or ontologies, which further complicates the task. Therefore, powerful and efficient ways of finding relevant information are required to use these repositories effectively. Finding relevant information in text data is studied under various information retrieval and text mining techniques.

This dissertation studies information retrieval and text mining for large text data, and proposes new text mining solutions for different types of datasets. The new solutions range from traditional text mining to newer semantic search paradigms.

Solutions under traditional text mining focus on providing seamless and flexible search techniques for different datasets, where a dataset is a collection of documents that share the same meta-data information. Numerous related works are investigated, and new methods are introduced for seamless information retrieval from a given dataset as well as across multiple datasets.

Solutions under the newer semantic approach concentrate on the faceted paradigm, which allows users to explore a large data space in a flexible manner. Two key tasks in developing a faceted application are identifying facets and constructing their hierarchies. After exploring related research, especially on how these two key tasks are performed, this dissertation proposes new algorithms for these tasks on structured datasets, which consist of narrative and non-narrative fields. Moreover, techniques for designing faceted retrieval for completely unstructured data are also explored.

This dissertation is an important step towards semantic search for large text datasets. Methods for identifying facets and constructing their hierarchies for the non-narrative and narrative fields of a structured dataset are important because such datasets are complex. Faceted search has already become the de facto standard for e-commerce applications; applied to such structured datasets, it would not only provide semantic search on their non-narrative fields, but also leverage the semantics of the narrative text.
Keywords/Search Tags:Text, Semantic search, Information retrieval, Structured datasets, Techniques, Documents, Fields, Non-narrative