Predicting DataSpace Retrieval Using Probabilistic Hidden Information

Posted on:2013-03-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Gile Narcisse FANZOU TCHUISSAN

Full Text:PDF

GTID:1488304322450674

Subject:Computer Science and Technology

Abstract/Summary:

Most organizations today rely on a large number of diverse data sources. Managing multiple data sources as a single source of information is a major challenge in the development of dataspace systems. A dataspace is a virtual space for managing heterogeneous data sources regardless of their structure or location. The open problems can be summarized as follows:

- Given the large number of data sources in a dataspace, current search systems concentrate only on the data sources themselves. The user's intention and the user's relevance also need to be analyzed. Analyzing the user's query can eliminate many data sources from the retrieval process and hence speed it up.
- Several current works on dataspace optimization focus on combining many predictive models and on building query plans. Unfortunately, query plans often cannot take advantage of pipelining because of limited buffer or CPU resources. Moreover, handling query plans during optimization needs large uphill moves. The dataspace retrieval process needs a parallel optimization approach to search efficiently for information across a set of distributed data sources.
- Image search is still possible only when the query is a set of keywords. Searching for images by keywords is limited because keywords are not expressive enough to describe all the important characteristics of an image. For example, an exact-match request cannot be formulated in such systems, and users must know the natural language of the keywords: the keyword "lung cancer" in English is spelled differently in French, Chinese, or German.

This dissertation discusses the issues involved in designing an information retrieval system for dataspaces based on probabilistic user-relevance schemes. Our main contributions are threefold:

- The query process starts by predicting the data sources that may contain the query result. An Information Hidden Model (IHM) is constructed that takes into account the user's perception of similarity between data. IHM takes a segmented query sequence list and a set of heterogeneous data sources, and computes the most likely path for retrieving the result efficiently. Three learning strategies are proposed: UHH (User Hidden Habit), UHB (User Hidden Background), and UHS (User Hidden keyword Semantics).
- A Two-Phase image Retrieval Optimization on dataspace using IHM is proposed. 2PROM is a new algorithm designed to optimize the dataspace retrieval process in two main phases. The first phase builds a pipeline to find the best retrieval strategies. The second phase combines the retrieval strategies with IHM to determine the most efficient way to execute a query.
- An XML-based Image Retrieval System (XIRS) is defined to retrieve images from a single data source. XIRS is further generalized to XIRD, which can retrieve images across a set of distributed data sources. Users can search for information using either a sample image file or keywords as the query. The similarity between two image files relies on the similarity between two XML nodes. This type of image search by similarity can be important in hospitals.

To demonstrate the efficiency of the solutions proposed in this dissertation, we have grounded them in a particular system, NoCancerSpace, a dataspace for lung cancer diagnosis search over radiographic stereotypes.
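The abstract does not give the internals of IHM, so the following is only a rough sketch of what "computing the most likely path" over data sources could look like if IHM behaves like a standard hidden Markov model decoded with the Viterbi algorithm. That resemblance is an assumption, not something the abstract states. The function name, the source names (`radiology_db`, `reports_xml`), and all probabilities are hypothetical toy values.

```python
def most_likely_sources(segments, sources, start_p, trans_p, emit_p):
    """Standard Viterbi decoding (assumed stand-in for IHM, not the author's method).

    Returns (probability, path), where path names one data source per query segment.
    """
    # best[s] = (probability of the best path ending in source s, that path)
    best = {
        s: (start_p[s] * emit_p[s].get(segments[0], 0.0), [s]) for s in sources
    }
    for seg in segments[1:]:
        step = {}
        for s in sources:
            # Pick the predecessor source that maximizes the path probability.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in sources),
                key=lambda t: t[0],
            )
            step[s] = (prob * emit_p[s].get(seg, 0.0), path + [s])
        best = step
    return max(best.values(), key=lambda t: t[0])


# Hypothetical toy model: two data sources and a three-segment query.
sources = ["radiology_db", "reports_xml"]
segments = ["lung", "cancer", "image"]
start_p = {"radiology_db": 0.5, "reports_xml": 0.5}
trans_p = {
    "radiology_db": {"radiology_db": 0.7, "reports_xml": 0.3},
    "reports_xml": {"radiology_db": 0.4, "reports_xml": 0.6},
}
emit_p = {
    "radiology_db": {"lung": 0.2, "cancer": 0.2, "image": 0.6},
    "reports_xml": {"lung": 0.4, "cancer": 0.5, "image": 0.1},
}

prob, path = most_likely_sources(segments, sources, start_p, trans_p, emit_p)
print(path, prob)
```

In this toy setting the decoded path assigns each query segment to the data source most likely to answer it, which is one way to prune sources before retrieval. How the UHH, UHB, and UHS strategies would estimate the probabilities is not described in the abstract and is left out here.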

Keywords/Search Tags:

Information Retrieval, Dataspace, Optimization Algorithm, Probabilistic Algorithm, Predictive Model

Related items

1	Research On Intelligent Predictive Control And Its Applications
2	Matrix Factorization In The Application Of Data Mining
3	Research On Optimization Strategy Of Distributed Model Predictive Control System Based On GA-PSO
4	A Stable Information Retrieval Algorithm And Its Application In Peer To Peer Network
5	Research Of Model Predictive Control And Operation Optimization Method In Process Industry
6	Research On Some Key Problems Of Data Integration In Dataspace
7	Study On Nonlinear Model And Predictive Control Based On Intelligent Algorithm
8	The Method And Application Of Nonlinear Model Predictive Control Based On Support Vector Regression
9	Study On Design And Application Of High Speed Model Predictive Control Algorithm
10	Research On Local Model Network Identification And Predictive Control