
Predicting DataSpace Retrieval Using Probabilistic Hidden Information

Posted on: 2013-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Gile Narcisse FANZOU TCHUISSAN
GTID: 1488304322450674
Subject: Computer Science and Technology

Abstract/Summary:
Most organizations today rely on a large number of diverse data sources. Managing multiple data sources as a single source of information is a major challenge in the development of dataspace systems. A dataspace is a virtual space for managing heterogeneous data sources regardless of their structure or location. The main problems can be summarized as follows:

- Given the large number of data sources in a dataspace, current search systems focus only on the data sources themselves. The user's intention and the user's relevance judgments also need to be analyzed. Analyzing the user's query can eliminate many data sources from the retrieval process and therefore speed it up.
- Several current works on dataspace optimization focus on combining many predictive models and on building query plans. Unfortunately, query plans often cannot take advantage of pipelining because of limited buffer or CPU resources. Moreover, handling query plans during optimization requires large uphill moves. Dataspace retrieval therefore needs a parallel optimization approach to search efficiently for information across a set of distributed data sources.
- Image search is still possible only when the query is a set of keywords. Keyword-based image search is limited because keywords are not expressive enough to describe all the important characteristics of an image. For example, an exact-match request cannot be formulated in such systems, and users must know the natural language in which the keywords are expressed: the English keyword "lung cancer" is spelled differently in French, Chinese, or German.

This dissertation discusses the issues involved in designing an information retrieval system for dataspaces based on user-relevance probabilistic schemes. Our main contributions are threefold:

(1) The query process starts by predicting the data sources that may contain the query result.
An Information Hidden Model (IHM) is constructed that takes into account the user's perception of similarity between data. IHM takes a segmented query sequence list and a set of heterogeneous data sources as input, and then computes the most likely path for retrieving results efficiently. Three learning strategies are proposed: UHH, UHB, and UHS (User Hidden Habit, User Hidden Background, and User Hidden keyword Semantics).

(2) A Two-Phase image Retrieval Optimization on dataspaces using IHM, 2PROM, is proposed. 2PROM is a new algorithm designed to optimize the dataspace retrieval process in two main phases. The first phase builds a pipeline to find the best retrieval strategies. The second phase combines these retrieval strategies with IHM to determine the most efficient way to execute a query.

(3) An XML-based Image Retrieval System (XIRS) is defined to retrieve images from a single data source. XIRS is further generalized to XIRD, which can retrieve images from a set of distributed data sources. Users can search for information using either a sample image file or keywords as the query. The similarity between two image files relies on the similarity between their XML nodes. This kind of similarity-based image search can be important in hospitals.

To demonstrate the efficiency of the solutions proposed in this dissertation, we have implemented them in a particular system, NoCancerSpace: a dataspace for lung cancer diagnosis search based on radiographic stereotypes.
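The abstract does not say how IHM computes the "most likely path". Purely as an illustration, the sketch below assumes IHM behaves like a hidden Markov model: the hidden states are data sources, the observations are query segments, and the most likely path is recovered with Viterbi decoding. The function name `most_likely_sources`, the source names, and all probabilities are hypothetical; the UHH, UHB, and UHS learning strategies that would estimate these probabilities are not modeled here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical IHM-style decoder (an assumption, not the dissertation's code):
# hidden states are data sources, observations are query segments.
Prob = Dict[str, float]


def most_likely_sources(
    segments: Sequence[str],
    sources: Sequence[str],
    start_p: Prob,
    trans_p: Dict[str, Prob],
    emit_p: Dict[str, Prob],
) -> Tuple[List[str], float]:
    """Viterbi decoding: the most probable data source for each query segment."""
    # best[t][s] = (probability of the best path ending at source s after
    # segment t, predecessor source on that path)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s].get(segments[0], 0.0), None) for s in sources}
    ]
    for t in range(1, len(segments)):
        layer: Dict[str, Tuple[float, Optional[str]]] = {}
        for s in sources:
            # Extend every previous best path by one step into source s.
            layer[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(segments[t], 0.0), p)
                for p in sources
            )
        best.append(layer)

    # Backtrack from the most probable final source.
    last = max(sources, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(segments) - 1, 0, -1):
        prev = best[t][path[-1]][1]
        assert prev is not None
        path.append(prev)
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical example: two data sources and a segmented query.
sources = ["notes", "images"]
start_p = {"notes": 0.6, "images": 0.4}
trans_p = {
    "notes": {"notes": 0.7, "images": 0.3},
    "images": {"notes": 0.4, "images": 0.6},
}
emit_p = {
    "notes": {"lung": 0.5, "cancer": 0.4, "xray": 0.1},
    "images": {"lung": 0.2, "cancer": 0.2, "xray": 0.6},
}
path, prob = most_likely_sources(["lung", "cancer", "xray"], sources, start_p, trans_p, emit_p)
print(path)  # ['notes', 'notes', 'images']
```

Multiplying raw probabilities keeps the sketch short but underflows on long query sequences; a practical version would sum log-probabilities instead. The sketch also assumes a non-empty segment list.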
Keywords/Search Tags: Information Retrieval, Dataspace, Optimization Algorithm, Probabilistic Algorithm, Predictive Model