
An Approach Of Querying Unstructured Data In Dataspace

Posted on: 2015-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: X M Li
Full Text: PDF
GTID: 2348330518470400
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer science and technology, data now exhibits several challenging features: massive volume, heterogeneous schemas, and complex associations. Traditional data management and service technologies cannot meet these challenges. Dataspace, a new data management technology, addresses them through its pay-as-you-go integration method and loose data schema.

A growing number of applications require access to both structured and unstructured data. One of the key services of a Dataspace is to provide seamless querying over both kinds of data. Querying each kind of data in isolation has been the main subject of study in the fields of databases and information retrieval. Recently, the database community has studied the problem of answering keyword queries on structured data such as relational or XML data. The only combination that has not been fully explored is answering structured queries on unstructured data.

This thesis studies query-transformation technology and explores an approach to answering structured queries on unstructured data. The approach constructs a keyword query from a given structured query and submits it to the underlying engine for querying unstructured data. We first define the query graph and its construction, and translate the given structured query into the corresponding query graph, which captures the essence of the query and removes irrelevant syntactic symbols. We then propose an i-scores updating algorithm and a label selection algorithm to extract keywords from the query graph, basing the selection on the informativeness and representativeness of each label. Finally, we propose several directions for improving keyword extraction when domain knowledge exists, and validate the impact of values and query length on query results.
The experimental results show that our algorithm works fairly well on a large number of datasets from various domains, and that the query-graph-based approach we propose achieves higher precision than several other approaches.
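To make the pipeline concrete, the following is a minimal sketch of the general idea of turning a structured query into a query graph and picking keywords from its labels. It is not the thesis's algorithm: the abstract does not specify the i-scores updating rule or the label selection procedure, so this sketch substitutes simple stand-ins. The dict-based query format, the function names, the use of smoothed inverse document frequency for informativeness, and the use of normalized node degree for representativeness are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_query_graph(query):
    """Build an undirected graph of labels from a toy structured query.

    `query` is a hypothetical dict form (an assumption, not the thesis's):
    {"select": [attr, ...], "from": [relation, ...],
     "where": [(attr, value), ...]}.
    Only relation, attribute and value labels become nodes, so syntactic
    symbols such as SELECT, WHERE or "=" never enter the graph.
    """
    graph = defaultdict(set)
    for relation in query["from"]:
        graph.setdefault(relation, set())
        for attribute in query["select"]:
            graph[relation].add(attribute)
            graph[attribute].add(relation)
        for attribute, value in query["where"]:
            graph[relation].add(attribute)
            graph[attribute].add(relation)
            graph[attribute].add(value)
            graph[value].add(attribute)
    return graph


def informativeness(label, corpus):
    """Stand-in score: smoothed inverse document frequency of the label."""
    containing = sum(1 for doc in corpus if label.lower() in doc.lower())
    return math.log((1 + len(corpus)) / (1 + containing)) + 1.0


def representativeness(label, graph):
    """Stand-in score: node degree divided by the largest degree."""
    max_degree = max(len(neighbors) for neighbors in graph.values())
    return len(graph[label]) / max_degree if max_degree else 0.0


def select_keywords(graph, corpus, k=3):
    """Return the k labels with the highest combined score."""
    scores = {
        label: informativeness(label, corpus) * representativeness(label, graph)
        for label in graph
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return ranked[:k]


# Hypothetical example: SELECT title FROM movie
#                       WHERE director = 'Kubrick' AND year = '1968'
example_query = {
    "select": ["title"],
    "from": ["movie"],
    "where": [("director", "Kubrick"), ("year", "1968")],
}
example_corpus = [
    "2001: A Space Odyssey is a 1968 movie directed by Stanley Kubrick",
    "The movie title was changed",
    "A director chooses the cast of a movie",
]
example_graph = build_query_graph(example_query)
print(" ".join(select_keywords(example_graph, example_corpus)))
```

The printed string is the keyword query that would be handed to the underlying search engine. Multiplying the two scores is only one possible way to combine them; with these stand-ins, attribute names can outrank values, which illustrates why the treatment of values is worth evaluating separately.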
Keywords/Search Tags:Dataspace, Unstructured data, Query-transformation, Keyword extraction