Research on Information Retrieval Based on Semi-Structured Data

Posted on: 2006-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wang
Full Text: PDF
GTID: 2178360185963879
Subject: Computer application technology
Abstract/Summary:
Recent research has focused on data with irregular, unknown, or frequently changing structure; such data is called semi-structured data. It is generally modeled as a labeled graph, and several query languages based on these models have been proposed. This model gives us the flexibility to process any data and to handle changes in the data's structure seamlessly. It also has drawbacks: data is inefficient to store, since the schema must be replicated with each data item, and queries are hard to evaluate efficiently because of the additional processing of the replicated schema. Even a simple regular path expression may require the entire graph to be traversed.

In this paper, we present a method for reorganizing semi-structured data according to the regularity of its structure. The storage model, which combines relations and graph data, provides an effective way to exploit regularities in the data's structure. The paper gives an algorithm for generating storage models under this method. We also give an algorithm for translating queries on semi-structured data into operations against the relations in the storage model. The subsequent execution can then be carried out by the query evaluation component of a relational system.

This paper also analyzes the shortcomings of search engines and then describes a distributed retrieval structure based on web sites. It further analyzes several text categorization algorithms. Feature selection methods and weight adjustment techniques are discussed and analyzed, and their influence on the precision and efficiency of text classification is pointed out. We introduce a new weight function, which includes a feature weight evaluation function and adjusts the effect of each feature term in the classifier according to the strength of that term.
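
The cost of path queries on a labeled graph can be illustrated with a minimal sketch. It is not the thesis's algorithm: the graph contents, the `GRAPH` dictionary layout, and the `find_by_label` function are all hypothetical and chosen only for illustration. The sketch evaluates the path expression "any sequence of edges followed by an edge with a given label" and reports how many nodes had to be visited.

```python
from collections import deque

# Hypothetical labeled graph (illustrative data only):
# node id -> list of (edge label, child node id).
GRAPH = {
    "root": [("person", "p1"), ("person", "p2")],
    "p1": [("name", "n1"), ("address", "a1")],
    "p2": [("name", "n2")],
    "a1": [("city", "c1")],
    "n1": [],
    "n2": [],
    "c1": [],
}


def find_by_label(graph, start, label):
    """Evaluate the path expression `_*.label` from `start`.

    Returns (matches, visited_count): the target nodes of every edge
    named `label` reachable from `start`, and the number of distinct
    nodes the breadth-first traversal had to visit.
    """
    matches = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge_label, child in graph[node]:
            if edge_label == label:
                matches.append(child)
            # Without schema information, every branch must be explored.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return matches, len(seen)


if __name__ == "__main__":
    matches, visited = find_by_label(GRAPH, "root", "city")
    print(matches, visited, len(GRAPH))
```

In this toy graph only one edge is labeled `city`, yet the traversal visits every node, because nothing tells the evaluator in advance which branches can be skipped. A storage model that groups regularly structured items into relations is one way to avoid this kind of full scan.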
Keywords/Search Tags: Semi-Structured Data, Storage Model, Text Retrieval, Text Categorization, Weight Adjustment Techniques, Feature Evaluation Function