Research On XML Information Retrieval

Posted on: 2013-03-22; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Y L Wen; Full Text: PDF
GTID: 1268330395987572; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid spread of XML technology, XML has become the standard format for data representation and data exchange on the Web, and a huge number of XML documents now exist in many domains. How to retrieve XML data efficiently and effectively has become a hot research topic in both the database and the information retrieval communities. Traditional information retrieval techniques offer many solutions for unstructured data. XML data, however, is semi-structured, carrying both content and structure, and this brings new challenges to information retrieval research. Retrieving XML data by combining database and information retrieval techniques has therefore emerged as a new research direction.

This dissertation analyzes the state of research on XML information retrieval, considers solutions that combine database and information retrieval techniques, and addresses several crucial problems in XML data retrieval: XML keyword search, XML content and structure search with vague structural context, and XML full-text search based on a relational database. The main contributions and innovations are as follows.

This dissertation proposes an approach to keyword search over XML documents based on Candidate Fragment semantics. The method first selects candidate nodes according to the number of descendants and the number of attribute types of each XML tree node, and then constructs candidate fragments centered on those candidate nodes. After indexing the candidate fragments with inverted lists, the method answers user queries with single candidate fragments, or with candidate fragments in an ancestor-descendant relationship, that contain all the keywords and suit the characteristics of the XML dataset. Experiments show that Candidate Fragment semantics gives users compact, meaningful, and properly sized results, and performs well on XML keyword search.

This dissertation proposes an approach to retrieving XML data with vague structural context. Both the user query and the XML documents are processed as sets of structural terms. Context resemblance is computed from the level weight of each element in a context, the level similarity between elements of the longest matched context, and other factors. The Vector Space Model is extended to answer XML content and structure queries. Experiments show that the method performs well on XML content and structure search.

This dissertation proposes an XML full-text search method based on a relational database, named ReXFT. ReXFT maps XML data into relational storage based on NXRel and can naturally reflect the logical model of the XML data. ReXFT allows users to create XML full-text indexes on user-defined paths based on full-text element nodes. Users submit XML full-text queries in the form defined by the W3C Recommendation, so that ReXFT conforms to international standards. ReXFT scores search results with a cover density ranking scheme that takes into account the logical relationship between search terms, their distance, their frequency, and other factors. Experimental results show that ReXFT performs well in processing XML full-text search.
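
To make the idea of structural terms and context resemblance more concrete, the following is a minimal Python sketch. It does not reproduce the dissertation's formula, which also uses level weights and level similarity of the longest matched context. Instead it assumes the simpler textbook definition, in which a query context matches a document context when it is a subsequence of it, and the resemblance is (1 + |query path|) / (1 + |document path|). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def is_subsequence(query_path, doc_path):
    """True if query_path can be turned into doc_path by inserting elements."""
    remaining = iter(doc_path)
    # `tag in remaining` consumes the iterator, so order is respected.
    return all(tag in remaining for tag in query_path)


def context_resemblance(query_path, doc_path):
    """Textbook context resemblance (an assumption, not the thesis formula)."""
    if not is_subsequence(query_path, doc_path):
        return 0.0
    return (1 + len(query_path)) / (1 + len(doc_path))


def score(query_terms, doc_terms):
    """Score a document against a query in a simplified extended vector space.

    Both arguments are lists of structural terms, i.e. (path, word) pairs,
    where path is a tuple of element names from the root to the word.
    Term weights are raw frequencies; the document vector is length-normalized.
    """
    doc_counts = Counter(doc_terms)
    norm = sqrt(sum(count * count for count in doc_counts.values()))
    if norm == 0:
        return 0.0
    total = 0.0
    for q_path, q_word in query_terms:
        for (d_path, d_word), count in doc_counts.items():
            if d_word == q_word:
                total += context_resemblance(q_path, d_path) * count
    return total / norm


# Hypothetical toy data: the query asks for "xml" in a title under book,
# without knowing that the document nests the title inside a chapter.
query = [(("book", "title"), "xml")]
document = [
    (("book", "chapter", "title"), "xml"),
    (("book", "author"), "wen"),
]
print(score(query, document))
```

In this sketch a vague query path still matches a deeper document path, but with a resemblance below 1, so documents whose structure agrees more closely with the query rank higher.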
Keywords/Search Tags: Keyword Search, Content and Structure Search, Structural Context, Full Text Search, Cover Density