Research On Some Key Techniques Of Document Database

Posted on:2005-05-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y D Liu

GTID:1118360125967576

Subject:Computer software and theory

Abstract/Summary:

With the arrival of the information era, the Internet has spread very fast. With the explosion in available text information, document database research is gaining more and more attention from both the database community and the information retrieval community because of its promising applications in many fields, such as digital libraries, office automation, software engineering, and the publishing industry. A document database is a database system that stores and manages large volumes of structured documents. It not only provides representation, organization, storage, and access for documents, but can also perform text mining and generate abstracts from documents.

This thesis addresses several key technical problems of document databases, covering full-text indexing, structural document retrieval, text filtering, text mining, and issues concerning document database system implementation. Major contributions of this thesis include:

1) Research on the Inter-Relevant Successive Tree model
We develop a new full-text index model, the Inter-Relevant Successive Tree (IRST), based on the E 2 adjoining matrix. The model exploits the order and redundancy of character sequences and is suited to storing and indexing massive full-text collections. IRST is also a multifunctional model; for example, it can serve as a tool for text sequence mining.

2) Research on XML documents from an IR view
A new XML document retrieval model based on structure similarity is put forward, which evaluates the similarity between the document structure and the query path. A prototype is implemented to test the performance of the XML document retrieval model. The experimental results show that the model improves retrieval effectiveness.

3) Research on text filtering based on semantic analysis
Text filtering based on statistics is usually ineffective when it deals with polarity text. The statistical method overlooks the semantic restrictions of the text, so it is not good at identifying polarity information. The thesis provides a new text filtering method based on semantic analysis, which takes into account the semantic relations in the text. The experimental results indicate that the method efficiently recognizes and blocks polarity information.

4) Research on text sequence mining based on IRST
IRST efficiently stores the order of text and can directly provide the support of a text sequence. We exploit this feature of IRST for text sequence mining. Like the FP-tree approach, it does not need to generate candidates. In addition, it scans the transaction base only once, which is better than FP-tree.
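The structure-similarity idea in contribution 2) can be illustrated with a small sketch. The abstract does not give the thesis's actual similarity measure, so the code below is only an assumption for illustration: it scores a document path against a query path by the length of their longest common subsequence of tags, normalized by the query length. The function names `lcs_length`, `path_similarity`, and `rank_paths` are hypothetical and do not come from the thesis.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two tag lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def path_similarity(query_path, doc_path):
    """Illustrative score in [0, 1]: the fraction of query tags that
    appear, in order, along the document path (an assumed measure,
    not the one defined in the thesis)."""
    q = [t for t in query_path.split("/") if t]
    d = [t for t in doc_path.split("/") if t]
    if not q:
        return 0.0
    return lcs_length(q, d) / len(q)


def rank_paths(query_path, doc_paths):
    """Return (score, path) pairs sorted by descending similarity."""
    scored = [(path_similarity(query_path, p), p) for p in doc_paths]
    return sorted(scored, key=lambda sp: -sp[0])
```

Under this assumed measure, a query path such as `/book/title` still partially matches a document path such as `/article/title`, so documents whose structure only approximates the query can be ranked rather than rejected outright.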

Keywords/Search Tags:

Document Database, Full-text Indexing, Structural Document Retrieval, XML retrieval, Text Filtering, Text Mining, IRST

Related items

1	Design And Implementation Of A Document-oriented Full-text Retrieval System
2	The Research And Implementation Of Full-text Retrieval System Based On Lucene
3	Multi-document Full-text Retrieval System Design And Implementation
4	Design And Development Of Multi-source Document Full-text Retrieval System Based On Lucene
5	Research On Self-indexing Algorithms For Highly Repetitive Document Collections Based On FM-index
6	Full-Text Search Technology Research And Application In "2008 Olympic Games" Multi-Language System
7	Multi-document Retrieval System Design And Development
8	Research And Implementation Of Distribute Massive Text Data Index And Retrieval System
9	Research And Implementation Of Electronic Document Full-text Retrieval System Based On Lucene
10	Design And Implementation Of Enterprise Knowledge Document Retrieval Management System