
Research On Some Key Techniques Of Document Database

Posted on: 2005-05-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Liu
GTID: 1118360125967576
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the information era, the Internet has spread rapidly. As the amount of available text explodes, document database research is attracting growing attention from both the database and information retrieval communities because of its promising applications in fields such as digital libraries, office automation, software engineering, and publishing. A document database is a database system that stores and manages large volumes of structured documents. Beyond representing, organizing, storing, and accessing documents, it can also perform text mining and generate abstracts of documents.

This thesis addresses several key technical problems of document databases, covering full-text indexing, structured document retrieval, text filtering, text mining, and issues in implementing a document database system. Its major contributions are:

1) Research on the Inter-Relevant Successive Tree model
We develop a new full-text index model, the Inter-Relevant Successive Tree (IRST), based on an E² adjacency matrix. The model exploits the order and redundancy of character sequences and is well suited to storing and indexing large volumes of full text. IRST is also multifunctional; for example, it can serve as a tool for text sequence mining.

2) Research on XML document retrieval from an IR perspective
We propose a new XML document retrieval model based on structural similarity, which evaluates the similarity between the document structure and the query path. A prototype was implemented to test the performance of the model, and the experimental results show that it improves retrieval effectiveness.

3) Research on text filtering based on semantic analysis
Statistics-based text filtering is usually ineffective on polarity texts. Because it overlooks the semantic restrictions of the text, it is poor at identifying polarity information.
The thesis proposes a new text filtering method based on semantic analysis, which takes the semantic relations in the text into account. Experimental results indicate that the method efficiently recognizes and blocks polarity information.

4) Research on text sequence mining based on IRST
IRST efficiently stores the order of text and can therefore support text sequence mining directly; we exploit this feature of IRST for mining text sequences. Like the FP-tree, the method does not need to generate candidates. Unlike the FP-tree, whose construction requires two scans of the transaction database, it scans the transaction database only once.
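The abstract does not describe how IRST is built from its adjacency matrix, so the following is only a rough illustration of the general idea behind an order-aware, character-level full-text index, not the IRST model itself. It is a minimal Python sketch of a positional successor index: the class `SuccessorIndex` and its method `find` are hypothetical names, and the index simply records, for each character, the sorted positions where it occurs, then answers a phrase query by checking that each next character occurs at the next position. It makes no attempt to exploit redundancy the way the thesis describes.

```python
from collections import defaultdict


class SuccessorIndex:
    """Hypothetical character-level positional index (a stand-in, NOT the thesis's IRST).

    For every character it stores the sorted list of positions where that character
    occurs; a phrase is found by following successive positions p, p+1, p+2, ...
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.positions: defaultdict[str, list[int]] = defaultdict(list)
        for pos, ch in enumerate(text):
            self.positions[ch].append(pos)  # one left-to-right pass keeps lists sorted

    def find(self, phrase: str) -> list[int]:
        """Return the start offset of every occurrence of `phrase` in the text."""
        if not phrase:
            return []
        # Candidate starts: every position of the phrase's first character.
        starts = list(self.positions.get(phrase[0], []))
        for offset, ch in enumerate(phrase[1:], start=1):
            following = set(self.positions.get(ch, []))
            # Keep a start only if this character sits exactly `offset` places after it.
            starts = [s for s in starts if s + offset in following]
            if not starts:
                break
        return starts
```

For example, `SuccessorIndex("文档数据库与数据挖掘").find("数据")` returns `[2, 6]`. Because Chinese text has no word delimiters, a character-level index of this kind needs no word segmentation; the trade-off is one position-set check per character of the query phrase.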
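Contribution 2 scores XML documents by how similar their structure is to a query path, but the abstract does not give the similarity formula. As an assumption for illustration only, the minimal Python sketch below uses a Dice-style measure built on the longest common subsequence (LCS) of tag sequences, 2 · LCS / (|document path| + |query path|), and takes the best score over all root-to-leaf paths of the document. The function names `lcs_length`, `leaf_paths`, and `structure_similarity` are hypothetical; only `xml.etree.ElementTree` from the Python standard library is used.

```python
import xml.etree.ElementTree as ET


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two tag sequences (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def leaf_paths(root: ET.Element) -> list[list[str]]:
    """Every root-to-leaf tag path in the XML tree."""
    paths: list[list[str]] = []

    def walk(node: ET.Element, prefix: list[str]) -> None:
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(root, [])
    return paths


def structure_similarity(xml_text: str, query_path: str) -> float:
    """Score in [0, 1]: best Dice-style LCS match between the query and any document path.

    Hypothetical measure: 2 * LCS / (len(doc_path) + len(query)). Tags may be skipped,
    so '/book/section/title' still partially matches book/chapter/section/title,
    but every extra or missing level lowers the score.
    """
    query = [tag for tag in query_path.split("/") if tag]
    if not query:
        return 0.0
    root = ET.fromstring(xml_text)
    return max(
        2 * lcs_length(path, query) / (len(path) + len(query))
        for path in leaf_paths(root)
    )
```

With a document whose paths are `book/chapter/section/title` and `book/author`, the query `/book/author` scores 1.0 and `/book/section/title` scores 6/7 (about 0.857): a strict path match would reject the second query outright, whereas a graded structural score still ranks the document as relevant.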
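Contribution 4 claims that IRST-based mining, like FP-growth, avoids candidate generation while needing only one scan of the database. The minimal Python sketch below shows those two properties on a simpler, swapped-in structure, a counting trie of substrings. It is not the thesis's IRST algorithm, and it assumes that "text sequences" are contiguous character strings and that a pattern's support is the number of sequences containing it. `TrieNode`, `mine_frequent_substrings`, `min_support`, and `max_len` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    count: int = 0       # number of sequences that contain this pattern (its support)
    last_seq: int = -1   # id of the last sequence that incremented `count`
    children: dict[str, "TrieNode"] = field(default_factory=dict)


def mine_frequent_substrings(seqs: list[str], min_support: int, max_len: int) -> dict[str, int]:
    """Return every contiguous pattern of length <= max_len with support >= min_support."""
    root = TrieNode()
    # One scan of the database: insert each substring of length <= max_len.
    for sid, seq in enumerate(seqs):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + max_len]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = TrieNode()
                if child.last_seq != sid:  # count a pattern at most once per sequence
                    child.count += 1
                    child.last_seq = sid
                node = child

    # No candidate generation: frequent patterns are read straight off the trie.
    # A branch is pruned once support falls below min_support, which is safe because
    # extending a pattern can never increase its support.
    frequent: dict[str, int] = {}

    def walk(node: TrieNode, prefix: str) -> None:
        for ch, child in node.children.items():
            if child.count >= min_support:
                pattern = prefix + ch
                frequent[pattern] = child.count
                walk(child, pattern)

    walk(root, "")
    return frequent
```

For example, `mine_frequent_substrings(["abcab", "bcab", "cabd"], min_support=3, max_len=3)` returns `{"a": 3, "ab": 3, "b": 3, "c": 3, "ca": 3, "cab": 3}`. The single pass is paid for in memory: each character position can add up to `max_len` trie nodes, so `max_len` bounds the size of the structure.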
Keywords: Document Database, Full-text Indexing, Structural Document Retrieval, XML Retrieval, Text Filtering, Text Mining, IRST