Research on the Method of XML-Based Full-Text Retrieval and the Implementation of the Prototype System

Posted on: 2010-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
GTID: 2178360275479723
Subject: Information Science
Abstract/Summary:
As society becomes increasingly information-driven, people have entered the information age and are moving toward the knowledge age. Against this background, information has become increasingly important to enterprises and other organizations, and is now key to their survival and development. However, because they lack effective information management, enterprises and organizations often cannot obtain the information they urgently need in time. The root cause is that the methods they use to manage information have many defects. One method is a database system, whose performance declines sharply as the volume of accumulated information grows very large. The other is to rely on the retrieval function of a large-scale search engine, which also has many defects: inefficient indexing, slow index updates, inaccurate data, no control over the output format, and so on. To address these problems, this paper presents an XML-based full-text retrieval method that enables enterprises and organizations to manage their information effectively. The major work of this paper is as follows:

(1) The paper surveys and analyzes the two main current types of full-text retrieval technology, providing a basis and support for optimizing the system.

(2) The paper studies XML technologies in depth. Taking full account of the flexibility of XML tags and the semantic information that the tags themselves carry, indexing considers not only how to find relevant information in documents but also the structure and granularity of the documents, so as to achieve content + structure information retrieval. Using XML as a common data interface, data sources in other formats (such as database resources, PDF resources, Word files, etc.) are converted into XML documents. This increases indexing speed and reduces storage space, for two reasons. First, standardized storage avoids storing every searched document separately. Second, depositing all kinds of information in a single XML document shortens the time the indexing program spends locating, opening, and closing files, which is evident when the amount of data is very large.

(3) Based on an in-depth analysis of Lucene, the platform used to implement the full-text retrieval system, we have improved and optimized it, for instance by optimizing the Analysis module and adjusting relevant parameters.

(4) We design and implement an XML-based full-text retrieval prototype system. The system consists of sub-modules with different functions. The sub-modules are connected through interfaces, which gives the system a loosely coupled framework. This lowers the dependence between sub-modules, which facilitates later revision and upgrading and allows the system to be easily integrated into other systems.

In short, the results of this study can lay a solid foundation for establishing an efficient, accurate, and practical XML-based retrieval system, and provide ways and means for enterprises and organizations to manage their information effectively.
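The idea in item (2), merging heterogeneous sources into one XML document and indexing both content and structure, can be illustrated with a minimal sketch. This is not the thesis's implementation, which builds on Lucene. It uses only the Python standard library, and the element names (`collection`, `doc`, `title`, `body`), the sample records, and the function names are all hypothetical. Tokenization is a plain word split, so Chinese word segmentation (for which the keywords suggest ICTCLAS was used) is not covered.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def merge_to_xml(records):
    """Wrap records from different sources in one XML tree (hypothetical schema)."""
    root = ET.Element("collection")
    for doc_id, source, fields in records:
        doc = ET.SubElement(root, "doc", id=doc_id, source=source)
        for tag, text in fields.items():
            ET.SubElement(doc, tag).text = text
    return root

def build_index(root):
    """Map each lowercased term to the (doc id, element tag) pairs containing it."""
    index = defaultdict(set)
    for doc in root.iter("doc"):
        for field in doc:
            for term in re.findall(r"\w+", (field.text or "").lower()):
                index[term].add((doc.get("id"), field.tag))
    return index

def search(index, term, tag=None):
    """Return sorted doc ids containing term, optionally only inside element `tag`."""
    hits = index.get(term.lower(), set())
    return sorted({d for d, t in hits if tag is None or t == tag})

# Illustrative records standing in for a database row and a PDF extract.
records = [
    ("d1", "database", {"title": "Lucene tuning", "body": "Adjusting index parameters"}),
    ("d2", "pdf", {"title": "XML basics", "body": "Lucene can index XML fields"}),
]
index = build_index(merge_to_xml(records))
print(search(index, "lucene"))           # ['d1', 'd2']
print(search(index, "lucene", "title"))  # ['d1']
```

Because each posting records the element in which a term occurs, a query can be restricted to a structural unit such as `title`, which is the sense of "content + structure" retrieval assumed here.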
Keywords/Search Tags: XML, Lucene, ICTCLAS, Full-Text Retrieval System