Research And Realization Of Data Archiving And Information Retrieval System

Posted on: 2013-11-30 | Degree: Master | Type: Thesis
Country: China | Candidate: J Wang | Full Text: PDF
GTID: 2248330371490739 | Subject: Computer application technology
Abstract/Summary:
As informatization advances, information systems produce huge volumes of historical data. Problems such as insufficient storage capacity, inefficient information retrieval, and difficulty in mining the potential value of the data then restrict further development. It is therefore important to research and develop a data archiving and information retrieval system to store and manage historical data. Based on the requirements of such a system, this thesis presents a cross-application scheme for archived data storage and classified retrieval. It uses Lucene full-text retrieval technology for classified retrieval of historical data and XML technology for data exchange between applications. Built on a J2EE multi-layer architecture, the system is divided into five functional modules that complete the overall design: data archiving, system management, user management, reconstruction management, and information retrieval. The technologies used during development fall into two groups: information retrieval technology and XML technology. For information retrieval, the thesis introduces index maintenance and management strategies, query strategies, and result ranking strategies, and analyzes in depth the inverted index mechanism and three kinds of Chinese word segmentation strategies (dictionary-based, word-frequency-statistics-based, and semantics-based).
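The inverted index and the dictionary-based segmentation strategy mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation and does not use Lucene or ICTCLAS: forward maximum matching is assumed here as one common dictionary-based strategy, and the names `fmm_segment`, `build_inverted_index`, `search`, the toy dictionary, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_inverted_index(docs, dictionary):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in fmm_segment(text, dictionary):
            index[term].add(doc_id)
    return index


def search(index, query, dictionary):
    """AND query: return ids of documents containing every query term."""
    terms = fmm_segment(query, dictionary)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy data, for illustration only.
dictionary = {"数据", "归档", "信息", "检索", "系统"}
docs = {1: "数据归档系统", 2: "信息检索系统", 3: "数据检索"}
index = build_inverted_index(docs, dictionary)
```

With this toy data, `search(index, "检索系统", dictionary)` intersects the posting sets for "检索" and "系统". A production system would add the ranking and index maintenance strategies the abstract describes, which this sketch omits.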
For XML, the thesis analyzes two model mapping strategies, the edge model mapping strategy and the node model mapping strategy, and summarizes their advantages and disadvantages in different application scenarios. First, the thesis uses the XParent model to convert the schema of structured data, which is consistent with both the node and edge model mapping strategies and improves data exchange capability. Unstructured data is converted into structured data with document parsing technology. Second, to compensate for the inadequacy of the Chinese word segmentation strategy, the thesis replaces Lucene's Chinese word segmenter with the ICTCLAS Chinese word segmentation system to optimize index generation and maintenance. Finally, the inverted index mechanism is combined with a document-similarity ranking strategy to improve retrieval precision and recall. Results from running the system show that operation and maintenance costs were lower, and that the recall, precision, and data exchange capability for historical data meet the enterprise's targets. The proposed solution is therefore feasible and has good application prospects.
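As a rough illustration of the edge model mapping idea, the following Python sketch flattens an XML document into relational-style rows, one row per parent-child edge. It is a simplified assumption of how such a mapping can look, not the thesis's XParent schema: the row layout, the function name `xml_to_edges`, and the sample record are hypothetical, and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_edges(xml_text):
    """Flatten an XML document into two row lists:
    edges  -> (source_id, ordinal, name, target_id), one per element
    values -> (node_id, text), one per element that has text content
    """
    root = ET.fromstring(xml_text)
    ids = count(1)
    edges, values = [], []

    def visit(elem, parent_id, ordinal):
        node_id = next(ids)
        edges.append((parent_id, ordinal, elem.tag, node_id))
        text = (elem.text or "").strip()
        if text:
            values.append((node_id, text))
        # The ordinal records each child's position among its siblings.
        for pos, child in enumerate(elem, start=1):
            visit(child, node_id, pos)

    visit(root, 0, 1)  # id 0 stands for a virtual document root
    return edges, values


# Hypothetical archived record, for illustration only.
record = "<archive><title>Q3 report</title><owner>finance</owner></archive>"
edges, values = xml_to_edges(record)
```

Each row could then be stored in an ordinary relational table, so that path queries become joins on `source_id` and `target_id`. A node-oriented mapping would instead store one row per node together with its path information.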
Keywords/Search Tags: data archiving, information retrieval, Lucene, XML technology, relational database schema