
Research And Implementation Of Information System For Mass And Real-Time Data

Posted on: 2005-08-04 | Degree: Master | Type: Thesis
Country: China | Candidate: L S Ni | Full Text: PDF
GTID: 2168360152967143 | Subject: Computer system architecture
Abstract/Summary:
With the development of information technology, information retrieval (IR) techniques have become increasingly mature and widely used. At the same time, there are no effective ways to manage the massive volume of information on the Internet, so people face a flood of undesirable information every day. To help meet this challenge, this thesis implements a real-time, massive-scale IR system that works on the Internet. It does so by improving GONIA, an existing distributed IR system that the CERNET East China (North) Center developed for general-purpose use. Once this prototype is moved into a real-time, massive-data environment, performance becomes the main bottleneck among all the issues it faces. To address this, the thesis focuses on refining GONIA's data storage strategies and core algorithms by means of statistics, data mining, double buffering, single linking, and clustering algorithms.

Chapter 1 explains the background of the thesis. The work aims to manage network information with information technology and to support the sound, healthy development of information technology.

Chapter 2 describes GONIA, the system on which this thesis builds. GONIA's core techniques are robust and mature, but unfortunately they are ill-suited to a massive, real-time data environment.

Chapter 3 discusses the improvements made to GONIA. First, an adjustable double buffer is added to smooth out the randomness of incoming data, so that the system can collect data in real time. Second, to support queries over massive data, a two-phase vector clustering algorithm builds a cluster tree that serves as the index of the query module. This two-phase clustering algorithm is the central idea of the thesis. In addition, junk mail accounts for a large share of Internet data, so the system includes an important application: a module that filters group (bulk) mail.
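The abstract does not say how the group-mail module recognizes bulk messages. The Python below is a minimal, hypothetical sketch of one common approach, not the thesis's method. It normalizes each message body by lowercasing it, collapsing whitespace, and masking digit runs so that per-recipient serial numbers do not matter. It then hashes the normalized body and counts copies per fingerprint. A body seen at least `threshold` times is treated as group mail. The names `fingerprint`, `GroupMailFilter`, `observe`, `is_group_mail`, and `threshold` are illustrative assumptions.

```python
import hashlib
import re
from collections import Counter

_DIGIT_RUNS = re.compile(r"\d+")
_WHITESPACE = re.compile(r"\s+")


def fingerprint(body):
    """Hash a normalized message body (hypothetical normalization rules)."""
    text = _DIGIT_RUNS.sub("#", body.lower())    # mask per-recipient numbers
    text = _WHITESPACE.sub(" ", text).strip()    # ignore spacing/line-break changes
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class GroupMailFilter:
    """Counts copies of each fingerprint seen in the incoming mail stream."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # copies needed before a body counts as group mail
        self.counts = Counter()      # fingerprint -> number of copies seen so far

    def observe(self, body):
        """Record one message; return True only for the first copy of a body.

        Later copies cost one hash plus one counter update and can be kept
        out of downstream stages such as clustering.
        """
        fp = fingerprint(body)
        self.counts[fp] += 1
        return self.counts[fp] == 1

    def is_group_mail(self, body):
        """True once at least `threshold` copies of this body have been seen."""
        return self.counts[fingerprint(body)] >= self.threshold


mails = [
    "Dear user 1001, buy now!",
    "Dear  user 1002,\nBUY now!",
    "dear user 1003, buy now!",
    "Meeting moved to 10am",
]
f = GroupMailFilter(threshold=3)
forwarded = [m for m in mails if f.observe(m)]
# forwarded == ['Dear user 1001, buy now!', 'Meeting moved to 10am']
```

This filter forwards only the first copy of each fingerprint, which is one way it could save time and space ahead of clustering. Exact hashing misses copies that differ by more than digits and spacing. Catching those would need a near-duplicate scheme such as shingling.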
The group-mail filtering module saves considerable processing time and storage space, and it also cooperates with the clustering module.

Chapter 4 describes the implementation. Building on the GONIA architecture, the system adopts new data transmission and data distribution schemes. Each module of the system is then discussed. The chapter covers the collection module and the clustering module in detail, because most of the system's work takes place there. In addition, to further improve performance, the system uses a lockless queue mechanism for memory management.

Chapter 5 tests the collection performance and the group-mail discrimination rate. The results show that the system works well in a massive, real-time data environment.
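The abstract names a lockless queue for memory management but does not give its design. A single-producer/single-consumer ring buffer over preallocated slots is a widely used lock-free structure for handing data from a collector thread to a processing thread. The Python sketch below illustrates that technique; it is not the thesis's own code, and `SpscRing` and `demo_transfer` are hypothetical names. Python has no atomic memory-ordering primitives, so this sketch relies on CPython's global interpreter lock to make each index store visible in program order. A C implementation would use atomic loads and stores with acquire/release ordering instead.

```python
import threading
import time


class SpscRing:
    """Lock-free single-producer/single-consumer ring buffer (sketch).

    Only the producer thread writes `_tail`; only the consumer thread writes
    `_head`. One slot is always left unused so that `head == tail` means
    "empty" and `(tail + 1) % size == head` means "full".
    """

    def __init__(self, capacity):
        self._size = capacity + 1           # +1 for the unused slot
        self._slots = [None] * self._size   # preallocated storage, reused
        self._head = 0                      # next slot to read (consumer-owned)
        self._tail = 0                      # next slot to write (producer-owned)

    def push(self, item):
        """Producer side. Returns False instead of blocking when full."""
        if item is None:
            raise ValueError("None is reserved to signal an empty queue")
        nxt = (self._tail + 1) % self._size
        if nxt == self._head:
            return False
        self._slots[self._tail] = item      # fill the slot first...
        self._tail = nxt                    # ...then publish it to the consumer
        return True

    def pop(self):
        """Consumer side. Returns None when the queue is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None      # drop the reference; slot is reused
        self._head = (self._head + 1) % self._size
        return item


def demo_transfer(n, capacity):
    """Move 0..n-1 from a producer thread to a consumer thread."""
    ring = SpscRing(capacity)
    received = []

    def producer():
        for i in range(n):
            while not ring.push(i):
                time.sleep(0)               # full: yield to the consumer, retry

    def consumer():
        while len(received) < n:
            item = ring.pop()
            if item is None:
                time.sleep(0)               # empty: yield to the producer, retry
            else:
                received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Only the producer writes `_tail` and only the consumer writes `_head`, so neither side needs a mutex. The one permanently empty slot lets the queue tell a full ring from an empty one without a shared counter.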
Keywords/Search Tags: Internet, Real-time Massive IR, Cluster, Group Mail