Research And Implementation Of Retrieval System On Massive Mail

Posted on:2009-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Shi

Full Text:PDF

GTID:2178360278464410

Subject:Computer Science and Technology

Abstract/Summary:

With the development of computers and networks, e-mail, an important Internet application, has become popular for its convenience and speed. Individuals, enterprises, governments, and even the military communicate by e-mail in daily life and work. However, illegal businesses and criminals use e-mail to push advertisements, viruses, and unhealthy content or information that undermines national stability, creating potential security hazards for individuals, enterprises, and the nation. Mail filtering is a mature technology for filtering spam, but it cannot prevent the spread of such harmful information. How to retrieve sensitive information from massive document collections and trace suspicious information and users has become a research direction, so there is an urgent need to manage and monitor massive volumes of mail securely.

This thesis analyzes the characteristics and special format of mail and presents a retrieval system for massive mail. The system lets users easily search the fields they are interested in, such as the mail body text, sender (From), and recipient (To), so it addresses mail monitoring effectively. To improve the efficiency of processing massive mail, the work focuses on distributed mail parsing, indexing, and searching. First, after introducing the theory behind mail documents, the thesis analyzes the mail format and proposes a VSM (Vector Space Model) for mail documents. Second, a traditional inverted index is used to store the indices. Unlike ordinary retrieval systems, this system implements incremental indexing, which greatly reduces index update time. To speed up the processing of massive mail data, the system architecture adopts distributed processing. During mail pre-processing, a distributed algorithm runs a single task across several nodes, which speeds up parsing and indexing. It also makes the search process stable and fast. Finally, the thesis describes the experiments, comparing parsing and indexing speed between the single-node and distributed systems, and concludes that search time depends on the scale of the mail collection and the complexity of the query.

The system implements the series of user operations, including mail parsing, indexing, and searching, in combination with distributed parallel technology. It uses an inverted index to store and manage the mail indices, and it computes similarity with the mail VSM to meet the demands of user queries. In addition, the system's unified interface and methods provide good computing capability and a good application development environment.
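
The abstract names three building blocks: an inverted index, incremental index updates, and similarity computed with a vector space model over mail fields. The thesis implementation is not reproduced here, so the following is only a minimal single-node sketch of how these pieces can fit together. The class name MailIndex, the field-prefixed terms (such as "from:" and "body:"), whitespace tokenization, and the smoothed TF-IDF weighting are all assumptions made for illustration, not the author's design.

```python
import math
from collections import Counter, defaultdict


class MailIndex:
    """Toy inverted index over mail fields (hypothetical, for illustration)."""

    def __init__(self):
        # Inverted index: term -> {mail_id: term frequency}
        self.postings = defaultdict(dict)
        # Per-mail term counts, kept so document norms can be computed.
        self.docs = {}

    def add(self, mail_id, sender, recipient, body):
        """Incremental update: index one new mail without a full rebuild."""
        tokens = []
        for field, text in (("from", sender), ("to", recipient), ("body", body)):
            # Assumption: whitespace tokenization and field-prefixed terms.
            tokens += [f"{field}:{word}" for word in text.lower().split()]
        counts = Counter(tokens)
        self.docs[mail_id] = counts
        for term, tf in counts.items():
            self.postings[term][mail_id] = tf

    def _idf(self, term):
        # Smoothed inverse document frequency (an assumed weighting scheme).
        df = len(self.postings.get(term, {}))
        return math.log((1 + len(self.docs)) / (1 + df)) + 1.0

    def search(self, query_terms):
        """Rank mails by cosine similarity between TF-IDF vectors."""
        q_weights = {t: tf * self._idf(t) for t, tf in Counter(query_terms).items()}
        dots = defaultdict(float)
        for term, q_w in q_weights.items():
            # Only mails in the posting list of a query term are touched.
            for mail_id, tf in self.postings.get(term, {}).items():
                dots[mail_id] += q_w * tf * self._idf(term)
        q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
        results = []
        for mail_id, dot in dots.items():
            d_norm = math.sqrt(
                sum((tf * self._idf(t)) ** 2 for t, tf in self.docs[mail_id].items())
            )
            results.append((mail_id, dot / (q_norm * d_norm)))
        return sorted(results, key=lambda item: item[1], reverse=True)


index = MailIndex()
index.add("m1", "alice@example.com", "bob@example.com", "quarterly budget report")
index.add("m2", "carol@example.com", "bob@example.com", "lunch on friday")
# A later mail is added on its own; existing postings are left untouched.
index.add("m3", "alice@example.com", "dave@example.com", "budget review")
for mail_id, score in index.search(["from:alice@example.com", "body:budget"]):
    print(mail_id, round(score, 3))
```

In a distributed setting like the one the abstract describes, each node could hold such an index over its own share of the mail and a coordinator could merge the ranked results. That partitioning scheme is also an assumption here, since the abstract does not specify it.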

Keywords/Search Tags:

mass-mail, distributed processing, information retrieval, index

Related items

1	The Design And Implementation Of A Distributed Storage And Retrieval System
2	Research On The Distributed Indexing Platform And Information Filter In Distributed Full-text Retrieval System
3	Research On Patent Information Retrieval Based On Distributed Multi-index Fusion
4	Research On Fast Text Retrieval Methods And Optimization Of Engineering Realization
5	The Impact of a Targeted Training Program on E-Mail System Processing Capabilities and Self-Perception of E-Mail Overload
6	The Effects of Index Storage on Ranked Information Retrieval
7	Research On P2P Search Technology In Uncooperative Environments
8	Design And Implementation Of The Bulk Direct Mail Centralized Processing System
9	Research On Key Technology Of Distributed Full-Text Index For Web Information
10	Key Problems Research On Distributed Information Retrieval