Research Of Multiple Emails Automatic Summarization

Posted on: 2009-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: B X Wang
Full Text: PDF
GTID: 2178360278464454
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, more and more users have come to rely on email, and the volume of email on the Internet keeps growing. As an everyday communication tool, email may carry a great deal of confidential information belonging to the state, enterprises, and individuals. Email content security technology therefore bears directly on a country's political stability, an enterprise's data security, and the vital interests of individuals, and is of great practical significance. Against this background, this paper presents research on automatic summarization oriented toward email content.

Multi-email automatic summarization extracts important or user-relevant information from a set of emails related to a given topic and automatically generates a fixed-length summary. A workable multi-email summarization system can greatly help monitoring staff process email information faster and more accurately. In this paper we design and build a multi-email automatic summarization system that operates on the retrieval results from a massive email collection. We focus mainly on the following issues:

Firstly, this paper presents an improved, user-query-oriented extraction method that takes into account the application environment and the differences between email content and ordinary text. With this method, the summarization system meets the requirements for effectiveness and real-time response to a certain extent.

Secondly, this paper addresses summary sentence extraction with the maximal marginal relevance (MMR) model, in order to reduce redundancy in the summaries while maintaining high precision. On this basis, we study in depth how the sentence relevance calculation and the linear interpolation factor affect the MMR model.
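For orientation, the standard MMR criterion scores each candidate sentence s as lam * Sim(s, query) - (1 - lam) * max Sim(s, t) over the already selected sentences t, and greedily picks the highest-scoring one. The Python sketch below illustrates that generic criterion only; it is not the system described in this abstract. The names `cosine` and `mmr_select` are hypothetical, the similarity is a plain bag-of-words cosine used as a stand-in, and the interpolation factor `lam` is a fixed value chosen for illustration.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in measure, an assumption)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k, lam=0.7, sim=cosine):
    """Greedily pick up to k sentences by maximal marginal relevance.

    lam is the linear interpolation factor: 1.0 ranks by query relevance
    alone, while lower values penalize similarity to sentences already chosen.
    """
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            relevance = sim(s, query)
            # Redundancy is the highest similarity to any selected sentence.
            redundancy = max((sim(s, t) for t in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    emails = [
        "budget meeting moved to friday",
        "budget meeting moved to friday afternoon",
        "server outage report attached",
    ]
    print(mmr_select(emails, "budget meeting", k=2, lam=0.5))
```

With `lam=1.0` the two near-duplicate sentences about the meeting are both chosen. Lowering `lam` makes the second pick favor a sentence that differs from the first, which is the redundancy and precision trade-off the interpolation factor controls.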
Furthermore, we present a HowNet-based sentence similarity calculation method and a self-adaptive factor selection model to improve the performance of the summarization system. Intrinsic evaluation shows that the improved system achieves higher summary quality.

Finally, this paper studies a series of other relevant technologies. For access to email information, we have implemented automatic email parsing and content decoding. Because useless information in the email content can adversely affect the summary results, we propose the concept of email content noise and remove it with a rule-based method. For high-speed Chinese word segmentation, this paper shows how to use a Trie structure to build a segmentation dictionary automatically and to look up any word quickly, which significantly reduces the system's response time.
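As a rough illustration of why a Trie suits dictionary lookup during segmentation, the Python sketch below stores a word list in a character Trie and segments text by forward maximum matching. The abstract does not say which matching strategy the thesis uses, so forward maximum matching is an assumption here, and the names `TrieNode`, `Trie`, `longest_match`, and `segment` are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if the path from the root spells a word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text, trie):
    """Forward maximum matching; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


if __name__ == "__main__":
    trie = Trie(["电子", "邮件", "电子邮件", "自动", "摘要"])
    print(segment("电子邮件自动摘要", trie))
```

Each lookup walks at most as many nodes as the candidate word has characters, regardless of dictionary size, and a single walk from a start position finds every dictionary word that begins there.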
Keywords/Search Tags: multi-email automatic summarization, maximal marginal relevance, sentence similarity calculation, Trie