
An Email Mining System Based On Email Communication Network And Content Analysis

Posted on: 2017-10-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Yue | Full Text: PDF
GTID: 2348330512452052 | Subject: Computer technology
Abstract/Summary:
Email is an important means of communication because it is reliable, fast, and convenient. Users send and receive email intentionally, so email traffic reflects the relationships among them, and the subject and body of a message carry rich information about its users. Through email data mining, we can build an email communication network from the relationships between email addresses and mine the subjects and contents of messages. Finding important members and suspicious mail networks requires making full use of both the communication relationships and the content. As the number of emails continues to grow and communication networks become large and complex, methods based on traditional serial algorithms and manual analysis are inefficient, and a mining platform that can handle massive data is urgently needed.

This thesis builds on mining techniques for email communication relationships and email content, studies parallel algorithms for email mining over massive data, and implements a cloud platform based on communication relationships and content. The main work is as follows:
(1) Taking business needs fully into account and following the principles of generality, scalability, and efficiency, a cloud platform architecture for mining email data was established, and the system modules and user interface were designed.
(2) A cloud computing environment was constructed on the open-source Hadoop Distributed File System and the Map/Reduce parallel computing framework.
(3) A set of parallel data mining services based on Map/Reduce was designed, providing services to build the vector space model (VSM), create connected graphs, calculate content similarity, and cluster content.
(4) The application logic and user interface were implemented on Java EE, yielding a mail mining system based on relationship analysis and communication content mining.

The system is based on cloud computing, takes the parallel data mining services as its core, builds an email discovery model, and provides analysts with efficient data mining services and auxiliary analysis methods. Computing and storage resources can be expanded flexibly to keep up with the growing volume of mail to be processed. The efficiency of analysis and the utilization of email data are greatly improved. The system has run stably and reliably since deployment, and the expected goals have been achieved.
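
As a rough illustration of three of the mining services named in the abstract (connected graph, VSM, content similarity), the sketch below runs a map step and a reduce step in a single Python process. It is not the thesis's implementation, which runs as Map/Reduce jobs on Hadoop. The function names, the sample messages, and the use of raw term frequency as the VSM weighting are all assumptions made for this example; the abstract does not specify a weighting scheme.

```python
import math
from collections import Counter, defaultdict


def map_edges(emails):
    """Map step: emit one (sender, recipient) pair per recipient."""
    for mail in emails:
        for rcpt in mail["to"]:
            yield mail["from"], rcpt


def reduce_adjacency(pairs):
    """Reduce step: group the pairs into an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_components(adj):
    """Depth-first search over the adjacency map; returns a list of address sets."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


def tf_vector(text):
    """Hypothetical VSM representation: raw term counts per message."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Made-up sample data, for illustration only.
emails = [
    {"from": "a@x.org", "to": ["b@x.org"], "body": "quarterly budget report"},
    {"from": "b@x.org", "to": ["c@x.org"], "body": "budget report draft"},
    {"from": "d@y.org", "to": ["e@y.org"], "body": "lunch on friday"},
]

adj = reduce_adjacency(map_edges(emails))
components = connected_components(adj)
similarity = cosine(tf_vector(emails[0]["body"]), tf_vector(emails[1]["body"]))
```

Here `components` groups addresses that are linked by any chain of messages, and `similarity` compares the first two message bodies. In a distributed setting the map and reduce functions would run as separate tasks over partitions of the mail store, and the component search would typically need an iterative job rather than one in-memory traversal.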
Keywords/Search Tags:Data Mining, Email Mining, Hadoop