
The Exception Information Discovery And Mining For Mass Mailing Logs

Posted on: 2019-03-10; Degree: Master; Type: Thesis
Country: China; Candidate: Y Bu; Full Text: PDF
GTID: 2428330590965719; Subject: Computer Science and Technology
Abstract/Summary:
E-mail has become an indispensable medium for communication in daily life and work because of its convenience, speed, and low cost, but these same qualities have led to the proliferation of spam. The problem is particularly serious in universities. Stopping the spread of spam and filtering it effectively within the mail system are central concerns for university and corporate network centers alike. There is a large body of related research, including commonly used Bayesian and support vector machine filtering algorithms, and it has produced many results, but most of these methods filter on message content. In practice, the text of a message often cannot be accessed because of privacy constraints, and content-based filtering also consumes considerable processing time. New methods and algorithms are therefore needed.

Taking the mail system of a university as an example, this thesis proposes an architecture based on ELK (Elasticsearch, Logstash, Kibana) for processing mail logs, which can handle a large volume of streaming log data in real time. In practice, a university mail system can generate up to hundreds of millions of log entries every day, and the framework still performs well at this throughput. Regular expressions are then used to parse the logs and extract the information needed for the experiments, such as the sender, recipient, sending time, and receiving time. The concept of a mail event is defined to model the combination of extracted elements, and the resulting events are stored in a graph database. The thesis then proposes the concept of a basic user behavior mode unit (abbreviated as mode unit), improves the user behavior pattern mining algorithm, extracts user behavior features, and discovers anomalies in the mail information by analyzing the features in each snapshot. Experiments show that this scheme achieves real-time processing, modeling, and storage of large-scale data and meets the needs of the mail system.

The main contributions of this thesis are as follows. It provides a new tool combination, ELK+Neo4j, for processing mail logs, and uses ELK to achieve real-time search of mail logs. It uses regular expressions to extract the information scattered across the mail log, combines these fragments into a model, proposes the concept of a mail event, and stores the events in the graph database Neo4j. It introduces the concept of mode units and improves the user behavior pattern algorithm to detect anomalous user information and identify spam.
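
The extraction and modeling steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the log format, so the Postfix-style sample lines, the regular expressions, and the names `MailEvent`, `parse_mail_events`, and `to_edges` are all assumptions made for illustration. The sketch groups log lines that share a queue ID into one mail event, then flattens each event into sender-to-recipient edges of the kind that could be loaded into a graph database.

```python
import re
from dataclasses import dataclass, field

# Hypothetical Postfix-style log lines; the real log format is not given in the abstract.
SAMPLE_LOG = """\
Mar 10 08:15:01 mx1 postfix/qmgr[812]: 4F1A2B3C: from=<alice@example.edu>, size=2048, nrcpt=2
Mar 10 08:15:02 mx1 postfix/smtp[901]: 4F1A2B3C: to=<bob@example.org>, relay=mail.example.org, status=sent
Mar 10 08:15:03 mx1 postfix/smtp[902]: 4F1A2B3C: to=<carol@example.org>, relay=mail.example.org, status=bounced
"""

# Timestamp, host, process tag, queue ID, then the remainder of the line.
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s[\d:]{8})\s\S+\s\S+:\s"
    r"(?P<qid>[0-9A-F]+):\s(?P<rest>.*)$"
)
FROM_RE = re.compile(r"from=<(?P<addr>[^>]*)>")
TO_RE = re.compile(r"to=<(?P<addr>[^>]*)>.*?status=(?P<status>\w+)")


@dataclass
class MailEvent:
    """One message, assembled from the scattered log lines that share a queue ID."""
    queue_id: str
    sender: str = ""
    sent_at: str = ""
    deliveries: list = field(default_factory=list)  # (recipient, time, status)


def parse_mail_events(log_text):
    """Group log lines by queue ID and fill in sender and delivery fields."""
    events = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        event = events.setdefault(m["qid"], MailEvent(queue_id=m["qid"]))
        sender = FROM_RE.search(m["rest"])
        if sender:
            event.sender = sender["addr"]
            event.sent_at = m["time"]
        recipient = TO_RE.search(m["rest"])
        if recipient:
            event.deliveries.append(
                (recipient["addr"], m["time"], recipient["status"])
            )
    return events


def to_edges(events):
    """Flatten events into (sender, recipient, properties) edges for a graph store."""
    return [
        (e.sender, rcpt, {"queue_id": e.queue_id, "sent_at": e.sent_at,
                          "delivered_at": t, "status": status})
        for e in events.values()
        for rcpt, t, status in e.deliveries
    ]


if __name__ == "__main__":
    for edge in to_edges(parse_mail_events(SAMPLE_LOG)):
        print(edge)
```

Keying events on the queue ID is one plausible way to combine the fragments of a message that are spread over several log lines; a different mail server would need different patterns and possibly a different grouping key.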
Keywords/Search Tags:log analysis, real-time processing, graph database, ELK, mode element