
Design and Implementation of an Email Address Clustering System Based on Data Mining Technology

Posted on: 2008-07-22  Degree: Master  Type: Thesis
Country: China  Candidate: D. Zhang
GTID: 2178360242472333  Subject: Computer software and theory
Abstract/Summary:
Currently, most methods for processing email focus on analyzing and filtering the content of individual messages. However, email cannot be classified systematically on the basis of content alone. How to apply proven data mining techniques to extract valuable information from large volumes of email data has therefore become an urgent problem. Cluster analysis is one of the main research areas of data mining. Its purpose is to group a set of physical or abstract objects into classes of similar objects: a cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

This paper presents an email address clustering system based on data mining technology. From the sending and receiving contacts between email addresses, the system derives a similarity-measure attribute for the addresses. It then applies DBSCAN, a density-based clustering method, to group email addresses by their degree of contact and to identify the active addresses. This narrows the range of email addresses that must be examined and makes email analysis more focused and effective.

The extraction stage decodes the email information and stores its attributes by class. The email data is then preprocessed by removing duplicate records, filling in blank records, eliminating superfluous records, and traversing the data set. Preprocessing reduces the volume of data as far as possible, which addresses the processing-time problem posed by large amounts of information, without destroying the intrinsic characteristics of the data. In addition, using statistics on the number of emails sent and received and on the contacts of a given address, the system visually displays the communication status of that address. This provides an intuitive way to analyze patterns in the data and obtain an overview of it, and also serves as a reference for validating the clustering results. Finally, the paper tests and analyzes the system. The test results show that the system runs at a satisfactory speed and achieves its design goals of grouping email addresses by their degree of contact and visually displaying email statistics; they also confirm the validity of the system.
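The abstract names the preprocessing steps and the send/receive statistics but not their exact rules. The following is a minimal Python sketch under stated assumptions, not the thesis's implementation. A record is a hypothetical (sender, recipient, subject, sent_at) tuple. A "superfluous" record is taken to be one missing either address. "Blank" fields are taken to be the optional subject and date, which are filled with placeholders. Duplicates are identical records after addresses are lower-cased. The names `EmailRecord`, `preprocess` and `contact_stats` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailRecord:
    # Hypothetical schema; the thesis does not publish its record layout.
    sender: str
    recipient: str
    subject: str
    sent_at: str


def preprocess(raw):
    """Drop unusable records, fill blank fields and remove duplicates (assumed rules)."""
    seen = set()
    cleaned = []
    for sender, recipient, subject, sent_at in raw:  # single pass over the data set
        sender = (sender or "").strip().lower()
        recipient = (recipient or "").strip().lower()
        # Assumed "superfluous" rule: a record without both addresses carries no contact.
        if not sender or not recipient:
            continue
        # Assumed "blank" rule: fill missing optional fields with placeholders.
        rec = EmailRecord(sender, recipient, subject or "(no subject)", sent_at or "")
        # Duplicate rule: keep only the first copy of an identical record.
        if rec in seen:
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


def contact_stats(records, address):
    """Return (sent, received, per-correspondent message counts) for one address."""
    sent = sum(1 for r in records if r.sender == address)
    received = sum(1 for r in records if r.recipient == address)
    partners = Counter()
    for r in records:
        if r.sender == address:
            partners[r.recipient] += 1
        elif r.recipient == address:
            partners[r.sender] += 1
    return sent, received, partners


raw = [
    ("A@x.com", "b@x.com", "hi", "2008-01-01"),
    ("a@x.com", "b@x.com", "hi", "2008-01-01"),  # duplicate once normalized
    ("b@x.com", "a@x.com", None, "2008-01-02"),  # blank subject gets filled
    ("", "c@x.com", "orphan", "2008-01-03"),     # no sender: dropped
    ("a@x.com", "c@x.com", "", None),
]
records = preprocess(raw)
sent, received, partners = contact_stats(records, "a@x.com")
print(len(records), sent, received, dict(partners))
```

The per-correspondent counter is the kind of data a visual display of one address's communication status could be drawn from. Using a frozen dataclass makes records hashable, so duplicate removal is a set lookup.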
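The abstract does not give the similarity formula or the DBSCAN parameters. The sketch below therefore assumes a distance of 1 / (1 + m) between two addresses, where m is the number of messages exchanged in either direction. With this choice, frequent correspondents are close together and addresses with no contact sit at distance 1. It then runs textbook DBSCAN on that distance. Here `eps` and `min_pts` are illustrative values, and `min_pts` counts the point itself. Addresses left as noise are treated as inactive. The helper names (`contact_weights`, `distance`, `dbscan`) are hypothetical.

```python
from collections import Counter

NOISE = -1


def contact_weights(pairs):
    """Count messages exchanged between each unordered pair of addresses."""
    w = Counter()
    for sender, recipient in pairs:
        if sender != recipient:  # self-addressed mail says nothing about contact
            w[frozenset((sender, recipient))] += 1
    return w


def distance(w, u, v):
    # Assumed measure: more messages exchanged -> smaller distance.
    if u == v:
        return 0.0
    return 1.0 / (1 + w.get(frozenset((u, v)), 0))


def dbscan(points, dist, eps, min_pts):
    """Textbook DBSCAN; returns {point: cluster_id}, with NOISE for outliers."""
    labels = {}

    def neighbours(p):
        # O(n) scan per query, so O(n^2) overall; fine for a sketch.
        return [q for q in points if dist(p, q) <= eps]

    cluster = 0
    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later become a border point of some cluster
            continue
        labels[p] = cluster  # p is a core point: start a new cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == NOISE:
                labels[q] = cluster  # border point: joins but is not expanded
            if q in labels:
                continue
            labels[q] = cluster
            q_neigh = neighbours(q)
            if len(q_neigh) >= min_pts:
                queue.extend(q_neigh)  # q is also a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical traffic: a, b, c write to each other often; d, e, f rarely.
pairs = [("a", "b")] * 3 + [("b", "c")] * 2 + [("c", "a")] * 2 + [("d", "e"), ("f", "a")]
w = contact_weights(pairs)
points = sorted({addr for pair in pairs for addr in pair})
labels = dbscan(points, lambda u, v: distance(w, u, v), eps=0.34, min_pts=3)
active = [p for p in points if labels[p] != NOISE]
print(labels, active)
```

With `eps=0.34`, two addresses are neighbours only if they exchanged at least two messages (1 / 3 ≈ 0.333). This is why the single exchanges d–e and a–f leave d, e and f as noise. Passing the distance as a function keeps the clustering step independent of whichever similarity measure is actually used.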
Keywords: Data Mining, Email, Clustering, Density