Research And Implementation Of Classification Algorithm Based On Message Content And User Behavior Relationship

Posted on:2017-10-22

Degree:Master

Type:Thesis

Country:China

Candidate:H Z Song

GTID:2348330485486053

Subject:Computer software and theory

Abstract/Summary:

E-mail plays an increasingly important role in human communication. While it brings convenience, it also forces people to spend a great deal of time handling large volumes of mail, and as e-mail grows more popular, ever more human and financial resources go into processing it. Constructing a new, effective e-mail classification algorithm has therefore become particularly urgent.

This thesis focuses on the mail classification problem, in which imbalanced data sets are the key difficulty. Classification of imbalanced data sets has been a popular research topic in recent years. An imbalanced data set is one in which the number of samples differs greatly between categories. During classification, this imbalance biases the classifier toward the categories with more samples, so it performs poorly on the small categories, which are the ones that deserve more attention. Two solutions are currently popular: changing the data distribution and adjusting the classification algorithm. Combining the two, this thesis proposes a multilevel classifier algorithm that combines e-mail content with user behavior relationships. Each level filters the samples, progressively reducing the imbalance so that the data reaching the final stage are relatively balanced.

In addition, current e-mail classification algorithms generally consider only the e-mail content and ignore the role of the e-mail address. In fact, the same message sent by different people will be treated differently because of the relationship between sender and recipient. This thesis therefore takes full account of e-mail address information, combining user behavior relationships with message content for classification.

The implementation uses several traditional machine learning classification algorithms, such as naive Bayes, support vector machines, and random forests. A classification model trained on e-mail addresses is combined with multi-level classification of message content to classify imbalanced mail, and it achieved relatively good results.
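
The multilevel idea described above can be illustrated with a minimal two-stage sketch. It is not the thesis's implementation: the function names, the 0.2 threshold, the Laplace smoothing, and the toy messages are all assumptions made for illustration, and the abstract does not state how many levels the actual algorithm has or how each level filters. Stage 1 uses only the sender address to set aside mail that is very unlikely to be important, and stage 2 trains a small naive Bayes model on the content of whatever remains, which is less imbalanced than the original set.

```python
import math
from collections import Counter, defaultdict

# Made-up toy mail: (sender, text, label); label 1 is the rare "important" class.
TRAIN = [("news@shop.example", "big sale discount today", 0)] * 8 + [
    ("boss@corp.example", "project deadline meeting", 1),
    ("boss@corp.example", "budget review meeting", 1),
    ("boss@corp.example", "lunch menu today", 0),
    ("peer@corp.example", "weekend party photos", 0),
    ("peer@corp.example", "project report draft", 1),
]

def minority_share(mails):
    """Fraction of mails labelled 1."""
    return sum(label for _, _, label in mails) / len(mails)

def sender_rates(mails, smoothing=1.0):
    """Stage 1 model: smoothed fraction of important mail per sender."""
    counts = defaultdict(lambda: [0, 0])  # sender -> [n_important, n_total]
    for sender, _, label in mails:
        counts[sender][0] += label
        counts[sender][1] += 1
    return {s: (pos + smoothing) / (tot + 2 * smoothing)
            for s, (pos, tot) in counts.items()}

def train_nb(mails):
    """Stage 2 model: multinomial naive Bayes over whitespace tokens."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for _, text, label in mails:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def nb_predict(model, text):
    """Return the class with the higher Laplace-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c in (0, 1):
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log((class_counts[c] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][word] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

def train_cascade(mails, threshold=0.2):
    """Fit stage 1, drop mail it confidently dismisses, fit stage 2 on the rest."""
    rates = sender_rates(mails)
    residual = [m for m in mails if rates[m[0]] > threshold]
    return rates, train_nb(residual), threshold, residual

def classify(cascade, sender, text):
    """Stage 1 decides on the sender alone if confident; otherwise stage 2 decides."""
    rates, nb_model, threshold, _ = cascade
    if rates.get(sender, 0.5) <= threshold:  # unknown senders fall through to stage 2
        return 0
    return nb_predict(nb_model, text)

cascade = train_cascade(TRAIN)
residual = cascade[3]
print("important share before filtering:", minority_share(TRAIN))
print("important share after filtering: ", minority_share(residual))
print(classify(cascade, "boss@corp.example", "project deadline meeting"))
```

In this sketch the first stage trades some recall for balance: any important message from a sender that stage 1 dismisses is lost, so the threshold would need tuning on held-out data in practice.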

Keywords/Search Tags:

Email classification, Unbalanced data, Multi-level classifier, Confidence, Random forest, SVM, Naive Bayes

Related items

1	Improvement Of Naive Bayes Text Classification Algorithm Based On Unbalanced Dataset
2	Research On Unbalanced Text Data Set Classification Algorithm
3	Design And Implementation Of Multi-classifier Based On Information Classification System
4	Research On Improved Naive Bayes Classification Model For Imbalanced E-commerce Review Text
5	Research And Design Of Web Classification Algorithm Based On Education Browser
6	Prediction Of Protein Contact Map Based On Weighted Naive Bayes Classifier And Extreme Random Tree
7	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
8	Research On Business Email Classification Based On User Knowledge
9	Research On Optimization Of Random Forest Algorithm And Its Application In Text Parallel Classification
10	The Research Of Classification Based On Rough Sets And Naive Bayes