Research On Mail Classification Based On Semantic Analysis

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Gao
Full Text: PDF
GTID: 2568307301455024
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
As the organization has grown and its business performance has improved, it has received more and more orders from upstream and downstream manufacturers. As a result, a growing number of contracts are signed, and some of them remain legally valid for long periods, from a few years to more than a decade, so they must be managed in a timely manner. The organization's email includes not only contract emails but also invoice emails, daily transaction notifications, salary emails, and so on. As the volume of these emails keeps growing, it becomes necessary to determine which ones need long-term retention: contract emails must be kept for a long time, while other types should be deleted regularly to avoid wasting storage space. Based on this situation, this thesis proposes an email classification method based on semantic analysis that fits the organization's actual needs. The main work is as follows:

1. The advantages and disadvantages of commonly used email classification techniques are analyzed and weighed against the problem addressed in this thesis in order to compare candidate approaches and design the best-suited solution.
2. To address the impact of word segmentation results on classifier performance and the impact of feature extraction on classification results, an improved TF-IDF algorithm, called MTF-IDF, is proposed; this is the innovation of the thesis. In the experiments, the data were first preprocessed using methods such as Jieba word segmentation and TF-IDF feature extraction. Several relatively mature classification models (support vector machine, naive Bayes, and FastText) were then studied in a comparative experiment to determine which model, combined with the proposed algorithm, achieves higher accuracy and a better F1 score.
3. Two sets of four control experiments were designed to evaluate MTF-IDF feature extraction with different numbers of texts in the training and test sets. The MTF-IDF algorithm was then applied to the support vector machine, naive Bayes, and FastText models for training. The results showed that combining the MTF-IDF algorithm with the naive Bayes model gave higher classification accuracy as the number of samples in the training set increased.
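The baseline pipeline described above (segmentation, TF-IDF feature extraction, then a naive Bayes classifier) can be sketched roughly as follows. This is only an illustration under several assumptions and not the thesis's implementation: it uses standard smoothed TF-IDF rather than MTF-IDF, whose definition the abstract does not give; a whitespace `tokenize` function stands in for Jieba segmentation; and the toy emails and the labels `keep` and `delete` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for Jieba word segmentation.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen during training are dropped.
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with Laplace smoothing."""

    def fit(self, vectors, labels, alpha=1.0):
        self.vocab = {t for v in vectors for t in v}
        self.classes = sorted(set(labels))
        totals = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        counts = Counter(labels)
        self.log_prior = {
            c: math.log(counts[c] / len(labels)) for c in self.classes
        }
        self.log_like = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + alpha * len(self.vocab)
            self.log_like[c] = {
                t: math.log((totals[c][t] + alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t] for t, w in vec.items() if t in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy data: contract emails are kept, other emails are deleted.
TRAIN = [
    ("contract signed renewal agreement terms", "keep"),
    ("supply contract agreement ten years", "keep"),
    ("invoice payment due amount", "delete"),
    ("salary notice monthly payment", "delete"),
]

train_docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(train_docs)
model = NaiveBayes().fit(
    [tfidf(doc, idf) for doc in train_docs],
    [label for _, label in TRAIN],
)


def classify(text):
    # Segment, weight with TF-IDF, then pick the most probable class.
    return model.predict(tfidf(tokenize(text), idf))


if __name__ == "__main__":
    print(classify("new contract agreement attached"))
    print(classify("invoice payment reminder"))
```

An improved weighting scheme such as MTF-IDF would replace `fit_idf` and `tfidf` while the classifier stays unchanged, which is what makes it straightforward to compare the same features across several models.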
Keywords/Search Tags: mail classification, TF-IDF, naive Bayes, jieba, support vector machine