The Spam Identification Based On Conditional Random Field

Posted on: 2017-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: S W Ding
GTID: 2428330590491669
Subject: Statistics
Abstract/Summary:
The conditional random field (CRF), like the hidden Markov model (HMM) and the maximum entropy model, was first proposed to solve labeling problems. The HMM has long been criticized for its strict independence assumptions, and the maximum entropy model (in its Markov form, the MEMM) suffers from the label bias problem; the CRF avoids both weaknesses and has turned out to be the strongest of these models for labeling problems. Applying the CRF to spam classification offers several advantages. To apply a CRF to mail classification, the mails are taken as the observation sequence and the mail categories as the state sequence. Potential functions are then constructed from relevant features extracted with text-processing techniques. This thesis extends the first-order linear-chain CRF to a skip-chain CRF, whose potential functions can connect more mutually related nodes, not only adjacent ones. Because a CRF can be built between each mail and a given mail category, the target conditional probability can then be computed. We further assume that different mails are either independent or form a Markov chain, which gives the joint probability of the samples. The final steps are parameter estimation and model prediction.
Keywords/Search Tags: Conditional Random Field, Skip-Chain, Spam, Classification
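The abstract treats a sequence of mails as the observation sequence x and their categories as the state sequence y, with potential functions built from text features. A minimal Python sketch of a first-order linear-chain CRF under that reading follows: P(y | x) = exp(score(x, y)) / Z(x), with log Z(x) computed by the forward algorithm and the predicted categories found by Viterbi decoding. The feature names (`has_free`, `has_link`, `has_meeting`), the hand-set weights `W_STATE` and `W_TRANS`, and the example mails are hypothetical illustrations, not taken from the thesis; in practice the weights would be estimated from labeled mail.

```python
import itertools
import math

LABELS = ("ham", "spam")

# Hypothetical hand-set weights, for illustration only (normally learned).
W_STATE = {("has_free", "spam"): 2.0, ("has_link", "spam"): 1.0,
           ("has_meeting", "ham"): 1.5}
W_TRANS = {("ham", "ham"): 0.8, ("spam", "spam"): 0.8}


def features(mail):
    """Hypothetical binary text features of a single mail."""
    words = mail.lower().split()
    return {"has_free": "free" in words,
            "has_link": "http" in mail.lower(),
            "has_meeting": "meeting" in words}


def node_score(mail, label):
    """Log node potential: summed weights of the active (feature, label) pairs."""
    return sum(W_STATE.get((name, label), 0.0)
               for name, active in features(mail).items() if active)


def edge_score(prev, cur):
    """Log edge potential between the categories of consecutive mails."""
    return W_TRANS.get((prev, cur), 0.0)


def sequence_score(mails, labels):
    """Unnormalised log-score of one label sequence y given the mails x."""
    score = sum(node_score(m, y) for m, y in zip(mails, labels))
    return score + sum(edge_score(a, b) for a, b in zip(labels, labels[1:]))


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_partition(mails):
    """Forward algorithm for log Z(x), in O(n * |LABELS|^2) time."""
    alpha = {y: node_score(mails[0], y) for y in LABELS}
    for mail in mails[1:]:
        alpha = {y: node_score(mail, y)
                 + log_sum_exp([alpha[p] + edge_score(p, y) for p in LABELS])
                 for y in LABELS}
    return log_sum_exp(list(alpha.values()))


def conditional_prob(mails, labels):
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    return math.exp(sequence_score(mails, labels) - log_partition(mails))


def viterbi(mails):
    """Most probable category sequence, argmax_y P(y | x)."""
    best = {y: (node_score(mails[0], y), [y]) for y in LABELS}
    for mail in mails[1:]:
        step = {}
        for y in LABELS:
            score, path = max(((best[p][0] + edge_score(p, y), best[p][1])
                               for p in LABELS), key=lambda pair: pair[0])
            step[y] = (score + node_score(mail, y), path + [y])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]


inbox = ["Get a FREE prize at http://example.com", "free offer inside",
         "Project meeting at 3pm"]
print(viterbi(inbox))  # ['spam', 'spam', 'ham']
# Sanity check: P(y | x) summed over all 2^n label sequences should be 1.
print(sum(conditional_prob(inbox, ys)
          for ys in itertools.product(LABELS, repeat=len(inbox))))
```

The forward recursion is what keeps the normaliser tractable: summing over all label sequences directly costs 2^n terms, while the chain structure reduces it to a pass that is linear in the number of mails.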
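The skip-chain extension described in the abstract adds potential functions on pairs of related mails that are not adjacent in the chain. A hypothetical sketch follows, assuming a toy relatedness rule (non-adjacent mails from the same sender are linked) and fixed weights `CHAIN_W` and `SKIP_W`; the thesis's actual skip criterion and features are not stated here. It computes each mail's spam marginal by exhaustive enumeration, which is only feasible for a handful of mails. Skip edges create loops in the graph, so larger skip-chain models typically rely on approximate inference such as loopy belief propagation.

```python
import itertools
import math

LABELS = ("ham", "spam")
SPAM_WORDS = ("free", "prize", "winner")   # hypothetical spam cue words
CHAIN_W = 0.5   # hypothetical reward when adjacent mails share a category
SKIP_W = 1.5    # hypothetical reward when skip-linked mails share a category


def node_score(mail, label):
    """Log node potential for a (sender, text) mail; illustrative features only."""
    _sender, text = mail
    words = text.lower().split()
    hits = sum(w in words for w in SPAM_WORDS)
    return float(hits) if label == "spam" else 0.5


def skip_edges(mails):
    """Hypothetical relatedness rule: link non-adjacent mails from the same sender."""
    return [(i, j) for i in range(len(mails)) for j in range(i + 2, len(mails))
            if mails[i][0] == mails[j][0]]


def score(mails, labels, skips):
    """Unnormalised log-score: node terms + linear-chain terms + skip-chain terms."""
    total = sum(node_score(m, y) for m, y in zip(mails, labels))
    total += sum(CHAIN_W for a, b in zip(labels, labels[1:]) if a == b)
    total += sum(SKIP_W for i, j in skips if labels[i] == labels[j])
    return total


def spam_marginals(mails, use_skips=True):
    """P(y_i = 'spam' | x) for every mail, by exhaustive enumeration.

    Enumeration is exponential in len(mails); it only makes the model
    explicit on toy inputs.
    """
    skips = skip_edges(mails) if use_skips else []
    seqs = list(itertools.product(LABELS, repeat=len(mails)))
    weights = [math.exp(score(mails, ys, skips)) for ys in seqs]
    z = sum(weights)
    return [sum(w for ys, w in zip(seqs, weights) if ys[i] == "spam") / z
            for i in range(len(mails))]


mails = [("promo@example.com", "free prize winner"),
         ("boss@example.org", "see attached report"),
         ("alice@example.org", "lunch today"),
         ("promo@example.com", "quick question")]
print(skip_edges(mails))  # [(0, 3)]
# The last mail's spam probability rises once the skip edge ties it to mail 0.
print(spam_marginals(mails, use_skips=False)[3], spam_marginals(mails)[3])
```

With the skip edge, the otherwise neutral fourth mail picks up evidence from the earlier mail by the same sender, which is the kind of long-range dependency that skip-chain potentials are meant to capture.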
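The abstract's final steps are parameter estimation and prediction. Under the simpler of its two assumptions (different mails are independent), the conditional likelihood factorises over mails, and each mail's CRF reduces to a single-node log-linear model, so the gradient of the log-likelihood for each weight is the observed feature count minus its expected count under the model. The sketch below assumes word-indicator features, plain batch gradient ascent with an L2 penalty, and made-up training mails; the thesis does not specify these choices here, and quasi-Newton methods such as L-BFGS are a common alternative in practice.

```python
import math

LABELS = ("ham", "spam")


def feats(text, label):
    """Hypothetical indicator features f_k(x, y): one (word, label) pair per distinct word."""
    return {(word, label) for word in set(text.lower().split())}


def label_probs(weights, text):
    """P(y | x) for one mail: a single-node CRF, i.e. a log-linear (softmax) model."""
    scores = {y: sum(weights.get(f, 0.0) for f in feats(text, y)) for y in LABELS}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs=300, lr=0.1, l2=0.01):
    """Batch gradient ascent on sum_i log P(y_i | x_i) - (l2 / 2) * ||w||^2.

    With independent mails, the gradient for each weight is its observed
    feature count minus its expected count under the current model.
    """
    weights = {}
    for _ in range(epochs):
        grad = {}
        for text, gold in data:
            for f in feats(text, gold):                      # observed counts
                grad[f] = grad.get(f, 0.0) + 1.0
            for y, p in label_probs(weights, text).items():  # expected counts
                for f in feats(text, y):
                    grad[f] = grad.get(f, 0.0) - p
        for f in set(grad) | set(weights):
            w = weights.get(f, 0.0)
            weights[f] = w + lr * (grad.get(f, 0.0) - l2 * w)
    return weights


def predict(weights, text):
    """Model prediction: the category with the highest conditional probability."""
    probs = label_probs(weights, text)
    return max(probs, key=probs.get)


train_data = [("win a free prize now", "spam"),
              ("free money click here", "spam"),
              ("meeting moved to friday", "ham"),
              ("please review the report", "ham")]
model = train(train_data)
print(predict(model, "claim your free prize"))   # spam
print(predict(model, "report for the meeting"))  # ham
```

Unseen words such as "claim" keep zero weight, so the prediction rests on the words the model has learned from; under the Markov-chain assumption the same observed-minus-expected gradient applies, with the expectations computed by forward-backward instead of a single softmax.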