| The aim of this thesis is to improve the accuracy of Bayesian spam filtering, the most popular and widely used approach to spam filtering. Of the various possible ways to pursue this aim, this thesis presents two approaches that improve filtering performance. Three popular evolutions of Bayesian spam filtering algorithms are reviewed: Naive Bayes, Paul Graham's, and Gary Robinson's. The proposed algorithms build on these evolutions and incorporate novel ideas. The first approach is co-weighting of multiple probability estimations. Although they are all based on Bayes' theorem, several ways of computing probability estimations have been proposed and used. These estimations are examined, and a new, more effective combined estimation based on co-weighted multi-estimations is proposed. The approach is compared with the individual estimations. The second approach is based on co-weighted multi-area information. In general, Bayesian spam filters compute probability estimations for tokens either by considering only the body and disregarding the other areas of the email in which a token occurs, or by treating the same token occurring in different areas as different tokens. In reality, however, occurrences of the same token in different areas are inter-related, and this relation could also play a role in classification. This novel idea is incorporated by correlating multi-area information through co-weighting, which yields more effective integrated probability estimations for tokens. It is shown that this approach also improves spam filtering performance. The new approach is compared with individual area-wise estimations and with traditional separate estimations in all areas. |
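To make the two co-weighting ideas more concrete, here is a minimal Python sketch under stated assumptions. It assumes that, for each token, we know how many training spam and ham emails contain it in each email area. It uses commonly published forms of the three estimators: a Naive-Bayes-style ratio with add-one smoothing and equal class priors (an assumption), Paul Graham's ratio with doubled ham counts clamped to [0.01, 0.99] (his minimum-occurrence threshold is omitted for brevity), and Gary Robinson's f(w) = (s·x + n·p(w)) / (s + n) with strength s = 1 and assumed probability x = 0.5. The estimates are combined with a plain weighted mean. The weights, area names and the helpers `multi_estimation` and `multi_area` are hypothetical illustrations, not the co-weighting scheme or weights derived in this thesis.

```python
from dataclasses import dataclass


@dataclass
class TokenCounts:
    spam: int  # training spam emails containing the token (in one area)
    ham: int   # training ham emails containing the token (in one area)


def smoothed_ratio(c: TokenCounts, n_spam: int, n_ham: int) -> float:
    """Naive-Bayes-style P(spam | token), add-one smoothing, equal priors (assumed)."""
    ps = (c.spam + 1) / (n_spam + 2)
    ph = (c.ham + 1) / (n_ham + 2)
    return ps / (ps + ph)


def graham(c: TokenCounts, n_spam: int, n_ham: int) -> float:
    """Paul Graham's estimate: ham counts doubled, result clamped to [0.01, 0.99]."""
    b = min(1.0, c.spam / n_spam)
    g = min(1.0, 2 * c.ham / n_ham)
    if b + g == 0:
        return 0.4  # Graham's value for tokens never seen in training
    return max(0.01, min(0.99, b / (b + g)))


def robinson(c: TokenCounts, n_spam: int, n_ham: int,
             s: float = 1.0, x: float = 0.5) -> float:
    """Gary Robinson's f(w) = (s*x + n*p(w)) / (s + n)."""
    b = c.spam / n_spam
    g = c.ham / n_ham
    p = b / (b + g) if b + g > 0 else x
    n = c.spam + c.ham  # number of training emails containing the token
    return (s * x + n * p) / (s + n)


ESTIMATORS = (smoothed_ratio, graham, robinson)


def co_weighted(weights, values) -> float:
    """Weighted mean of probability estimates; 0.5 (neutral) if weights sum to 0."""
    total = sum(weights)
    if total == 0:
        return 0.5
    return sum(w * v for w, v in zip(weights, values)) / total


def multi_estimation(c: TokenCounts, n_spam: int, n_ham: int,
                     weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical co-weighting of the three estimators for one area."""
    estimates = [f(c, n_spam, n_ham) for f in ESTIMATORS]
    return co_weighted(weights, estimates)


def multi_area(per_area: dict, n_spam: int, n_ham: int,
               area_weights: dict) -> float:
    """Hypothetical co-weighting of one token's estimates across email areas."""
    values, weights = [], []
    for area, counts in per_area.items():
        values.append(multi_estimation(counts, n_spam, n_ham))
        weights.append(area_weights.get(area, 1.0))  # default weight 1 (assumed)
    return co_weighted(weights, values) if values else 0.5


# Usage with made-up counts: a token seen in both the subject and the body.
n_spam, n_ham = 100, 100
token = {"subject": TokenCounts(spam=30, ham=2),
         "body": TokenCounts(spam=40, ham=10)}
print(round(multi_area(token, n_spam, n_ham, {"subject": 2.0, "body": 1.0}), 3))
```

A weighted mean is used here only because it keeps the combined value inside [0, 1] for any non-negative weights; it treats occurrences in different areas as related evidence for one token rather than as separate tokens, which is the intuition behind the second approach.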