| Along with the development of Internet, the Email already becomes one of the important tools, which people use to exchange daily information. It has greatly facilitated people's life way. But simultaneously spam mail also unceasing increase, occupying server's massive storage space. And it has seriously disturbed people's daily communication. How to control the spam mail has become the topic, which calls for the people attention. Many scholars and the scientific researchers devote to the mail filtering technology research. The spam mail filtering technology, based on the nonnegative matrix factorization, has been promoted well to text classification domains and data mining area.The nonnegative matrix factorization is one new method of term dimensionality reduction, and it is based on semantic level in term dimensionality reduction process. Comparing with the classical term dimensionality reduction method, it carries on the term dimensionality reduction process on the semantics cluster level. This may eliminate a characteristic word to be equivocal, multi-word synonymy phenomenon; Thus in the text classification process, the category distinction accuracy is higher. Because the nonnegative matrix factorization is based on the semantic in dimensionality reduction process, therefore the dimensionality reduction effect is very remarkable. The efficiency of algorithm realization is much quicker. The experiment result also confirmed this point, explained that the nonnegative matrix factorization in the text classification domain has the important theory and the application value.The classical dimensionality reduction methods for the text classification are analyzed and these dimensionality reduction method application backgrounds are discussed.We introduced a new term dimensionality reduction method, nonnegative matrix factorization method. We carried on the contrastive analysis with the classical dimensionality reduction method. Comparing with the classical dimensionality reduction method, the performance of nonnegative matrix factorization method is better, because it is based on the semantic in dimensionality reduction. It has solved the synonym and in turn the equivocal phenomena. Finally we designed and realized the spam mail filtering demo system.The experiment result proved that the nonnegative matrix factorization dimensionality reduction method is more effective than the traditional method. The mail filter efficiency is higher, and the spam mail feature term update is more convenient. |