The popularity of computers and the Internet gives people rich resources for daily life and work. At the same time, we are overwhelmed by an explosion of information and data. With the volume of text on the Internet growing at an exponential rate, manual categorization has become impractical and has been replaced by automatic classification. Decades of research have produced a number of classification algorithms. For Chinese text classification, various approaches and systems have been developed that achieve relatively high accuracy. In the text pre-processing stage, however, the segmentation of ambiguous phrases is a key factor affecting accuracy, and it has not yet been properly solved. This paper presents a background-based iterative framework integrated with mutual information theory. The framework is applied during data pre-processing to improve the traditional text classification algorithm based on the Naïve Bayesian model. Data from various Sina categories are used to evaluate the framework experimentally. The results show that the proposed background-learning-based iterative framework for text classification is feasible and effective.
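The abstract names two building blocks without detailing them: mutual information applied during pre-processing, and a Naïve Bayesian classifier. The sketch below is an illustrative sketch of those generic ideas only, not the paper's background-based iterative framework, which the abstract does not specify. It assumes that each Chinese character is a token. It scores adjacent character pairs by pointwise mutual information (PMI) and joins a pair into one word when its PMI exceeds a threshold. The resulting tokens then feed a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing. The function names, the greedy joining rule, and the default threshold of 0 are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def pmi_scores(corpus):
    """PMI of adjacent character pairs in a list of unsegmented strings.

    Hypothetical helper: PMI(a, b) = log(P(ab) / (P(a) * P(b))).
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in corpus:
        unigrams.update(text)                 # count single characters
        bigrams.update(zip(text, text[1:]))   # count adjacent pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def segment(text, scores, threshold=0.0):
    """Greedy split: join two adjacent characters only if PMI > threshold.

    The threshold value is an assumption, not taken from the paper.
    """
    if not text:
        return []
    words = [text[0]]
    for a, b in zip(text, text[1:]):
        if scores.get((a, b), float("-inf")) > threshold:
            words[-1] += b      # strongly associated: extend current word
        else:
            words.append(b)     # weak association: start a new word
    return words


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        label_counts = Counter(labels)
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
            self.vocab.update(tokens)
        # Log prior P(c) estimated from class frequencies.
        self.prior = {c: math.log(label_counts[c] / len(labels))
                      for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        best, best_score = None, float("-inf")
        for c in self.classes:
            # log P(c) + sum of log P(t | c), smoothed with add-one.
            score = self.prior[c]
            for t in tokens:
                score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

In a pipeline of this kind, `segment` would turn each raw document into a token list before `NaiveBayes.fit` and `predict` are called. Better segmentation then changes which features the classifier sees. Working in log space avoids numerical underflow when documents contain many tokens.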