Cross Language Text Categorization (CLTC) is the task of assigning class labels to documents written in a target language (e.g. Chinese) when the system is trained on labeled examples in a source language (e.g. English). In this thesis, we study two key problems of CLTC.

The first problem is the language barrier between the source and target languages. To address it, we propose the Cross Language K-Nearest Neighbors (CLKNN) algorithm, which approaches CLTC from the perspective of Information Retrieval. The only external resource CLKNN requires is a bilingual dictionary. Experimental results show that our method gives promising performance and outperforms a translation-based method.

The second problem is the topic drift between languages, which causes a classifier trained on the source language to perform poorly on the target language. To address it, we propose an active learning algorithm for CLTC that makes use of both labeled data in the source language and unlabeled data in the target language. The classifier first learns the classification knowledge from the source language and then learns the culture-dependent knowledge from the target language. In addition, we extend the algorithm to a double-viewed form by treating the source and target languages as two views of the classification problem. Experiments show that our algorithm effectively improves cross language classification performance.
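
To make the retrieval view of the first problem concrete, the following is a minimal sketch, not the CLKNN algorithm as defined in the thesis. It assumes that a target-language document is mapped through a bilingual dictionary into a bag of source-language terms, used as a query against the labeled source-language documents, and labeled by a similarity-weighted vote of its nearest neighbors. The dictionary entries, training documents, and the names `BILINGUAL_DICT`, `TRAIN`, `translate`, `cosine`, and `classify` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary: target term -> source-language candidates.
BILINGUAL_DICT = {
    "足球": ["football", "soccer"],
    "比赛": ["match", "game"],
    "股票": ["stock", "share"],
    "市场": ["market"],
}

# Hypothetical labeled training documents in the source language (English).
TRAIN = [
    ("football match score goal", "sports"),
    ("soccer game team win", "sports"),
    ("stock market price fall", "finance"),
    ("share market investor trade", "finance"),
]


def translate(tokens):
    """Map target-language tokens to a bag of source-language terms.

    Every dictionary candidate is kept; tokens missing from the
    dictionary are dropped (an assumption of this sketch).
    """
    bag = Counter()
    for tok in tokens:
        for term in BILINGUAL_DICT.get(tok, []):
            bag[term] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, k=3):
    """Label a tokenized target-language document by weighted k-NN vote."""
    query = translate(tokens)
    scored = sorted(
        ((cosine(query, Counter(text.split())), label) for text, label in TRAIN),
        reverse=True,
    )
    votes = Counter()
    for score, label in scored[:k]:
        if score > 0:  # neighbors with no term overlap do not vote
            votes[label] += score
    # Return None when no training document shares a term with the query.
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch, `classify(["足球", "比赛"])` retrieves the two sports documents as the only neighbors with non-zero similarity, and a document whose tokens are all missing from the dictionary receives no label. Word segmentation of the target-language text is assumed to have been done beforehand.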