Chinese Text Categorization Based On Multi-Instance Learning

Posted on: 2013-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: C W Liu
Full Text: PDF
GTID: 2298330434475713
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, the Internet has entered an era of information explosion in which the volume of available information grows exponentially. Users want to obtain the information they care about quickly and accurately from this mass of data. Driven by that demand, automatic information processing has become a research hot spot, and search engines, text classification, information filtering and related technologies are now widely used.

Natural language text is the major form this information takes, so natural language processing is among the core techniques of large-scale information processing. This paper studies automatic classification of Chinese text.

Sentences in Chinese text have no natural word boundaries, and segmentation errors can substantially degrade classification performance. To address this problem, this paper puts forward a novel Chinese text classification method based on multi-instance learning that requires no word segmentation. The proposed method builds the multi-instance feature representation of an article by extracting a Chinese character together with a certain number of the characters that follow it; it then uses the MIRF classification method and a multi-instance conversion classification method to perform the classification tasks. Experiments on a corpus collected from an Internet bulletin and on the tc-corpus-train corpus show that the proposed approach achieves relatively high classification accuracy without word segmentation and has practical value.
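
As a rough illustration of the segmentation-free representation described in the abstract, the sketch below turns an article into a "bag" of instances, where each instance is one Chinese character plus a fixed number of following characters. The abstract does not give the window size, the handling of punctuation, or how instances are encoded for the classifiers, so the function name `article_to_bag`, the `following` parameter, and the choice to restrict windows to unbroken runs of CJK characters are all assumptions made for this example, not the thesis's implementation.

```python
import re

# Assumption: only CJK unified ideographs count as "Chinese characters";
# punctuation, digits and Latin text act as boundaries between runs.
CJK_RUN = re.compile(r'[\u4e00-\u9fff]+')


def article_to_bag(text, following=2):
    """Return a list of instances for one article (one multi-instance bag).

    Each instance is a character plus `following` characters after it,
    taken inside an unbroken run of Chinese characters. `following` is a
    hypothetical parameter; the abstract only says "a certain number".
    """
    size = following + 1
    bag = []
    for run in CJK_RUN.findall(text):
        # Slide a fixed-width window over the run; runs that are too
        # short contribute no instances.
        for i in range(len(run) - size + 1):
            bag.append(run[i:i + size])
    return bag


if __name__ == "__main__":
    # Each article becomes one bag; the article's category is the bag label.
    print(article_to_bag("我喜欢自然语言处理。", following=1))
```

A bag built this way could then be passed to any multi-instance learner; the MIRF and multi-instance conversion classifiers named in the abstract are not reproduced here.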
Keywords/Search Tags: text categorization, multi-instance learning, Chinese word segmentation