The Analysis On The Basic Techniques For Preprocess Of Text Mining And The Study On The Application Of Text Mining

Posted on: 2009-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Sun
GTID: 1118360245969465
Subject: Management Science and Engineering
Abstract/Summary:
This thesis systematically explains and implements the general workflow of text mining. The key techniques involved are text collection, text preprocessing, automatic Chinese word segmentation of the processed documents, training-pattern selection and support-vector reduction, text training, and text mining. Based on an analysis of the system's requirements, we divide the system into four parts: text collection and preprocessing, Chinese word segmentation, selection of training pattern vectors, and training and classification of the text pattern vectors.

Unlike general text mining, our system must also collect the text, preprocess it, and store its weights. We implement a preemptive multi-threaded web text collector that gathers the text of a specified catalog using a depth-first search algorithm. We also implement a text preprocessor that strips HTML tags from the web text and assigns it weights using a recursive matching method.

For the remaining parts, we first introduce a classifier that uses the association between words and categories to select training patterns appropriately and to reduce the number of support vectors. We then introduce the basic theory of the K-nearest neighbor (KNN) algorithm, its application to text classification, and the KNN software. The extracted patterns and their weights form the input file used for text training and text classification.

The author implemented the text collector, the preprocessor, and the Chinese word segmenter for text mining and, based on this study, proposes a new approach to text pattern selection and text mining.
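The abstract describes classifying texts with KNN over weighted text patterns but does not give the similarity measure, the voting rule, or the format of the KNN software's input file. The following is a minimal sketch, assuming each document is a sparse vector-space (VSM) dictionary mapping terms to weights, cosine similarity as the distance measure, and a similarity-weighted vote among the k nearest neighbours. These choices, the function names `cosine` and `knn_classify`, and the toy data are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
import math

# Assumed representation: a document is a sparse VSM vector {term: weight}.
Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query: Vector, training: list[tuple[Vector, str]], k: int = 3) -> str:
    """Label `query` by a similarity-weighted vote of its k most similar training documents."""
    if not training:
        raise ValueError("training set is empty")
    # Score every training document once, then keep the k highest similarities.
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes: Counter = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim  # closer neighbours contribute more to the vote
    return votes.most_common(1)[0][0]


# Hypothetical toy training set of already-segmented, already-weighted documents.
training = [
    ({"股票": 0.9, "市场": 0.6}, "finance"),
    ({"银行": 0.8, "利率": 0.7}, "finance"),
    ({"比赛": 0.9, "球队": 0.7}, "sports"),
    ({"球员": 0.8, "比赛": 0.5}, "sports"),
]
print(knn_classify({"比赛": 0.7, "球员": 0.6}, training, k=3))
```

Weighting votes by similarity, rather than counting neighbours equally, keeps a zero-similarity neighbour (here, a finance document that shares no terms with the query) from influencing the label.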
Keywords/Search Tags: Chinese word segmentation, Vector Space Model (VSM), K-nearest neighbor (KNN), Text Mining
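The abstract does not say which algorithm the author's Chinese word segmenter uses. Purely as an illustration, the following sketch uses forward maximum matching (FMM), a common dictionary-based method swapped in here as an assumption: at each position it takes the longest dictionary word that matches and falls back to a single character when nothing matches. The dictionary, the maximum word length of 4, and the function name `fmm_segment` are hypothetical.

```python
def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: greedily take the longest dictionary word at each position."""
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown characters still advance i.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary for illustration only.
dictionary = {"文本", "挖掘", "文本挖掘", "中文", "分词", "技术"}
print(fmm_segment("文本挖掘技术", dictionary))
print(fmm_segment("中文分词技术", dictionary))
```

Because FMM prefers the longest match, "文本挖掘" is kept as one word instead of being split into "文本" and "挖掘"; a purely greedy matcher like this cannot resolve genuinely ambiguous segmentations, which is one reason segmenters are often combined with backward matching or statistical models.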