| Text mining is a new interdisciplinary field that combines the disciplines of data mining, machine learning, natural language understanding, and automatic text processing. It is a research hot spot and a core technology of information retrieval and data mining, and many researchers have studied it intensively. The Qing Dynasty image database management system that we have undertaken is a key scientific research project of the National History of the Qing Dynasty Office; as a part of it, the categorization expert system is based on text mining techniques. During construction of the categorization system, we studied a more effective and more accurate word segmentation method, a rule model that is more convenient to use, and a more accurate rule matching algorithm. Specifically: 1. We discuss the Backward Maximum Matching algorithm and, in view of the characteristics of the objects the system processes, propose some improvements. 2. In view of the characteristics of Qing Dynasty image naming, we propose a new rule induction algorithm. 3. We discuss several approximate string matching algorithms, point out their deficiencies, and then improve the pair-wise method based on edit distance; the experimental results show that the improved algorithm enhances the accuracy of rule matching. The expert system was developed in VB.NET on SQL Server 2000. SQL Server 2000 strengthens the software's capacity for knowledge storage, management, and use, and VB.NET allowed the software to be developed rapidly. The categorization expert system passed the system test and obtained good results. |
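
For readers unfamiliar with the two baseline techniques named above, the following is a minimal Python sketch of textbook Backward Maximum Matching and of the classic Levenshtein edit distance. It does not reproduce the improvements proposed in this work, and it is not the VB.NET code of the system. The function names, the `max_len` parameter, and the sample dictionary are assumptions made for illustration only.

```python
def backward_maximum_match(text, dictionary, max_len=4):
    """Segment text by scanning from the end and taking the longest dictionary word.

    Illustrative baseline only: `dictionary` is any set of known words and
    `max_len` is an assumed upper bound on word length.
    """
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only a single character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()  # words were collected right to left
    return words


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    sample_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical entries
    print(backward_maximum_match("研究生命起源", sample_dictionary))
    print(edit_distance("kitten", "sitting"))
```

On the sample input, backward scanning yields the segmentation 研究 / 生命 / 起源, whereas a forward scan with the same dictionary would first take 研究生. An approximate rule matcher could then rank candidate rules by `edit_distance` to the segmented name, with a smaller distance indicating a closer match.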