Design And Implementation Of Web Automatic Text Categorization

Posted on: 2010-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Nie
GTID: 2208360302970817
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the information resources available on the web now cover nearly every field of daily life. Search engines, which have become part of everyday life, play an important role in network information retrieval. Handling large-scale data involves data mining and knowledge discovery, so the techniques of web mining and web information retrieval have developed rapidly. Automatic text classification is an important part of search engine development. This thesis therefore studies the technologies behind automatic classification of Chinese text, with the aim of advancing information technology.

The thesis first surveys the techniques used in text classification and then studies the key techniques of automatic classification in depth through experiments. Methods for automatic Chinese word segmentation are examined in detail, and an improved algorithm based on the statistical approach is proposed; it introduces an estimated value to address the data sparsity problem. Analysis of open and closed tests shows that the algorithm yields a high recall rate in the classification results. A text classification system was then designed using the support vector machine (SVM) method. The system supports supervised training and testing of the classifier. Experimental data show that the system classifies text well, with relatively high average precision, recall, and F value.
Keywords/Search Tags: Text Classification, Chinese Word Segmentation, Feature Selection, Support Vector Machine
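
The abstract evaluates the classifier by its average precision, recall, and F value but does not say how these are computed. The following is a minimal sketch of one common way to compute them, assuming macro-averaging over categories and the balanced F1 form of the F value; the function names, category labels, and sample data are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter


def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for two paired label lists.

    Assumption: one category per document (single-label classification).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correctly assigned to category g
        else:
            fp[p] += 1      # wrongly assigned to category p
            fn[g] += 1      # missed for the true category g

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        assigned = tp[label] + fp[label]   # documents the classifier put in this category
        actual = tp[label] + fn[label]     # documents that truly belong to this category
        precision = tp[label] / assigned if assigned else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Average precision, recall, and F1 over categories with equal weight."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical labels for four test documents.
    gold = ["sports", "sports", "finance", "finance"]
    predicted = ["sports", "finance", "finance", "finance"]
    for label, (p, r, f) in per_class_scores(gold, predicted).items():
        print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
    print("macro average:", macro_average(per_class_scores(gold, predicted)))
```

Macro-averaging gives every category equal weight regardless of how many documents it contains; if the thesis instead pooled counts across categories (micro-averaging), the averages would differ.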