Study Of Text Classification Algorithm Based On Domain Knowledge

Posted on: 2015-05-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Sun
GTID: 2298330467963182 | Subject: Computer Science and Technology
Abstract/Summary:
Research institutions are investing increasing effort in the research and development of institutional repositories. They aim to integrate their internal research resources and results and to give in-house and external researchers more convenient access to scientific information. Classifying repository content by domain is an important research topic for institutional repositories: faced with large volumes of diverse data resources, how to divide them by domain so that they can be used efficiently has been one of the core issues in institutional repository research. This study proposes a text classification algorithm based on domain knowledge as a feasible solution to this problem.
The improved algorithm builds on Bayesian classification, which currently performs well in practical applications. The thesis first explains Bayesian theory and the Bayesian classification algorithm, then designs the processing pipeline, covering the key steps of text segmentation, improved feature selection and weighting, and the application of domain knowledge to classification. Text segmentation uses the widely adopted Chinese word segmenter IKAnalyzer in combination with Lucene. Feature selection and weighting are improved mainly in how the special vocabulary of documents is processed. In the domain knowledge step, an auxiliary domain-specific extended vocabulary is introduced into the Bayesian calculation to obtain the final result.
In the experimental section, a broad set of experiments compares the improved algorithm with the original algorithm in terms of both classification accuracy and running time. The results show that the improved algorithm effectively increases text classification accuracy without adding much computation time to the system.
I believe the ideas behind this improved algorithm can serve as a reference for domain classification in institutional repositories.
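The abstract does not give the thesis's formulas, so the following is only a minimal sketch, assuming a standard multinomial naive Bayes classifier with Laplace (add-alpha) smoothing plus a hypothetical domain-knowledge step: each class has a domain lexicon (the `domain_lexicon` mapping), and lexicon terms have their training counts multiplied by a `boost` factor. The class name `DomainWeightedNB`, the `boost` and `alpha` parameters, and the toy data are illustrative assumptions, not the author's method. Documents are assumed to be already segmented into token lists, standing in for the output of IKAnalyzer (a Java library that is not called here).

```python
import math
from collections import Counter, defaultdict


class DomainWeightedNB:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing.

    Hypothetical domain-knowledge hook (an assumption, not the thesis's scheme):
    during training, a token found in the lexicon of the document's own class
    has its count multiplied by `boost`, raising P(token | class) for that
    class's domain terms.
    """

    def __init__(self, domain_lexicon=None, boost=2.0, alpha=1.0):
        self.domain_lexicon = domain_lexicon or {}  # label -> set of terms (assumed input)
        self.boost = boost
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists (already segmented); labels: list of class names."""
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        self.term_counts = defaultdict(Counter)  # label -> token -> weighted count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            lexicon = self.domain_lexicon.get(label, set())
            for tok in tokens:
                weight = self.boost if tok in lexicon else 1.0
                self.term_counts[label][tok] += weight
                self.vocab.add(tok)
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, tokens, label):
        """log P(label) + sum of smoothed log P(token | label) over known tokens."""
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.term_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


# Toy usage with hypothetical, pre-segmented documents.
docs = [
    ["gene", "protein", "cell", "experiment"],
    ["protein", "sequence", "cell"],
    ["network", "protocol", "router", "experiment"],
    ["router", "packet", "network"],
]
labels = ["biology", "biology", "computing", "computing"]
lexicon = {"biology": {"protein", "gene"}, "computing": {"router", "packet"}}

model = DomainWeightedNB(domain_lexicon=lexicon, boost=2.0).fit(docs, labels)
print(model.predict(["protein", "experiment"]))  # biology
```

Applying the boost to training counts, rather than scaling log-probabilities at scoring time, keeps the weighting in the intended direction: multiplying a negative log-probability by a factor greater than 1 would penalize the very class the term belongs to.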
Keywords/Search Tags: Institutional Repository, Domain Knowledge, Classification Algorithms, Bayesian Classification, Text Classification