
Research On Intelligent Classification Technique For Semi-structured Drug Data And System Implementation (Full-time Professional Degree)

Posted on: 2012-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Lu
GTID: 2178330332998061
Subject: Computer technology
Abstract/Summary:
With the development of information storage and communication technology, the amount of information in every industry is growing quickly. The pharmaceutical industry, with thousands of years of history, holds an especially large amount of data. Automatic classification has become an indispensable tool for obtaining useful information, and text classification, mail classification, web page classification, and similar applications have achieved remarkable results. To manage drug data intelligently and improve management efficiency, this thesis designs an intelligent classification system for semi-structured drug data.

After studying various word segmentation and classification techniques, and taking the characteristics of drug data into account, this thesis builds the system on the approach of the IK word segmenter and an incremental-learning naive Bayes classification model. The main contents are as follows:

The thesis first presents the overall architecture of the system, describes each part in detail, and compares several common classification algorithms for Chinese text. After comparing various word segmentation methods and considering the characteristics of drug data, it performs Chinese word segmentation based on the IK segmentation approach and describes the segmentation steps, the construction of the dictionary, and the algorithm's performance in detail.

Through an in-depth analysis of feature selection algorithms, the thesis proposes an improved expected cross entropy algorithm that takes into account both the contribution of terms irrelevant to the classification and the distribution of terms between classes.
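
For orientation, the baseline that such an improvement starts from can be sketched as follows. This is the standard expected cross entropy score, ECE(t) = P(t) * sum over classes c of P(c|t) * log(P(c|t) / P(c)), estimated from document frequencies. It is not the improved variant proposed in the thesis, whose details the abstract does not give. The function name, the sample terms, and the class labels are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score each term with the standard expected cross entropy.

    docs   -- list of token lists (text already segmented into terms)
    labels -- list of class labels, one per document
    Returns a dict mapping term -> score; higher means more class-informative.
    """
    n_docs = len(docs)
    class_count = Counter(labels)            # class -> number of documents
    term_count = Counter()                   # term -> documents containing it
    term_class_count = defaultdict(Counter)  # term -> class -> documents

    for tokens, label in zip(docs, labels):
        for term in set(tokens):             # document frequency, not raw counts
            term_count[term] += 1
            term_class_count[term][label] += 1

    scores = {}
    for term, df in term_count.items():
        p_t = df / n_docs
        total = 0.0
        for c, n_c in class_count.items():
            p_c_given_t = term_class_count[term][c] / df
            if p_c_given_t > 0:              # 0 * log(0) is treated as 0
                total += p_c_given_t * math.log(p_c_given_t / (n_c / n_docs))
        scores[term] = p_t * total
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: two drug classes, four segmented records.
    docs = [["aspirin", "tablet", "oral"], ["ibuprofen", "tablet", "oral"],
            ["ginseng", "granule", "oral"], ["licorice", "granule", "oral"]]
    labels = ["western", "western", "tcm", "tcm"]
    ranked = sorted(expected_cross_entropy(docs, labels).items(),
                    key=lambda kv: kv[1], reverse=True)
    for term, score in ranked:
        print(f"{term}\t{score:.4f}")
```

A term spread evenly across all classes scores zero, while a frequent term concentrated in one class scores highest, which is why the score is used to rank candidate features before classification.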
Experiments show that the improved feature selection algorithm gives better results.

This thesis proposes an incremental naive Bayes algorithm that supports both automatic and manual incremental learning, addressing the shortcoming that the traditional Bayesian classifier cannot learn incrementally. On this basis, it also incorporates the weights of feature terms in drug names. It describes the specific update algorithm, including how the classifier and the feature term set are revised.

Finally, the thesis builds an intelligent classification system for semi-structured drug data, consisting of an application system and a maintenance system, tests both systems on a given test set, and analyzes their performance. Experimental results show that the weighted incremental Bayesian classifier performs better.
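
The incremental naive Bayes idea described above can be illustrated with a minimal sketch. It assumes a multinomial model with Laplace smoothing that stores raw counts, so a new sample updates the classifier without retraining. The class name, the `weights` argument, and the confidence threshold are hypothetical choices for illustration. The abstract does not specify the thesis's actual update rule, weighting scheme, or confidence measure.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes that keeps raw counts so it can be updated in place."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # class -> number of samples seen
        self.term_counts = defaultdict(Counter)  # class -> term -> weighted count
        self.class_totals = Counter()            # class -> sum of weighted counts
        self.vocab = set()

    def update(self, tokens, label, weights=None):
        """Add one labeled sample (manual learning when a person supplies `label`).

        `weights` is an assumed optional mapping term -> weight, e.g. to
        emphasize terms that come from the drug name.
        """
        weights = weights or {}
        self.class_docs[label] += 1
        for term in tokens:
            w = weights.get(term, 1.0)
            self.term_counts[label][term] += w
            self.class_totals[label] += w
            self.vocab.add(term)

    def posteriors(self, tokens):
        """Return a dict class -> posterior probability for a token list."""
        n = sum(self.class_docs.values())
        v = len(self.vocab)
        log_scores = {}
        for c, n_c in self.class_docs.items():
            score = math.log(n_c / n)
            denom = self.class_totals[c] + self.alpha * v
            for term in tokens:
                score += math.log((self.term_counts[c][term] + self.alpha) / denom)
            log_scores[c] = score
        top = max(log_scores.values())           # subtract max for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def classify_and_learn(self, tokens, threshold=0.9):
        """Predict a class; absorb the sample automatically if confident enough."""
        post = self.posteriors(tokens)
        label = max(post, key=post.get)
        learned = post[label] >= threshold       # assumed confidence gate
        if learned:
            self.update(tokens, label)
        return label, post[label], learned
```

In this sketch, automatic incremental learning is the `classify_and_learn` path, where only predictions above the threshold are fed back into the counts. Manual incremental learning corresponds to calling `update` with a label confirmed by a person, for instance for the low-confidence samples the automatic path skipped.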
Keywords/Search Tags:Drug categorization, Chinese word segmentation, Class confidence, Weighted incremental naive Bayes, Automatic incremental learning, Expected cross entropy