Research On Intelligent Classification Technique For Semi-structured Drug Data And System Implementation (Full-time Professional Degree)

Posted on:2012-09-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Lu

GTID:2178330332998061

Subject:Computer technology

Abstract/Summary:

With the development of information storage and communication technology, the amount of information in every industry is growing quickly. The pharmaceutical industry, with thousands of years of history, holds an especially large amount of data. Automatic classification has become an indispensable tool for extracting useful information, and text classification, mail classification, and web page classification have all achieved remarkable results. To manage drug data intelligently and improve management efficiency, this thesis designs an intelligent classification system for semi-structured drug data. After studying various word segmentation and classification techniques, and taking the characteristics of drug data into account, the thesis builds the system on the IK word segmentation approach and an incremental-learning naive Bayes classification model. The main contents are as follows:

The thesis first presents the overall architecture of the system, describes each component in detail, and compares several common Chinese classification algorithms. After a comparative study of word segmentation methods, and in light of the characteristics of drug data, it implements Chinese word segmentation based on the IK segmentation approach and describes the segmentation steps, the construction of the dictionary, and the performance of the algorithm in detail.

Based on an in-depth analysis of feature selection algorithms, the thesis proposes an improved expected cross entropy algorithm that takes into account both the contribution of terms irrelevant to a class and the distribution of terms across classes. Experiments show that the improved feature selection algorithm gives better results.

The thesis proposes an incremental learning algorithm based on naive Bayes that supports both automatic and manual incremental learning, which addresses the traditional naive Bayes classifier's lack of incremental learning. On this basis, it also incorporates the weights of feature terms in drug names. The thesis describes the specific correction algorithm, including how the classifier and the feature term set are amended.

The thesis builds an intelligent classification system for semi-structured drug data, consisting of an application system and a maintenance system, tests both systems on a given test set, and provides a performance analysis. Experimental results show that the weighted incremental naive Bayes classifier performs better.
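
The abstract does not give the details of the incremental classifier, so the following Python sketch only illustrates the general idea of incremental naive Bayes with automatic learning. It assumes a multinomial model with Laplace smoothing and a confidence threshold on the posterior. The class name `IncrementalNaiveBayes`, the `threshold` parameter, and the sample tokens are hypothetical, and the thesis's drug-name feature weighting and improved feature selection are not reproduced.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts can be updated one sample at a time.

    Illustrative sketch only; not the thesis's actual algorithm.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.class_counts = Counter()            # samples seen per class
        self.term_counts = defaultdict(Counter)  # term frequencies per class
        self.vocab = set()                       # all terms seen so far

    def update(self, tokens, label):
        """Fold one labelled sample into the counts without retraining."""
        self.class_counts[label] += 1
        self.term_counts[label].update(tokens)
        self.vocab.update(tokens)

    def posteriors(self, tokens):
        """Return normalized class posteriors for a list of tokens."""
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            n_terms = sum(self.term_counts[label].values())
            score = math.log(n_docs / total)     # log prior
            for t in tokens:
                count = self.term_counts[label][t]
                score += math.log((count + self.alpha) / (n_terms + self.alpha * v))
            log_scores[label] = score
        # Normalize in a numerically stable way.
        peak = max(log_scores.values())
        exp = {c: math.exp(s - peak) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def classify(self, tokens, threshold=0.9):
        """Classify a sample; learn from it automatically if confident enough."""
        post = self.posteriors(tokens)
        label = max(post, key=post.get)
        confident = post[label] >= threshold
        if confident:
            # Automatic incremental step: trust the prediction as a label.
            self.update(tokens, label)
        return label, post[label], confident


nb = IncrementalNaiveBayes()
nb.update(["aspirin", "tablet"], "analgesic")
nb.update(["ibuprofen", "capsule"], "analgesic")
nb.update(["amoxicillin", "capsule"], "antibiotic")
nb.update(["penicillin", "injection"], "antibiotic")
print(nb.classify(["aspirin", "tablet"], threshold=0.6))
```

In this sketch, a prediction that falls below the threshold is returned without updating the model. Such samples could be passed to a human reviewer and fed back through `update` with the corrected label, which roughly corresponds to the manual incremental learning mentioned above.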

Keywords/Search Tags:

Drug categorization, Chinese word segmentation, Class confidence, Weighted incremental naive Bayes, Automatic incremental learning, Expected cross entropy

Related items

1	Incremental Learning Of Naive Bayes Chinese Classification System
2	Incremental Learning Based On Neural Networks Ensemble
3	An Incremental-styled Learning Chinese Word Segmentation System Based On Perceptron Algorithm Design And Implementation
4	The Study Of Chinese Text Categorization Based On Concept
5	Chinese Text Data Classification
6	Research On Incremental Learning Method Based On Bayes Theory
7	Research On Catastrophic Forgetting In Class-Incremental Continual Learning Model
8	Chinese WEB Document Automatic Categorization
9	Case-based Domain Adaptation Incremental Learning Methods
10	Application Of Incremental Learning In Radar HRRP Target Recognition