Study On Data Mining Technique Of Pharmaceutical Patents

Posted on:2008-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:J Liang

Full Text:PDF

GTID:2189360218455174

Subject:Physical chemistry

Abstract/Summary:

Pharmaceutical patents have become one of the most important sources of information in many fields, especially innovative drug design. However, our techniques for storing and retrieving patent information by computer lag far behind those of developed countries. Several countries, e.g., the United Kingdom, the U.S.A., and France, have built commercial pharmaceutical patent databases, and we have begun to attach importance to this area in recent years. A pharmaceutical patent differs from other kinds of patents because its contents consist of both generic structures and the corresponding descriptive text. In this paper, advanced data mining techniques are applied to the text information to facilitate the retrieval of patent information.

I first improve StruDraw, a chemical software package developed in our group specifically for the input and output of generic structures. Its function of translating text into chemical structures may help front-end users with little chemical background to index chemical structures directly and easily. It is worth mentioning that the software is written in C++, and its component-based architecture makes it easy to add new functions with little modification.

The first step in storing a chemical patent by computer is to determine which class the patent belongs to, which is a text categorization task. Data mining, or machine learning, algorithms are more competitive than traditional manual methods for this task. This paper presents the application of several machine learning methods to the categorization of pharmaceutical patents. About 2000 pharmaceutical patents, categorized into five classes according to their curative effects, are selected as training instances. Textual features are first extracted from each class and then expressed as numerical vectors. Three machine learning algorithms, namely Support Vector Machines (SVM), Naïve Bayes, and RBF Neural Network, are evaluated by 5-fold or 10-fold cross-validation.
Their performances are compared in a series of experiments, and the results show that the SVM algorithm outperforms the other two. The methods proposed in this paper may be helpful for pharmaceutical patent categorization.
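The categorization pipeline summarized above (textual features, numerical vectors, k-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: features are assumed to be raw term counts over a vocabulary built from the training folds, and only the Naïve Bayes classifier (multinomial, with Laplace smoothing) is shown, while the SVM and RBF Neural Network are omitted. All function names and the toy texts are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Assumption: features are lower-cased alphanumeric tokens (no stemming, no stop-word list).
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(text, vocabulary):
    # Express a document as a term-count vector aligned with the vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def fit(self, vectors, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        n_features = len(vectors[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Total count of each feature across the training documents of class c.
            totals = [sum(column) for column in zip(*rows)]
            denominator = sum(totals) + alpha * n_features
            self.log_likelihood[c] = [math.log((t + alpha) / denominator) for t in totals]
        return self

    def predict(self, vector):
        # Pick the class with the highest log posterior (up to a constant shared by all classes).
        def score(c):
            return self.log_prior[c] + sum(
                x * w for x, w in zip(vector, self.log_likelihood[c]))
        return max(self.classes, key=score)


def k_fold_accuracy(texts, labels, k=5, seed=0):
    """Mean per-fold accuracy of (unstratified) k-fold cross-validation."""
    if not 2 <= k <= len(texts):
        raise ValueError("k must be between 2 and the number of documents")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        # Build the vocabulary from training documents only, so the held-out fold stays unseen.
        vocabulary = sorted({t for i in train for t in tokenize(texts[i])})
        model = MultinomialNaiveBayes().fit(
            [to_vector(texts[i], vocabulary) for i in train],
            [labels[i] for i in train])
        correct = sum(model.predict(to_vector(texts[i], vocabulary)) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


if __name__ == "__main__":
    # Hypothetical toy snippets labelled by curative effect; not data from the thesis.
    texts = [
        "beta-lactam antibiotic inhibits bacterial cell wall synthesis",
        "antibacterial agent active against gram-positive bacteria",
        "compound for treating bacterial infection",
        "calcium channel blocker lowers blood pressure",
        "angiotensin receptor antagonist for hypertension",
        "diuretic composition reducing blood pressure",
    ]
    labels = ["antibacterial"] * 3 + ["antihypertensive"] * 3
    print(f"mean 3-fold accuracy: {k_fold_accuracy(texts, labels, k=3):.2f}")
```

Swapping in an SVM or an RBF network would change only the model class; the vectorization step and the fold loop stay the same. Building the vocabulary inside each fold keeps terms that appear only in the held-out documents from leaking into training.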

Keywords/Search Tags:

Data Mining, Machine Learning, Pharmaceutical Patent, Text Categorization, Translation from Character to Structure

Related items

1	Research Of Online P2P Lending Behavior Based On Text Mining
2	Text Mining And Comprehensive Ranking Of Hotels Based On Hotel Reviews
3	Research On Risk Public Opinion Monitor Technology Of E-Commerce Product Quality Based On Data Mining
4	Study On Methods Of Data Mining And Text Mining Based On Fuzzy Logic And Neural Network
5	Research On International Flight Price Forecasting Model Based On Big Data
6	Research On Enterprise Bankruptcy Prediction Based On Data Mining Technology
7	Study On The Influence Of Web Reviews On 5A-Class History And Culture Spots’ Incomes Based On Text Mining
8	Demonstration Study Of Customer Churn Prediction Based On Data Mining
9	Research On Predication Of Internet Financial Returns Based On The PSO-LSSVR
10	Research On The Characteristic Analysis Of Human Resource Management Post Competency And The Construction Of Competence-Post Matching Model Driven By Big Data