Natural Language Processing Applications In Drug Patent Search System

Posted on: 2005-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: X J Cheng
Full Text: PDF
GTID: 2191360122497350
Subject: Applied Chemistry
Abstract/Summary:
With the growing use of computers in chemistry, and especially with the development of artificial intelligence, more and more expert systems have been designed to handle the structural information of organic chemistry. Handling chemical information by computer requires a structural representation that most chemists can accept, and that representation should be unique and unambiguous. Generic structures are widely used in chemical patents, but the generic structures and the descriptions of their variations make the retrieval of patent information difficult and complex. It therefore seems necessary to translate this text information into partial structural information automatically.

In this paper, Natural Language Processing techniques are applied to translate the text describing variations in a patent abstract into normalized, unique codes that a computer can distinguish. The Generic Structure Compact Connectivity Table (GSCCT) designed here is generated by combining the input structure information with the results of the text translation, and the matching and retrieval of generic structures are based on the GSCCT. The translation system is built with object-oriented programming, so that the sentences and syntax of the patent abstract can be analyzed and a language model constructed. An algorithm for automatic Chinese word segmentation is presented to help simplify the structure of the database and improve the precision of translation. The database is open-ended, so that any new standard lemma can be added. The translation system is thus an expert system with the ability to learn and be extended.

The translation system is an auxiliary tool integrated into the retrieval system for pharmaceutical patent information, where it is used for the computer representation, storage, and search of structural information. It also reduces the tedious work of indexing and greatly increases efficiency. The whole translation system was tested on more than two hundred patent abstracts, and the results met our expectations.
Keywords/Search Tags: Natural language processing, Machine translation, Chinese automatic word segmentation, Maximum Matching Method
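
The keywords name the Maximum Matching Method as the basis for Chinese word segmentation. The abstract does not say which variant the thesis uses or what its dictionary contains, so the following is only a minimal sketch of the common forward variant: at each position, take the longest dictionary entry that starts there, and fall back to a single character when nothing matches. The function name, the `LEXICON` set, and its substituent terms are hypothetical illustrations, not taken from the thesis.

```python
def forward_maximum_matching(text, lexicon, max_word_len=None):
    """Greedy forward maximum matching segmentation.

    At each position, the longest lexicon entry starting there is taken.
    An unmatched character is emitted on its own.
    """
    if max_word_len is None:
        max_word_len = max((len(word) for word in lexicon), default=1)
    max_word_len = max(1, max_word_len)  # guard against a zero-length window

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as the fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon of substituent terms (assumption, not from the thesis).
LEXICON = {"甲基", "乙基", "苯基", "甲氧基", "氢", "或"}

if __name__ == "__main__":
    print(forward_maximum_matching("甲氧基或苯基", LEXICON))
```

Trying the longest candidate first is what keeps "甲氧基" (methoxy) from being split at the shorter prefix; a larger, open-ended dictionary of the kind the abstract describes would slot in as the `lexicon` argument.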