
Research And Development Of Automatic Indexing System Of Biology Literature

Posted on: 2007-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2178360182978244
Subject: Control theory and control engineering
Abstract/Summary:
The 21st century is the century of the life sciences. The field is developing rapidly, plays an important role across science as a whole, and has produced many significant results. With the growth of the internet, making full use of this massive body of results and searching it efficiently places stricter demands on retrieving literature accurately and completely. To find the required items in the enormous volume of biology literature, it is necessary to build a biology literature database that is classified clearly and simply and can be searched conveniently and quickly. Accurate indexing of the literature improves both recall and precision. Because manual indexing of biology literature involves a heavy workload, low efficiency, and difficulty in standardization, developing automatic indexing is an important and urgent task.

At present, the rapid and unambiguous exchange of digitized biology literature on the web is greatly restricted, and this has become the bottleneck in indexing biology literature. The purpose of this thesis is to apply an improved MM (maximum matching) algorithm to the automatic word segmentation of biology literature. Segmentation is achieved by running the matching algorithm in both the forward and the reverse direction, and on this basis a dictionary-based automatic indexing system for biology literature is realized. The main contributions of this thesis are as follows:

Based on the subject-word resources in the field of biology literature, a procedure for constructing the word tables of biology literature is proposed and implemented. On the one hand, the method constructs a biology stop-word table, a biology special-word table, a biology keyword table, and a biology statistical table. On the other hand, it uses tables to represent the dictionary.
The dictionary-based method simplifies the manual indexing of biology literature, reduces the involvement of domain experts, and facilitates automatic indexing.

After analyzing the characteristics of subject words in the field of biology literature, an automatic indexing model for biology literature based on the improved MM algorithm is put forward. The system presented in this thesis adopts a biology word segmentation dictionary and also inherits the word-based system. The subsystem therefore benefits from the advantages of automatic word segmentation with the improved MM algorithm, which improves recall for biology literature.

Based on data mining of biology literature, word counts can reflect patterns in biology research and help recognize new subject words, which can then be used to complete the biology word dictionary. To some degree, this also helps to improve recall for biology literature.

Based on the indexing work of the biology literature database department of the Shanghai Information Center for Life Science, Chinese Academy of Sciences, and on an analysis of the main steps of indexing biology literature, three functional modules are designed: automatic indexing, word count, and word management. The automatic indexing system of CBA is realized for this special field. Finally, a detailed implementation plan and process for the automatic indexing system is given.

Overall, the dictionary-based method of automatic indexing is introduced into the field of biology literature comprehensively and systematically. By designing and constructing a word segmentation dictionary for biology literature, a dictionary-based automatic indexing system for biology literature is established. This provides a working example of dictionary-based automatic indexing in the field of biology literature. The ideas and methods in this thesis should also be helpful for solving problems in knowledge management systems in other fields.
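
The abstract does not spell out the improved MM algorithm, so the following is only a minimal sketch of plain bidirectional maximum matching followed by dictionary-based index-term selection, not the thesis's actual method. The function names (`forward_mm`, `reverse_mm`, `segment`, `index_terms`), the toy word tables, and the rule for choosing between the forward and reverse results (fewer tokens, then fewer single-character tokens, ties going to the reverse result) are all assumptions made for illustration.

```python
def forward_mm(text, dictionary, max_len):
    """Forward maximum matching: scan left to right, taking the longest match."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_mm(text, dictionary, max_len):
    """Reverse maximum matching: scan right to left, taking the longest match."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def segment(text, dictionary):
    """Run both directions and keep one result.

    Assumed heuristic (not from the thesis): prefer fewer tokens, then fewer
    single-character tokens; on a tie, keep the reverse result.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_mm(text, dictionary, max_len)
    rev = reverse_mm(text, dictionary, max_len)
    if fwd == rev:
        return fwd

    def cost(tokens):
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return fwd if cost(fwd) < cost(rev) else rev


def index_terms(text, keyword_table, stop_table):
    """Segment with the combined tables, then keep only keyword-table words."""
    tokens = segment(text, keyword_table | stop_table)
    return [t for t in tokens if t in keyword_table and t not in stop_table]
```

With a hypothetical keyword table `{"研究生", "生命", "起源"}` and stop-word table `{"研究"}`, forward matching splits "研究生命起源" into 研究生 / 命 / 起源, while reverse matching gives 研究 / 生命 / 起源. The assumed heuristic keeps the reverse result because it has no single-character leftovers, and `index_terms` then returns 生命 and 起源 after dropping the stop word.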
Keywords/Search Tags:Automatic Indexing, Subject Indexing, Automatic Word Segmentation, Biology Subject Word Table, Biology Literature