Research on Intelligent Processing Technology for Ancient Books

Posted on: 2008-06-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: E Chang
GTID: 1118360242465853
Subject: History of science and technology
Abstract/Summary:
The digitization of ancient books in China began in the 1980s and has achieved a great deal. Many important and fundamental ancient books have been developed into digital products and brought to market successfully. Most research papers on the subject discuss the present state, development trends, and strategy of ancient book digitization, or introduce its achievements and techniques; only a few study intelligent processing technology. Current digitization work likewise concentrates on developing digital products and on digitization techniques, with little attention to intelligent processing.

As digitization has deepened, more and more researchers have come to hold that digital ancient books are not merely duplicates of the originals but a union of modern technology and traditional content. Digital ancient books should be a growing knowledge base and an effective tool for the collation and academic study of ancient books. An important task of ancient book digitization is therefore to provide scientific and accurate statistical information, relevant reference data, and auxiliary tools concerning the content of ancient books, in order to strengthen the research function of digital ancient books and to build a complete expert system for ancient book collation.

This dissertation focuses on the digitization of agricultural ancient books and analyzes digitization technology in depth. We analyze the key digitization techniques systematically, discuss the modernization of collation methods, and study the automatic compilation and the automatic comparison and analysis of agricultural ancient books. The main research work is as follows:

(1) The dissertation systematically introduces and analyzes the technologies relevant to ancient book digitization.
The keys to digitization are to input the original text quickly, convert the text into digital characters, design hyperlinks for browsing and reading, develop a powerful information retrieval system, and study how the results can be put to practical use. The digitization of agricultural ancient books faces the same problems. The dissertation discusses the corresponding digitization techniques, including character sets, processing and storage, browsing and reading, information retrieval, and intelligent processing.

It analyzes the Windows character sets and the use of Chinese characters, and concludes that text should be input in traditional Chinese characters using the Unicode character set. It then discusses the problems of conversion between traditional and simplified characters and of characters missing from the character set, and presents solutions. It also describes how to scan ancient books and recognize them with OCR. We hold that text and images should be stored in a database in a unified data format, which favors the collective construction and sharing of information resources. Hyperlinks are the real advantage of digital browsing and reading; the dissertation analyzes the hyperlinks implicit in ancient books and discusses how to mark them in the database. Digitization has a strong effect on research in ancient book collation, and the biggest change is in the method of literature retrieval. The dissertation compares three retrieval approaches for digitized ancient books and concludes that ancient book digitization should build an information retrieval system based on full-text retrieval, supporting techniques such as keyword search, conditional search, logical search, fuzzy search, and attribute search.

(2) The dissertation applies Chinese information processing to the automatic compilation of agricultural ancient books.
We give the principle of automatic compilation, design its process in detail, and explore its techniques and algorithms in depth.

In automatic compilation, a computer is used to find and excerpt related material from agricultural ancient books and assemble it into an anthology. It draws on automatic Chinese word segmentation, text partitioning, passage retrieval, and automatic clustering. The underlying principle is that two clauses are more closely related the more words they share; the text is partitioned into paragraphs according to the relatedness of its clauses, and the paragraphs that contain the compilation subject are extracted. The main excerpting steps are as follows: first, a chapter of an agricultural ancient book is segmented into clauses of equal size; second, the keywords of each clause are extracted using the maximum matching method of word segmentation; third, the cohesion and depth scores of the clauses are calculated using the block comparison method; and finally, the mean μ and standard deviation σ of the depth scores are computed, and the separation points whose depth scores exceed μ − c·σ are selected as partition boundaries. The system uses dynamic automatic clustering to present the compilation results conveniently to users. By building a database of agricultural history terms, we annotate words and phrases in the compilation results with hyperlinks, which makes the results easier to read.

(3) Textual criticism is a very important part of ancient book collation, and computer technology can greatly improve its efficiency.
The dissertation designs an algorithm for the automatic version comparison and analysis of agricultural ancient books and studies several associated problems.

In automatic version comparison and analysis, the computer automatically finds and marks the differences among versions of an agricultural ancient book and assists the collator through auxiliary tools. Building on pattern matching and automatic proofreading techniques for Chinese text, the dissertation designs an algorithm called "window way matching". Its principle is that two character strings are extracted from two versions of the book and compared with each other. If they are not equal, they are segmented into substrings and compared again; characters added, deleted, or replaced in one string relative to the other are judged to be superfluous characters (interpolations), omissions, and errors, respectively. If they are equal, the next two strings are extracted and the steps are repeated. The system should identify each difference and make a simple judgement, or else assist the collator through the auxiliary tools. It is therefore very important to build these tools, including lists of ancient official titles, personal names, and place names, a lexicon of words and phrases avoided as taboo, a list of variant characters, and a table of traditional and simplified Chinese characters. The dissertation analyzes how each auxiliary tool is constructed. It also studies a method of automatic version comparison and analysis based on analyzing the literature cited in ancient books.

(4) Developing an experimental system for the intelligent processing of agricultural ancient books is also an important part of our research. The system consists mainly of an automatic compilation subsystem, an automatic version comparison and analysis subsystem, and an auxiliary tools subsystem.
In addition to the tools mentioned above, the auxiliary tools subsystem contains reference lists such as a chronology of events of the Chinese dynasties, a list of the rulers of the Chinese dynasties, and an index of the reign titles of the Chinese dynasties. The dissertation describes the overall design and implementation of the three subsystems in detail, including data collection and functional structure. We then test the results of automatic compilation and of automatic version comparison and analysis separately.

The results of automatic compilation were scored manually: good results account for 72.2 percent of the total, which we consider satisfactory. The results of automatic version comparison and analysis were tested quantitatively, giving a recall of 92.3% and a precision of 95.2%, which indicates that the "window way matching" algorithm is feasible. The system still has shortcomings: the degree of automation needs to be improved, the auxiliary tools need to be strengthened, and the system's functions need to be developed further.

The dissertation applies Chinese information processing technology to the digitization of agricultural ancient books. Its innovations are as follows:

(1) The dissertation studies and implements the automatic compilation of agricultural ancient books using automatic word segmentation, section and chapter partitioning, and passage retrieval. This is substantial progress in the automatic and intelligent processing of digitized agricultural ancient books, and of ancient books in general.

(2) The dissertation designs a model of automatic compilation and analyzes its key techniques in depth, such as topic sentence excerpting and automatic clustering of results.

(3) The automatic version comparison and analysis, designed and implemented with reference to automatic proofreading of Chinese text and pattern matching, is a major technical breakthrough in the textual criticism and collation of ancient books. The dissertation gives its principle, designs the "window way matching" algorithm, and analyzes how the auxiliary tools are constructed.

Besides automatic compilation and automatic version comparison and analysis, a satisfactory expert system for ancient book collation should also offer functions such as automatic error detection, automatic sentence segmentation and punctuation, automatic annotation, and automatic translation. Because of constraints on research time and conditions, we studied only the automatic compilation and the automatic version comparison and analysis of agricultural ancient books, with a preliminary study of automatic annotation. A gap therefore remains between our experimental system and a full expert system for ancient book collation. Refining and improving the experimental system is the direction of our future work.
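
The text-partition step of the automatic compilation procedure can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes the clauses have already been segmented into keyword lists, uses cosine similarity as the cohesion measure, and skips gaps whose depth is zero; the function names, the `block` parameter, and the sample data are all hypothetical. Only the cutoff μ − c·σ is taken from the abstract.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two keyword counters (assumed cohesion measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_scores(clauses, block=2):
    """Score every gap between clauses by comparing the blocks on either side."""
    scores = []
    for gap in range(1, len(clauses)):
        left = Counter(t for c in clauses[max(0, gap - block):gap] for t in c)
        right = Counter(t for c in clauses[gap:gap + block] for t in c)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of a gap: how far its cohesion lies below the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        left = s
        for j in range(i - 1, -1, -1):      # climb leftwards while scores do not fall
            if scores[j] < left:
                break
            left = scores[j]
        right = s
        for j in range(i + 1, len(scores)):  # climb rightwards while scores do not fall
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - s) + (right - s))
    return depths


def boundaries(clauses, c=0.5, block=2):
    """Indices of clauses that start a new segment (depth above mean - c * stdev)."""
    depths = depth_scores(cohesion_scores(clauses, block))
    if not depths:
        return []
    cutoff = mean(depths) - c * pstdev(depths)
    # Assumption: a gap with zero depth is never treated as a boundary.
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]


# Hypothetical keyword lists: three clauses on rice, then three on silkworms.
demo = [
    ["rice", "seed", "soak"],
    ["rice", "seed", "sow"],
    ["rice", "seed", "water"],
    ["silkworm", "mulberry", "leaf"],
    ["silkworm", "leaf", "feed"],
    ["silkworm", "leaf", "cocoon"],
]
print(boundaries(demo, c=0.5, block=1))  # [3]
```

In this sketch a larger `c` lowers the cutoff, so more gaps are accepted as boundaries and the text is split into shorter passages; a smaller `c` keeps only the deepest valleys in cohesion.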
Keywords/Search Tags: agriculture ancient books, digitization of ancient books, automatic compilation, automatic version comparison and analysis, expert system of ancient book arrangement