How to use computers to analyze, synthesize, and translate natural-language material is an important question in both theory and practice, and it has become more important in the Internet era. In NLP (Natural Language Processing), statistical models built on large amounts of real corpus data have evolved rapidly and perform well. Consequently, corpus construction has become foundational work for NLP. In this thesis, we discuss the general process of constructing a corpus of archaic Chinese historical texts: choosing the source material, selecting a character encoding, cleaning at the character level, and cleaning at the sentence-partitioning level, among other steps. We present general algorithms for the steps that lead from a web document to a clean, sentence-partitioned primary corpus. In addition, we discuss the design of the corpus query function and present the design and implementation of several key algorithms and data structures. On the basis of this work, we developed a set of applications for corpus construction and used them to build the Comprehensive Mirror (《资治通鉴》) corpus. ...
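As a rough illustration of the path from a web document to a clean, sentence-partitioned corpus, the following Python sketch strips markup, removes whitespace, and splits on sentence-final punctuation. It is not the thesis's algorithm: the function names `clean_characters` and `split_sentences`, the punctuation sets, and the sample string are all assumptions made for this example, and it presumes a modern punctuated edition of the text.

```python
import html
import re

# Assumed sentence-final marks in a punctuated edition (hypothetical choice).
SENTENCE_END = "。！？"
# Assumed closing quotes/brackets that stay attached to the sentence they close.
CLOSERS = "」』”’）"

TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher, enough for a sketch
SPACE_RE = re.compile(r"\s+")     # also matches U+00A0 and U+3000 in str patterns
SENT_RE = re.compile(
    rf"[^{SENTENCE_END}]+[{SENTENCE_END}]+[{CLOSERS}]*"  # text + end mark(s) + closers
    rf"|[^{SENTENCE_END}]+$"                              # trailing unterminated text
)


def clean_characters(raw: str) -> str:
    """Character-level cleaning: drop tags, decode entities, remove whitespace."""
    text = TAG_RE.sub("", raw)
    text = html.unescape(text)
    return SPACE_RE.sub("", text)


def split_sentences(text: str) -> list[str]:
    """Sentence-level partitioning on the assumed end-of-sentence marks."""
    return SENT_RE.findall(text)


if __name__ == "__main__":
    # Sample markup chosen only for illustration.
    raw = "<p>初，智宣子将以瑶为后。</p>\n<p>智果曰：「不如宵也。」&nbsp;弗听。</p>"
    for sentence in split_sentences(clean_characters(raw)):
        print(sentence)
```

Removing all whitespace is a design choice that suits continuous Chinese text but would damage mixed-language material. A real pipeline would also need to handle encoding detection and variant characters, which this sketch leaves out.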