
Primary Research On Archaic Chinese History Corpus Construction

Posted on: 2012-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: W R Song
Full Text: PDF
GTID: 2268330425997275
Subject: Computer software and theory
Abstract/Summary:
How to use computers to analyze, synthesize, and translate natural-language material is an important question both in theory and in practice, and it has become more important in the Internet era. In NLP (Natural Language Processing), statistical models based on large amounts of real corpus data have evolved rapidly and perform well. Consequently, corpus construction has become foundational work for NLP. In this thesis, we discuss the general process of constructing an archaic Chinese history corpus: choosing the source texts, selecting the character encoding, cleaning at the character level, cleaning at the sentence-segmentation level, etc. We present the general algorithms for the steps that take a web document to a clean, sentence-segmented primary corpus. In addition, we discuss the design of the corpus query function, and present the design and implementation of some key algorithms and data structures. On the basis of this work, we developed a set of applications for corpus construction and constructed the Comprehensive Mirror (《资治通鉴》) corpus. ...
Keywords/Search Tags:Corpus, Corpus construction, Archaic Chinese, History, Query
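
The abstract names the stages (character-level cleaning, sentence segmentation, corpus query) but does not give the algorithms themselves. The following is only a minimal Python sketch of what such a pipeline might look like, not the thesis's implementation. The function names, the set of sentence-ending punctuation marks, the character-level inverted index, and the sample input are all assumptions made for illustration.

```python
import html
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set

# Assumption: sentences end at one of these marks, optionally followed
# by closing quotation marks. The thesis may use different rules.
SENTENCE_RE = re.compile(r"[^。！？；]+[。！？；][」』”’]*|[^。！？；]+$")
TAG_RE = re.compile(r"<[^>]+>")


def clean_characters(raw: str) -> str:
    """Character-level cleaning: drop markup, entities, and whitespace."""
    text = TAG_RE.sub("", raw)                 # strip HTML tags
    text = html.unescape(text)                 # decode entities such as &nbsp;
    text = unicodedata.normalize("NFC", text)  # one canonical form per character
    return re.sub(r"\s+", "", text)            # remove all Unicode whitespace


def split_sentences(text: str) -> List[str]:
    """Sentence-level segmentation on terminal punctuation."""
    return SENTENCE_RE.findall(text)


def build_index(sentences: List[str]) -> Dict[str, Set[int]]:
    """Map each character to the ids of the sentences that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for ch in sentence:
            index[ch].add(sid)
    return index


def search(index: Dict[str, Set[int]], sentences: List[str], query: str) -> List[int]:
    """Return ids of sentences containing `query` as a substring."""
    if not query:
        return []
    # Intersect postings to get candidates, then confirm the exact substring.
    candidates = set.intersection(*(index.get(ch, set()) for ch in query))
    return sorted(sid for sid in candidates if query in sentences[sid])


# Hypothetical sample input, not taken from the thesis corpus.
sample = "<p>王曰：「善。」&nbsp;遂行。</p>"
sentences = split_sentences(clean_characters(sample))
index = build_index(sentences)
```

With the sample above, `search(index, sentences, "遂行")` looks up the postings for 遂 and 行, intersects them, and confirms the substring in each candidate sentence. A character-level index is used here because archaic Chinese text has no word delimiters. A production corpus would likely need a more compact structure.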