
Study Of Syntactic Analysis Method For Chinese Text Processing

Posted on: 2007-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: G H Zhu
GTID: 2178360182460554 | Subject: Systems Engineering
Abstract/Summary:
With the spread of computers and the Internet, data processing and information processing, as well as the emerging field of knowledge processing, all depend on language processing methods, and the demands on the depth and breadth of that processing keep increasing. Chinese syntactic parsing is a key problem in Chinese information processing, and progress on it can also benefit related areas of linguistics. As a natural language, Chinese shares the essential properties of other natural languages, so Chinese parsing should draw fully on mature linguistic theory. At the same time, because Chinese has its own particular features, techniques developed for other languages cannot simply be reused; it is therefore necessary to develop Chinese parsing methods grounded in the characteristics of Chinese itself and guided by current linguistic theory.

The core work of this thesis covers three aspects:
(1) Existing syntactic parsing algorithms and models are compared and synthesized. Building on this prior research, Chinese syntactic parsing is improved by combining bottom-up and top-down analysis, which improves both the efficiency and the precision of parsing.
(2) A library of common Chinese syntactic patterns is designed as the basis of the research, and part-of-speech and phrase tag sets are defined according to practical applications. In addition, a context-free grammar rule library is designed from commonly used syntactic patterns.
(3) A prototype Chinese syntactic parsing system is analyzed, designed, and implemented. It consists of three modules.
The three modules are a word segmentation module based on a maximum matching algorithm with a suitably chosen maximum word length, a part-of-speech tagging module based on relative-frequency statistics estimated from training data, and a syntactic parsing module based on an improved chart parsing algorithm. The PFR annotated People's Daily corpus, produced by the Institute of Computational Linguistics (ICL) at Peking University, is used as the training data. Using the part-of-speech and phrase tag sets and the grammar rule library described above, the prototype parser is implemented in VC++ 6.0, and small-scale tests on Chinese text are carried out to validate the efficiency and feasibility of the improved algorithm.
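The segmentation and tagging modules described above can be illustrated with a small sketch. The thesis does not publish its code, so the following is only an illustration under stated assumptions: segmentation uses greedy forward maximum matching, with the maximum word length taken as the length of the longest lexicon word, and tagging picks for each word the tag with the highest relative frequency count(w, t) / count(w) in the training data. The thesis's tagger may also use context; this per-word version is the simplest relative-frequency baseline. The toy training pairs, the PKU-style tag names and the function names `train`, `forward_max_match` and `tag` are hypothetical and are not taken from PFR or the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data standing in for a segmented, POS-tagged
# corpus; tag names loosely follow the PKU convention (r, v, vn, n, nz).
TRAINING = [
    ("我们", "r"), ("研究", "v"), ("研究", "vn"), ("研究", "v"),
    ("中文", "nz"), ("句法", "n"), ("分析", "vn"), ("分析", "v"),
    ("分析", "vn"), ("方法", "n"),
]


def train(pairs):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for word, pos in pairs:
        counts[word][pos] += 1
    return counts


def forward_max_match(text, lexicon, max_len):
    """Greedy forward maximum matching: at each position take the longest
    lexicon word of at most max_len characters, else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def tag(words, counts):
    """Give each word its most frequent tag, with the relative-frequency
    estimate P(tag | word) = count(word, tag) / count(word).
    Unknown words get the placeholder tag 'x' and probability 0.0."""
    tagged = []
    for w in words:
        if w in counts:
            best, freq = counts[w].most_common(1)[0]
            tagged.append((w, best, freq / sum(counts[w].values())))
        else:
            tagged.append((w, "x", 0.0))
    return tagged


if __name__ == "__main__":
    lexicon = train(TRAINING)
    max_len = max(len(w) for w in lexicon)  # longest word in the lexicon
    words = forward_max_match("我们研究中文句法分析方法", lexicon, max_len)
    print(words)  # ['我们', '研究', '中文', '句法', '分析', '方法']
    print(tag(words, lexicon))
```

Forward matching is greedy, so it can mis-segment ambiguous character strings; backward maximum matching, or running both directions and comparing, are common alternatives.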
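For the parsing module, a chart parser that mixes top-down and bottom-up processing over a context-free rule library can be sketched with the standard Earley algorithm. The predictor expands rules top-down, the completer combines finished constituents bottom-up, and all partial analyses are stored as edges in a chart. This is not the thesis's improved chart algorithm, whose details are not given here. The toy grammar over PKU-style POS tags and the names `GRAMMAR`, `Edge` and `earley_recognize` are hypothetical. The sketch is a recognizer only (it reports whether a parse exists) and assumes the grammar has no empty rules.

```python
from typing import NamedTuple, Optional

# Hypothetical toy context-free grammar. Keys are nonterminals; any symbol
# that is not a key (r, v, n, vn, nz) is a terminal POS tag.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("r",), ("N",), ("N", "NP")],
    "N": [("n",), ("vn",), ("nz",)],
    "VP": [("v", "NP")],
}


class Edge(NamedTuple):
    lhs: str    # rule left-hand side
    rhs: tuple  # rule right-hand side
    dot: int    # how many rhs symbols have been recognized
    start: int  # chart position where the edge begins

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def earley_recognize(tags, grammar, start="S"):
    """Return True if the tag sequence can be derived from `start`.

    chart[k] holds edges ending at position k. Assumes no empty rules,
    so a completed edge always starts before the position it ends at.
    """
    n = len(tags)
    chart = [[] for _ in range(n + 1)]

    def add(k, edge):
        if edge not in chart[k]:
            chart[k].append(edge)

    for rhs in grammar[start]:
        add(0, Edge(start, rhs, 0, 0))

    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while we scan it
            edge = chart[k][i]
            sym = edge.next_symbol()
            if sym is None:
                # Completer (bottom-up): advance edges waiting for edge.lhs.
                for prev in chart[edge.start]:
                    if prev.next_symbol() == edge.lhs:
                        add(k, prev._replace(dot=prev.dot + 1))
            elif sym in grammar:
                # Predictor (top-down): expand the expected nonterminal.
                for rhs in grammar[sym]:
                    add(k, Edge(sym, rhs, 0, k))
            elif k < n and tags[k] == sym:
                # Scanner: the next input tag matches the expected terminal.
                add(k + 1, edge._replace(dot=edge.dot + 1))
            i += 1

    return any(e.lhs == start and e.start == 0 and e.next_symbol() is None
               for e in chart[n])


if __name__ == "__main__":
    # "我们 研究 中文 句法 分析 方法" tagged as r v nz n vn n
    print(earley_recognize(["r", "v", "nz", "n", "vn", "n"], GRAMMAR))  # True
    print(earley_recognize(["v", "r"], GRAMMAR))  # False
```

Storing back-pointers in each advanced edge would turn this recognizer into a parser that returns syntax trees.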
Keywords/Search Tags: Text processing, Chinese syntactic analysis, Chart algorithm, Syntactic parser, Syntax structure