A Study On Style Identification And Chinese Couplet Responses Oriented Computer Aided Poetry Composing

Posted on: 2006-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yi
Full Text: PDF
GTID: 1118360155972585
Subject: Computer application technology
Abstract/Summary:
Chinese poetry and Chinese couplets are among the treasures of China's traditional cultural heritage. In the information age, these works need to be reorganized and analyzed through data mining, and their composition simulated intelligently with information-processing tools. Advances in database and machine learning technology have made structured storage and machine-learning processing of terabit-scale data practical, so the preliminary conditions for treating problems in traditional Chinese poetry within a machine-learning framework are now in place. At the same time, information technology has given traditional literary analysis a new angle and a higher level of processing. With machine assistance, the complicated analysis of classical Chinese poetry becomes easier to follow and more automated.
This dissertation is an application-driven study sponsored by the National Natural Science Foundation of China project "Computer Aided Literary Arts Composing: classical poetry, Ci, Qu songs and couplets" (Grant No. 60173060). Its topic is "Machine evaluation of poetry styles and formation of couplet sentences". Poems are represented with a vector space model. Using machine-learning methods such as Naive Bayes, the dissertation presents, for the first time, a computational model that distinguishes the bold and unconstrained style from the graceful and restrained style of poetry. The model is then improved with a genetic algorithm, which gives good results on poetry style evaluation.
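As a rough illustration of the idea above (a character-level vector space representation classified with Naive Bayes), the following minimal sketch trains a multinomial Naive Bayes model with add-one smoothing. It is not the dissertation's model: the four sample lines are well-known verses chosen only as toy data, and the names `train_nb`, `predict_nb`, `SAMPLES` and the labels `"bold"` and `"graceful"` are assumptions introduced here.

```python
import math
from collections import Counter

# Toy training data (an assumption for illustration, not the thesis corpus).
SAMPLES = [
    ("大江东去浪淘尽千古风流人物", "bold"),
    ("醉里挑灯看剑梦回吹角连营", "bold"),
    ("寻寻觅觅冷冷清清凄凄惨惨戚戚", "graceful"),
    ("杨柳岸晓风残月", "graceful"),
]


def train_nb(samples):
    """Count characters per label; each poem is a bag of characters."""
    char_counts = {}        # label -> Counter of characters
    doc_counts = Counter()  # label -> number of training poems
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        char_counts.setdefault(label, Counter()).update(text)
        vocab.update(text)
    return char_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest smoothed log posterior."""
    char_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = char_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        for ch in text:
            if ch in vocab:  # characters never seen in training are skipped
                score += math.log((counts[ch] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(SAMPLES)
print(predict_nb(model, "大江"), predict_nb(model, "残月"))
```

With a real corpus, the vocabulary would normally be restricted to a selected subset of characters before training; the sketch uses every character for brevity.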
Furthermore, this dissertation introduces, for the first time, a computational model for distinguishing the authors of classical poems. The model is built by machine learning over the vocabulary of classical poetry and gives good author evaluation results. Finally, the dissertation studies the generation of couplet responses by recasting it as a sequence learning problem: given an upper couplet as input, the machine generates the responding lower couplet. Many experiments were conducted on real poetry and couplet corpus data, and the results demonstrate the feasibility and effectiveness of these methods. The main research outputs are as follows:
1) The methods and experiments in this dissertation are all corpus-driven, as required by machine learning and by the digitization of classical poetry. The construction of the Tang poetry, Ci and couplet corpora is therefore described in Chapter 2.
2) To distinguish highly abstract artistic styles, the study addresses three issues. First, can the style of a poem be evaluated by machine? This is the issue of feasibility. Second, how can a machine evaluate style? This is the issue of method. Third, how can the effectiveness of machine evaluation be improved? This is the issue of refinement. As to the first issue, style evaluation can be treated as a pattern recognition problem. As to the second issue, the study explored the influence of rhythm and tone on style and found that style evaluation is difficult to solve with these features. Poems are therefore represented with a vector space model whose features are words, and style evaluation is carried out on that representation.
As to the third issue, the evaluation results are improved by selecting the Chinese characters used for style evaluation according to information gain. A genetic algorithm further improves effectiveness, reaching 88.5% evaluation accuracy with 55 Chinese characters. This work is described in Chapters 3 and 4.
In analyzing word usage across literary schools, exploratory analysis was performed on poems in the bold and unconstrained style and in the graceful and restrained style. Hierarchical Cluster Analysis (HCA) and Self-Organizing Map (SOM) methods were used to explore the associations among characters and the organization of the character data, based on character co-occurrence. Visual MDS, HCA and SOM tree charts were obtained, and the clustering results were analyzed. The study identified the characters that commonly co-occur in each of the two styles of Ci, together with the character usage characteristics of these two typical styles. Representative characters selected from the SOM clusters were used as features for SVM evaluation, and the experiment achieved 83% accuracy in distinguishing the two styles. This work is described in Chapter 5.
Author distinction used a method similar to the one used for style distinction. The method gave effective machine evaluation, and exploratory analysis of the author data was performed with hierarchical clustering. The study identified the co-occurring characters used separately and in common in the poems of Li Bai and Du Fu, as well as the character usage characteristics of these two poets.
This work is covered in Chapters 3, 4 and 5.
Based on an analysis of the characteristics of traditional Chinese couplets, the study abstracts couplet response generation as a supervised sequence learning problem, in which the upper and lower lines of a couplet are viewed as two sequences of language units of equal length. It introduces a computational model for couplet response generation that places no limit on the number of characters, and models the task with three approaches: N-gram statistical language models, hidden Markov models (HMM), and transformation-based error-driven sequence learning. A program was implemented by machine learning on the couplet corpus built for the study. Results were obtained even when the only language unit is the single character, yielding a corpus-based computer couplet response system that handles couplets of unrestricted length. In the best result, the upper couplet, written to celebrate the launch of the Shenzhou 5 manned spacecraft, was "九天揽月,华夏英豪驰宇宙;", and the lower couplet generated automatically by the experimental system was "四海迎春,神州崛起舞天下。". This work is described in Chapters 6 and 7. The dissertation ends with a summary, conclusions, and suggestions for further research.
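To make the sequence-learning view of couplet response concrete, here is a minimal sketch of a first-order HMM decoded with the Viterbi algorithm, in which the characters of the lower line are hidden states and the characters of the upper line are observations. It is an illustration under stated assumptions, not the system described in the abstract: the tiny training set uses traditional word pairings as toy data, add-one smoothing is an arbitrary choice, and `train_hmm`, `respond` and `PAIRS` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

# Toy aligned (upper, lower) pairs; an assumption for illustration only.
PAIRS = [("天", "地"), ("雨", "风"), ("大陆", "长空"),
         ("山花", "海树"), ("赤日", "苍穹")]


def train_hmm(pairs):
    """Count start, transition and emission events from aligned pairs."""
    start = Counter()
    trans = defaultdict(Counter)  # previous lower char -> next lower char
    emit = defaultdict(Counter)   # lower char -> upper char it answers
    for upper, lower in pairs:
        if len(upper) != len(lower):
            raise ValueError("upper and lower lines must have equal length")
        start[lower[0]] += 1
        for i, (u, l) in enumerate(zip(upper, lower)):
            emit[l][u] += 1
            if i > 0:
                trans[lower[i - 1]][l] += 1
    states = sorted(emit)
    observations = sorted({u for c in emit.values() for u in c})
    return start, trans, emit, states, observations


def _logp(count, total, size):
    """Add-one smoothed log probability."""
    return math.log((count + 1) / (total + size))


def respond(model, upper):
    """Viterbi decoding: most probable lower line for the given upper line."""
    if not upper:
        return ""
    start, trans, emit, states, observations = model
    n_s = len(states)
    n_o = len(observations) + 1  # one extra slot for unseen upper characters
    start_total = sum(start.values())
    # best[s] holds (log probability, path) of the best path ending in s
    best = {s: (_logp(start[s], start_total, n_s)
                + _logp(emit[s][upper[0]], sum(emit[s].values()), n_o), [s])
            for s in states}
    for ch in upper[1:]:
        nxt = {}
        for s in states:
            e = _logp(emit[s][ch], sum(emit[s].values()), n_o)
            score, path = max(
                (best[p][0] + _logp(trans[p][s], sum(trans[p].values()), n_s),
                 best[p][1])
                for p in states)
            nxt[s] = (score + e, path + [s])
        best = nxt
    return "".join(max(best.values())[1])


hmm = train_hmm(PAIRS)
print(respond(hmm, "山花"), respond(hmm, "天雨"))
```

Because every state is considered at every position, the response always has the same length as the input, which matches the equal-length constraint on couplets; a practical system would also need tonal and part-of-speech constraints that this sketch ignores.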
Keywords/Search Tags: NLP, Chinese couplet response, Style, Chinese poem, Machine Learning