
Classical Chinese Poetry Generation Research Based On Neural Networks

Posted on: 2022-04-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Xuan
GTID: 1525306737490284
Subject: Computer Science and Technology
Abstract/Summary:
Classical Chinese poetry is one of China's greatest cultural heritages, with a history of more than 3,000 years; its appeal lies in its lasting aesthetic value and its contribution to literature. Poetry generation relies on natural language processing: machines learn the rules of a language from real texts and then produce readable output. Rule-based natural language processing methods have solid foundations in linguistic theory and readily yield fluent, readable text, yet they pay too little attention to semantics. Context-related text was then generated with Hidden Markov Models. This improved model can calculate word occurrence probabilities within texts and learn collocation patterns through training, but it remains ineffective at finding hidden connotations between words. In recent years, language generation tasks have favored deep learning techniques with end-to-end models. The encoding and decoding process of Recurrent Neural Networks (RNNs) ensures a sequential input-output order and a better comprehension of the given language data.

This thesis reviews the development of neural network poetry generation and implements a recurrent Variational Autoencoder (VAE) to create machine poems. Furthermore, the author proposes a poetry generation model based on the self-attention mechanism: the Delicate Writer (DW) poem maker. The aim is to pass human poem-writing skills on to computers by following the four steps of composition: opening, developing, changing and concluding. We hope that DW can equip machine models with some of the techniques human poets use, such as imagery, metaphor and keyword expansion.

This research builds a large-scale Classical Chinese poetry corpus, introduces a new Convolutional Neural Network (CNN) method for poetry style recognition, generates poems with a self-attention recurrent VAE model, and assesses machine poems through both human judgement and automatic scoring with a Generative Adversarial Network (GAN). As a follow-up study to Computational Arts Creation by the National Natural Science Foundation of China, the author, who has a background as an English major, carries out this interdisciplinary research from the perspective of linguistics. The major contributions of the thesis are as follows:

(1) Corpus Construction of Classical Chinese Poetry
As a unique cultural heritage, Classical Chinese poetry is the pearl of Chinese literature. In our research, we trace all available records in ancient books, websites and existing corpora, and proofread each line as the pieces are collected into our corpus. From the Shi Jing to contemporary Classical Chinese poems written by modern poets, we have collected 296,770 poems with a total of 19,260,616 Chinese characters. Furthermore, based on expert publications and website recommendations, our style corpus includes 2,509 Haofang (bold and unconstrained) poems and 3,091 Wanyue (graceful and restrained) poems, for a total of 5,600 poems used for classification feature learning and the style recognition task. This research proposes three feature datasets: the Haofang Wanyue Dataset, the Keyword Expansion Relational Dataset, and the Chinese Phonetic Alphabet Tone and Rhyme Dataset. These three datasets, created in this thesis, describe Classical poems in terms of triplets and relational knowledge structures. The full Classical Chinese Poetry Corpus collected in this research aims to be a complete collection covering all periods and a vast number of traceable poets, drawn from ancient books, existing databases, website listings and all accessible publications; it is also intended to serve as big data for studies of poetic literature.

(2) A Novel CNN Classification Method for Poetry Style Recognition
The CNN is a classical network that is highly efficient in image processing and classification tasks. This research breaks with the long-held view that CNNs might be incapable of processing natural language. We propose a method to convert poetry texts into image pixels so that our CNN architecture can handle poem input. This Text to Image (T2I) approach achieves better classification accuracy in Haofang and Wanyue poetry style recognition tests while retaining a low computational time cost.

(3) Poetry Generation Model on Self-Attention Mechanism based Recurrent Variational Autoencoder
Natural language processing usually applies the Seq2Seq structure in Recurrent Neural Networks, and the attention mechanism greatly improves generation fluency, yet it suffers from incoherence. This thesis extracts poem features with a self-attention based mechanism. Variational Autoencoders can be applied to generate conditional images, and Gumbel-Max sampling makes the generation of discrete data possible. Considering the multiplicity and complexity of poetic texts, this thesis constructs a Recurrent Variational Autoencoder based on the self-attention mechanism, and further proposes a distinctive practice for matching level and oblique tones, along with rhyming techniques, from the perspective of Chinese alphabetic representations. This research trains word embeddings with Word2vec, ELMo and BERT, and compares poetry generation results with LSTM, GPT, Tsinghua JIUGE and a Generative Adversarial Network. Under carefully designed human and machine assessment, the DW poem maker achieves better results, showing that the proposed model is effective for poem generation.

(4) Poetry Machine Assessment from the Discriminator in a Conditional Generative Adversarial Network
The readability of machine poems can be validated through assessment. Experts and members of the general public read poetry outputs and judge whether the author is a machine or a human poet. Once a poem maker's output confuses human judges, so that their accuracy in identifying authorship is around 50%, its machine poems are accepted. As for the quality of machine poems, this thesis invited literature professors to examine them thoroughly by scoring machine poems in different control groups. In addition to human assessment, this research proposes a novel GAN machine assessment approach to compare the outputs of different poem makers. A GAN consists of a generator and a discriminator. We improve the discriminator to score poem quality and further add a poetry style (Haofang or Wanyue) scoring index. Experimental results show that our DW model is better in writing fluency and emotional expression. The discriminator-based machine assessment method can improve DW's output for better readability and aesthetic value. Most importantly, it is a beneficial complement to human assessment.

In conclusion, the major contributions and achievements are: (1) the construction of a corpus that is as complete a collection of Classical Chinese poems as possible; (2) qualitative and quantitative poetry style classification with the T2I artificial neural network model; (3) a language engineering practice that generates machine poems with an improved attention mechanism based neural network VAE; and, most importantly, (4) a breakthrough in machine poem assessment using the GAN discriminator. The DW poem maker research is a cross-disciplinary project, which demonstrates the significance, dedication and outcome of the author's work as a linguistics major in cross-disciplinary pursuit over the past and coming years.
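Contribution (3) names Gumbel-Max sampling as the route to discrete data generation but gives no detail. The sketch below illustrates the general trick only and is not the author's implementation: adding independent Gumbel(0, 1) noise to each logit and taking the argmax draws an index with the same probabilities as softmax over those logits. The function names, the three-character vocabulary and the logit values are hypothetical, chosen purely for illustration.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(logits, rng=random):
    """Draw one index from softmax(logits) using the Gumbel-Max trick."""
    best_index, best_score = 0, -math.inf
    for index, logit in enumerate(logits):
        # Clamp away from 0 so both logarithms below are defined.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        score = logit + gumbel_noise
        if score > best_score:
            best_index, best_score = index, score
    return best_index


def softmax(logits):
    """Reference probabilities that the sampler should reproduce."""
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical vocabulary and scores, for illustration only.
    vocabulary = ["月", "山", "風"]
    logits = [2.0, 1.0, 0.0]
    rng = random.Random(0)
    draws = 20000
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(draws))
    # Print each character with its softmax probability and sampled frequency.
    for index, probability in enumerate(softmax(logits)):
        print(vocabulary[index], round(probability, 3), counts[index] / draws)
```

The argmax step is not differentiable, so training setups commonly replace it with a temperature-controlled softmax (the Gumbel-Softmax relaxation). The abstract does not say whether the thesis does this.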
Keywords/Search Tags:Classical Chinese Poetry Generation, Neural Networks, Self-Attention Mechanism, Generative Adversarial Network, Word Embedding