The continuing development of ideology and culture has produced a large body of literature, which poses new challenges for the application of natural language processing technology. We collect a large number of English novels and construct a corpus of literary text. A neural network is used to represent fictional characters, and the learned character embeddings are used for character clustering and classification. This thesis applies natural language processing technology to the literary field, which will promote research in both natural language processing and literature. The specific research contents are as follows:

(1) Construction of the literary corpus. More than 20,000 English novels are collected from Project Gutenberg and preprocessed with tokenization, part-of-speech tagging, named entity recognition, dependency parsing, name clustering, and coreference resolution. In total, we extract over 400,000 characters, together with all feature words that stand in certain dependency relations to those characters, and carry out a statistical analysis of these feature words.

(2) Dependency-feature-based distributed character representation and its analysis. Analogous to word embedding training, we use the skip-gram model to train character embeddings and dependency word embeddings simultaneously. Character similarity is then computed from the learned character embeddings. Experimental results show that our method outperforms topic models on three of the four hypothesis classes in the test set. In addition, nearest-neighbor analysis shows that characters from different novels by the same author tend to be similar to each other.

(3) Applications of the character embeddings. The learned character embeddings are applied to character clustering and classification. Under the assumption that characters in one cluster belong to the same author, the k-means clustering model achieves a purity of 0.724. We build a dataset for personality and gender classification and propose a multilayer perceptron model on top of the learned character embeddings; it achieves better performance on both tasks than a model based on average pooling of word embeddings.
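To make the preprocessing in (1) concrete, the sketch below runs tokenization, part-of-speech tagging, named entity recognition, and dependency parsing on one sentence. spaCy is a stand-in, since the abstract does not name the tools actually used, and the name clustering and coreference resolution steps are omitted for brevity.

```python
# A minimal sketch of the preprocessing pipeline, with spaCy as an assumed
# stand-in for the unnamed tools; name clustering and coreference resolution
# are not shown.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elizabeth walked to Pemberley and admired Mr. Darcy's estate.")

for tok in doc:
    # token, POS tag, dependency relation, and syntactic head
    print(tok.text, tok.pos_, tok.dep_, tok.head.text)

for ent in doc.ents:
    # named entities, from which character mentions could be drawn
    print(ent.text, ent.label_)
```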
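For the joint training in (2), the following is a minimal skip-gram sketch with negative sampling in PyTorch, where each training pair couples a character with one of its dependency feature words. The pairing scheme, vocabulary sizes, and hyperparameters are assumptions for illustration, not the thesis configuration; once trained, character similarity can be read off as cosine similarity between rows of the character embedding table.

```python
# A minimal skip-gram-with-negative-sampling sketch, assuming (character,
# dependency feature word) training pairs; sizes and hyperparameters are
# placeholders, not the thesis settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharacterSkipGram(nn.Module):
    def __init__(self, n_characters, n_features, dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_characters, dim)  # character vectors
        self.feat_emb = nn.Embedding(n_features, dim)    # dependency word vectors

    def forward(self, chars, feats, neg_feats):
        c = self.char_emb(chars)                         # (batch, dim)
        f = self.feat_emb(feats)                         # (batch, dim)
        n = self.feat_emb(neg_feats)                     # (batch, k, dim)
        pos = F.logsigmoid((c * f).sum(-1))              # pull true pairs together
        neg = F.logsigmoid(-torch.bmm(n, c.unsqueeze(2)).squeeze(2)).sum(-1)
        return -(pos + neg).mean()                       # SGNS loss

# Toy batch; the real character vocabulary would cover 400,000+ entries.
model = CharacterSkipGram(n_characters=1000, n_features=5000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
chars = torch.randint(0, 1000, (64,))
feats = torch.randint(0, 5000, (64,))
negs = torch.randint(0, 5000, (64, 5))                   # 5 negatives per pair
loss = model(chars, feats, negs)
loss.backward()
opt.step()
```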
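The clustering evaluation in (3) can be sketched as follows, assuming a matrix of character embeddings and one author label per character. Purity is the fraction of characters that fall in the majority-author class of their cluster; the cluster count and synthetic data below are placeholders.

```python
# A minimal sketch of k-means clustering with purity evaluation, on synthetic
# stand-ins for the learned character embeddings and author labels.
import numpy as np
from sklearn.cluster import KMeans

def purity(cluster_ids, author_ids):
    total = 0
    for c in np.unique(cluster_ids):
        labels = author_ids[cluster_ids == c]
        total += np.bincount(labels).max()   # size of the majority author
    return total / len(author_ids)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))              # stand-in character embeddings
authors = rng.integers(0, 20, size=500)      # stand-in author labels
clusters = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X)
print("purity:", purity(clusters, authors))
```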
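Finally, a minimal sketch of the classification setup in (3), using scikit-learn's MLPClassifier on top of character embeddings with a binary gender label. The hidden-layer size and the synthetic data are illustrative assumptions, and the personality label set is not specified in the abstract.

```python
# A minimal multilayer perceptron sketch for gender classification from
# character embeddings; data and architecture are placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100))             # stand-in character embeddings
y = rng.integers(0, 2, size=2000)            # stand-in binary gender labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```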