
The Research On Authorship Identification Technologies Based On Writing Stylistics

Posted on: 2014-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Liu
GTID: 2268330395989042
Subject: Computer application technology
Abstract/Summary:
Authorship identification based on analysis of writing style has long been studied for English, with applications such as attributing disputed literary works and detecting plagiarized articles. Much less research has been done on Chinese; the best-known case is the disputed authorship of Dream of the Red Chamber.

Authorship identification is essentially a text classification task: automatically assigning a document to one of a set of given classes according to its content. After several years of research, many text classification algorithms based on statistics and machine learning have been proposed, such as KNN, Naive Bayes, and Support Vector Machine (SVM).

Writing stylistics analyzes an author's writing style using statistical methods. An author's writing style is the set of personal characteristics expressed in language activity, and a reflection of personality. It can be measured by quantitative features: word length and sentence length reflect how an author constructs sentences, and word and character frequencies also reflect personal style. Rhetorical and syntactic features can be used as well.

The Hidden Markov Model (HMM) is a statistical model for analyzing and learning sequences, and has also been used in text classification. The writing-style features in a document also form a sequence, and the SVM algorithm has been shown to perform very well in classification. We therefore propose two algorithms that combine writing stylistics with text classification for authorship identification: one combining HMM with writing stylistics, and one combining SVM with writing stylistics. Experiments show that both algorithms give good results. Finally, we introduce a model of the robustness of writing stylistics.
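
The quantitative style features named in the abstract (word length, sentence length, word frequency) can be illustrated with a minimal Python sketch. This is not the thesis's feature set or implementation: the function name `style_features` is hypothetical, and the tokenization assumes English-like text. Chinese text would first need a word segmentation step, which is not shown here.

```python
import re
from collections import Counter


def style_features(text):
    """Return a few simple stylometric features for one document.

    Illustrative assumption only: words are runs of ASCII letters and
    apostrophes, and sentences end at ., !, ? or their Chinese equivalents.
    """
    # Split into sentences and drop empty fragments left by the split.
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    total = len(words)
    word_counts = Counter(w.lower() for w in words)
    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / total if total else 0.0,
        # Mean number of words per sentence.
        "avg_sentence_length": total / len(sentences) if sentences else 0.0,
        # Relative frequency of each lower-cased word.
        "word_freq": {w: c / total for w, c in word_counts.items()},
    }


features = style_features("The cat sat. The dog ran fast.")
print(features["avg_sentence_length"])
```

A feature vector of this kind, computed per document, is the sort of input a classifier such as an SVM could be trained on, while the per-sentence values taken in order would give the kind of sequence an HMM models.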
Keywords/Search Tags:authorship identification, HMM, SVM, writing stylistics