A Fingerprint Engine For Author Profiling

Posted on: 2010-03-24  Degree: Master  Type: Thesis
Country: China  Candidate: N P Dong  Full Text: PDF
GTID: 2178360278473991  Subject: Computer software and theory
Abstract/Summary:
With the growth of the Internet, digital texts are proliferating, and copyright protection has become increasingly important in recent years. One way to address the copyright problem is to profile an author's writing style. By comparing writing styles, we can tell whether a text was written by a certain author. Most current research in author profiling focuses on examining linguistic attributes or finding new ones. However, profiling an author appropriately is still a challenging task.

This paper aims to build a model that fingerprints an author: it takes texts by an author in a certain domain as input and produces a profile of the author as output. Using this fingerprint engine, we can tell with a certain probability whether an input text was written by one of a list of possible authors.

The fingerprint engine consists of two sub-engines, the training engine and the fingerprinting engine. The training engine takes training data from an author in a certain domain as input and produces the author's fingerprint. The fingerprinting engine takes any text as input and computes the probability that the text was written by a given author.

An author's fingerprint consists of three parts. The first part consists of the strong measurements, and the second part consists of the less strong measurements. The measurements in these two parts can each be represented easily as a number or a vector. The remaining measurements, which are too complicated to be represented as a number or a vector, form the third part of the fingerprint.

The fingerprint engine works as follows. First, it extracts the measurements from the input text. Then it compares the extracted results with each of the three parts of an author's fingerprint to obtain one probability per part. Finally, it gives each probability a weight and calculates the final probability as the result.

The fingerprint engine is implemented in two environments, VC++ and MATLAB, with 205 selected textual measurements. The measurement extraction algorithms are implemented in VC++, and the data analysis is implemented in MATLAB. A simple test was run with the selected measurements to show how the fingerprint engine works. The results show that the fingerprint engine can indicate the right author.

This paper focuses on author profiling of English texts. Writing styles are measured using linguistic attributes and linguistic measurements. Statistical methods, such as standard deviation analysis and principal components analysis, are used to evaluate how effective the linguistic measurements are.
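The extract, compare, and weight procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's VC++/MATLAB implementation. Everything specific in it is an assumption made for illustration: the three toy measurements (the thesis uses 205), the function names, the Gaussian-style similarity used as a stand-in for a per-part probability, and the weights. The third fingerprint part, which the abstract says cannot be reduced to a number or vector, is simply passed in as a precomputed score.

```python
import math
import re
from statistics import mean, stdev


def extract_measurements(text):
    """Compute three toy measurements (hypothetical stand-ins for the 205)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence")
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "avg_sentence_len": len(words) / len(sentences),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


def train_fingerprint(texts):
    """Training engine sketch: store (mean, std dev) per measurement."""
    if len(texts) < 2:
        raise ValueError("need at least two training texts for a std dev")
    rows = [extract_measurements(t) for t in texts]
    return {
        key: (mean(r[key] for r in rows), stdev(r[key] for r in rows))
        for key in rows[0]
    }


def part_score(measurements, fingerprint, keys):
    """Average similarity in (0, 1] over one part of the fingerprint.

    Assumption: similarity is exp(-z^2 / 2), where z is the distance from
    the author's mean in standard deviations. It is a score, not a
    calibrated probability.
    """
    scores = []
    for key in keys:
        mu, sigma = fingerprint[key]
        z = (measurements[key] - mu) / max(sigma, 1e-9)  # avoid divide by 0
        scores.append(math.exp(-0.5 * z * z))
    return sum(scores) / len(scores)


def combine(part_scores, weights):
    """Weighted average of the per-part scores."""
    return sum(w * s for w, s in zip(weights, part_scores)) / sum(weights)


def score_text(text, fingerprint, strong_keys, weak_keys,
               complex_score, weights=(0.5, 0.3, 0.2)):
    """Fingerprinting engine sketch: extract, compare per part, then weight.

    complex_score is a caller-supplied score for the third part.
    """
    m = extract_measurements(text)
    parts = [
        part_score(m, fingerprint, strong_keys),
        part_score(m, fingerprint, weak_keys),
        complex_score,
    ]
    return combine(parts, weights)
```

To rank several candidate authors under these assumptions, one would call `train_fingerprint` once per author and then `score_text` on the unknown text against each fingerprint, picking the author with the highest combined score. How measurements are assigned to the strong and less strong parts is left to the caller here; the abstract mentions standard deviation analysis and principal components analysis for evaluating measurements.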
Keywords/Search Tags:Author Profiling, Text Analysis, Natural Language Processing, Data Mining, Artificial Intelligence