Research On Deception Detection For Chinese Text

Posted on: 2015-12-23; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H Zhang; Full Text: PDF
GTID: 1108330461985142; Subject: Computer application technology
Abstract/Summary:
With the continuous development of network and computer technology, new means of communication are being adopted by more and more users. From traditional e-mail and instant messaging to the newly popular microblog and WeChat, these media give users convenient ways to transmit information and public platforms for sharing and interacting with large amounts of information. Network-based communication not only changes people's daily lives but also brings new venues and forms of deception.

Existing research shows that one third of interpersonal communication involves potential deception, and that the growing volume of web information contains a large number of deceptive messages. Where deception endangers daily life, organizational processes, or even national security, neglecting it may lead to incalculable loss. At present, people remain unsuccessful and inefficient at detecting deceptive messages, so an automated method that can flag possibly deceptive messages is desirable.

Deception has been studied widely in many fields of social science, where it is defined as the active transmission of messages and information to create a false conclusion. In recent years, researchers in the natural sciences have applied statistical, machine learning, and natural language processing methods to deception detection, which has become an important research topic in information security.
At present, research on deception detection abroad focuses mainly on four areas: deception detection theory, models, experiments, and data sets. On the whole this work is still at an early stage, and research on deception detection for Chinese text has only just begun, both at home and abroad.

The main goal of this thesis is to explore deception detection methods for Chinese text. We introduce the background and current status of deception detection, build a deception detection data set, mine deceptive linguistic cues and text features, and propose three deception detection models (a classification model, a multi-granularity cognition model, and an ensemble learning model), conducting experiments for each model separately.

The main research contents of this thesis are as follows:

(1) Construction of a Chinese text deception detection corpus. We put forward construction norms and principles for a deception detection corpus, and describe the source, content, scale, and preprocessing of the corpus for Chinese text.
Following these norms and principles, we constructed a deception detection corpus comprising 1493 deceptive texts and 10191 non-deceptive texts.

(2) Study of feature extraction methods for deception. We propose a method for extracting linguistic cues based on hypothesis testing: it first hypothesizes a set of linguistic cues that differentiate deceptive from non-deceptive texts, then validates each hypothesized cue against statistical experimental data, yielding useful linguistic cues for deception detection in Chinese text. We also propose two word-feature extraction methods: ① using mutual information and CHI (chi-square) statistics to select feature words with larger feature values; ② extracting the core words of texts based on dependency parsing.

(3) A deception detection model based on classification. We transform deception detection into a binary classification problem and propose a classification-based deception detection model, in which a Bayesian classifier, a maximum entropy classifier, and a support vector machine classifier are each used to perform deception detection.

(4) Deception detection models based on multi-granularity cognition. From the perspective of multi-granularity cognition, we propose two deception detection models: a multi-feature-based model and a multi-level-based model.
The multi-feature-based model verifies the significance of different features in deception detection; the multi-level-based model presents a hierarchical model of deception detection from the perspective of human cognition, supported by related theoretical research on multi-granularity cognition.

(5) A deception detection model based on ensemble learning. We propose an ensemble learning model that combines sample cutting with the integration of individual classifiers. For sample cutting we propose a novel bisecting K-means method, and for integrating the individual classifiers we propose a novel Min-Max modular method.
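
The abstract does not describe how the thesis's novel Min-Max modular method works, so the following is only a minimal sketch of the standard min-max modular combination rule that item (5) appears to build on, not the author's method. It assumes the deceptive and non-deceptive training texts have each been cut into subsets, and that one individual classifier has been trained for every (deceptive subset, non-deceptive subset) pair. The function names, the 0.5 threshold, and the example scores are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def min_max_combine(scores: Sequence[Sequence[float]]) -> float:
    """Combine individual classifier scores with the min-max modular rule.

    Assumption: scores[i][j] is the "deceptive" score produced by the
    classifier trained on deceptive subset i versus non-deceptive subset j.
    """
    if not scores or any(len(row) == 0 for row in scores):
        raise ValueError("scores must be a non-empty matrix")
    # MIN step: classifiers sharing the same deceptive subset must all agree,
    # so each row is reduced to its smallest score.
    min_per_deceptive_subset = [min(row) for row in scores]
    # MAX step: it is enough for one deceptive subset to claim the text,
    # so the row minima are reduced to their largest value.
    return max(min_per_deceptive_subset)


def is_deceptive(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> bool:
    """Flag a text as deceptive when the combined score reaches the threshold."""
    return min_max_combine(scores) >= threshold


# Hypothetical scores for one text: 2 deceptive subsets x 3 non-deceptive subsets.
example_scores = [
    [0.9, 0.4, 0.8],
    [0.7, 0.6, 0.9],
]
combined = min_max_combine(example_scores)
```

With a corpus as imbalanced as the one described in item (1), a scheme of this kind lets each individual classifier train on a roughly balanced pair of subsets; how the thesis actually cuts the samples and integrates the classifiers is not stated in the abstract.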
Keywords/Search Tags: Deception Detection, Deception Detection Corpora, Deception Linguistic Cues, Deception Features, Deception Detection Model, NLP