
Support Vector Data Description And Its Application Research In Detection Of Fraudulent Financial Statement

Posted on: 2011-11-12 | Degree: Master | Type: Thesis
Country: China | Candidate: S Liu
GTID: 2178360302993711 | Subject: Computer application technology
Abstract/Summary:
Support Vector Data Description (SVDD) is a one-class classification method based on statistical learning theory. It has particular advantages for pattern recognition problems involving small samples, nonlinearity, and high-dimensional data, and it has become another research focus in machine learning. Because SVDD needs samples from only one class to build a model, it copes well with the fact that fraud data are hard to obtain. Applied to the identification of fraudulent financial statements, it can help reduce investment risk, increase the transparency of accounting information, and promote the healthy development of the market. Further research on SVDD therefore has both high academic value and considerable practical significance. This paper summarized the state of SVDD research in China and abroad and analyzed the strengths and weaknesses of existing methods. To address the difficulty of computing sample membership, a method for computing sample membership in kernel space was proposed, and a hierarchical fuzzy SVDD algorithm was then implemented on top of it. Because the decision rule that SVDD-based multi-class classification algorithms apply in regions where hyperspheres overlap is inadequate, an improved SVDD multi-class classification algorithm based on relative density in kernel space was proposed. Because existing boundary optimization algorithms do not make full use of the distribution of samples in kernel space, a new boundary optimization algorithm was proposed. To address the shortcomings of existing incremental SVDD algorithms, an improved incremental algorithm was proposed.
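To make the one-class idea concrete, the following is a minimal sketch of standard SVDD with a Gaussian (RBF) kernel. It solves the usual dual (maximize sum_i a_i k(x_i, x_i) - sum_ij a_i a_j k(x_i, x_j) subject to sum_i a_i = 1 and 0 <= a_i <= C) with a simple SMO-style pairwise update, then accepts a sample when its kernel-space distance to the center is at most R^2. The abstract does not say which kernel or solver the thesis used, so the class name `SVDD`, the function `rbf_kernel`, the parameters `C` and `sigma`, and the solver are all illustrative assumptions, and the toy points are not financial data.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


class SVDD:
    """Hypothetical minimal SVDD sketch, not the thesis's implementation.

    Dual: maximize sum_i a_i k(x_i, x_i) - sum_ij a_i a_j k(x_i, x_j)
    subject to sum_i a_i = 1 and 0 <= a_i <= C.
    """

    def __init__(self, C=1.0, sigma=1.0, tol=1e-8, max_iter=10000):
        self.C, self.sigma, self.tol, self.max_iter = C, sigma, tol, max_iter

    def _k(self, x, y):
        return rbf_kernel(x, y, self.sigma)

    def fit(self, X):
        n = len(X)
        if self.C * n < 1.0:
            raise ValueError("infeasible: C * n must be >= 1")
        K = [[self._k(a, b) for b in X] for a in X]
        alpha = [1.0 / n] * n  # feasible start: weights sum to 1
        # g = gradient of the equivalent minimization f(a) = a^T K a - sum_i a_i K_ii
        g = [2.0 * sum(K[p][q] * alpha[q] for q in range(n)) - K[p][p] for p in range(n)]
        for _ in range(self.max_iter):
            up = [p for p in range(n) if alpha[p] < self.C]  # weights that can grow
            down = [p for p in range(n) if alpha[p] > 0.0]   # weights that can shrink
            if not up or not down:
                break
            i = min(up, key=lambda p: g[p])
            j = max(down, key=lambda p: g[p])
            if g[j] - g[i] < self.tol:  # KKT conditions hold up to tol
                break
            # Move delta of weight from alpha_j to alpha_i using exact line search.
            curv = K[i][i] + K[j][j] - 2.0 * K[i][j]
            step_max = min(self.C - alpha[i], alpha[j])
            delta = step_max if curv <= 0.0 else min(step_max, (g[j] - g[i]) / (2.0 * curv))
            alpha[i] = min(self.C, alpha[i] + delta)
            alpha[j] = max(0.0, alpha[j] - delta)
            for p in range(n):
                g[p] += 2.0 * delta * (K[p][i] - K[p][j])
        self.X, self.alpha = list(X), alpha
        self.aKa = sum(alpha[p] * alpha[q] * K[p][q] for p in range(n) for q in range(n))
        # Unbounded support vectors (0 < alpha < C) lie on the sphere and fix R^2.
        eps = 1e-9
        sv = [p for p in range(n) if eps < alpha[p] < self.C - eps]
        if not sv:  # crude fallback when every support vector sits at the bound C
            sv = [p for p in range(n) if alpha[p] > eps]
        self.R2 = sum(self.dist2(X[p]) for p in sv) / len(sv)
        return self

    def dist2(self, z):
        """||phi(z) - a||^2 computed with the kernel trick."""
        cross = sum(a * self._k(x, z) for a, x in zip(self.alpha, self.X) if a > 0.0)
        return self._k(z, z) - 2.0 * cross + self.aKa

    def predict(self, z):
        """+1 if z falls inside or on the sphere (target class), -1 otherwise."""
        return 1 if self.dist2(z) <= self.R2 else -1


if __name__ == "__main__":
    normal = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]  # toy one-class training set
    model = SVDD(C=1.0, sigma=1.0).fit(normal)
    print(model.predict((0.2, 0.1)), model.predict((3, 3)))  # prints: 1 -1
```

In a fraud-detection reading of this sketch, `fit` would see only samples of the class that is readily available, and `predict` returning -1 would flag a statement as atypical. Lowering `C` lets more training points fall outside the sphere, which trades a tighter description for tolerance of outliers.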
Based on this research on SVDD, a model for recognizing fraudulent financial statements was designed and implemented. The main contributions of this work are as follows:
1. This paper summarized the state of SVDD research, introduced the basic problems of machine learning and statistical learning theory, and discussed SVDD in detail.
2. A hierarchical fuzzy SVDD algorithm, KHFSVDD, was proposed. First, the original problem is divided into K sub-problems by kernel K-Means. Next, a local description of each sub-problem is generated by fuzzy SVDD. Finally, the global description of the original problem is built by combining the solutions of the sub-problems.
3. The concept of relative density in kernel space was introduced. In SVDD-based multi-class classification, the relative density in kernel space serves as the decision basis for samples that lie in regions where hyperspheres overlap.
4. A boundary optimization algorithm was proposed. The algorithm determines the class of samples near a hypersphere boundary from the mean density of samples around that boundary and the distance from the test sample to the hypersphere center.
5. An improved incremental SVDD algorithm was proposed. Based on an analysis of how the support vector set is composed, the algorithm dynamically selects which data from the sample set to train on, so it both reduces the number of training samples and retains more information about the data distribution.
6. An SVDD-based model for identifying fraudulent financial statements was designed. The model consists of three modules: initial description, incremental description, and statement detection.
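The first step of KHFSVDD partitions the training data with kernel K-Means. The abstract does not give the variant, so the following is a minimal sketch of plain kernel K-Means with an RBF kernel, assuming a deterministic farthest-point seeding; the function names `kernel_kmeans` and `rbf_kernel` and the parameters `k` and `sigma` are hypothetical. Distances to cluster means are evaluated purely through kernel values, so the feature map is never formed. Fitting a fuzzy SVDD to each resulting cluster, and combining those local descriptions, is not shown.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_kmeans(X, k, sigma=1.0, max_iter=100):
    """Split X into k clusters by k-means in the RBF feature space.

    Squared distance from phi(x_i) to the mean of cluster c (kernel trick):
    K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl
    """
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(X)")
    K = [[rbf_kernel(a, b, sigma) for b in X] for a in X]

    def feat_dist2(i, j):  # ||phi(x_i) - phi(x_j)||^2
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Farthest-point seeding in feature space (an assumed, deterministic choice).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(feat_dist2(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: feat_dist2(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Within-cluster term (1/|c|^2) sum K_jl, computed once per pass.
        inner = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                 for m in members]

        def dist2(i, c):
            m = members[c]
            if not m:  # an empty cluster attracts nothing
                return math.inf
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + inner[c]

        new_labels = [min(range(k), key=lambda c: dist2(i, c)) for i in range(n)]
        if new_labels == labels:  # converged: no sample changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    X = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
    print(kernel_kmeans(X, k=2))  # prints: [0, 0, 0, 1, 1, 1]
```

Clustering in the same kernel space that SVDD later uses keeps each sub-problem compact in that space, which is one plausible motivation for choosing kernel K-Means over ordinary K-Means here.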
Keywords/Search Tags: support vector data description, multi-class classification, boundary optimization, incremental learning, identification of fraudulent financial statement