The Application of Data Mining for Predicting Listed Companies' Prospects

Posted on: 2006-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: W B Yu
Full Text: PDF
GTID: 2189360185454953
Subject: Statistics
Abstract/Summary:
Data mining is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. It has been used successfully in many areas, such as database marketing, market segmentation, risk analysis, fraud detection, and customer relationship management. This dissertation applies data mining to predict listed companies' prospects, in particular their prospects for profit growth.

The dissertation begins by explaining the main concepts and technologies of data mining: data mining methodologies, data mining models, and model validity. Data mining can be viewed as a technical process, and that process is called a methodology. The most important step of the process is modeling, where many models can be used, such as logistic regression, decision trees, and artificial neural networks. Each model has its own characteristics and algorithms. After modeling, the validity of the models must be assessed, which means that overfitting has to be avoided.

The dissertation then turns to its main focus: predicting listed companies' prospects. Profit growth is a key concern of investors, equity investment funds, and securities companies, because only companies that can achieve profit growth in the future are worth investing in. For these parties, financial reports are the major source of information. Three questions therefore deserve answers. First, do the financial statements contain information about companies' profit growth? Second, can this information be used to build models that predict profit growth well? Third, are the models valid? This dissertation addresses these three questions.

There is already a considerable body of research on the prospects of listed companies, but most of it has the defects listed below.
1. Some studies use paired samples, which overstates the accuracy of the models.
2. Some studies use the same data set both to build and to validate the models, which leaves overfitting undetected.
3. Regression models are the most commonly used, and few articles apply modern data mining models such as decision trees and neural networks.

In this dissertation, I use population data instead of paired samples: three years of financial report data for all Chinese listed A-share companies. Data partition is used to avoid overfitting. SEMMA (Sample, Explore, Modify, Model, Assess) is the methodology of the data mining process, and both traditional statistical models, such as regression, and modern data mining models, such as decision trees and neural networks, are applied. Together, these methods are intended to make the predictions of listed companies' profit growth valid and accurate.
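
The role of data partition in detecting overfitting can be illustrated with a minimal sketch. It is not the dissertation's SEMMA workflow, models, or data: the synthetic data set, the single feature `x`, the 30% label noise, and the helper names (`partition`, `predict_1nn`, `accuracy`, `make_synthetic`) are all assumptions made for illustration. A one-nearest-neighbor classifier is used only because it memorizes its training data, which makes the gap between training accuracy and validation accuracy easy to see.

```python
import random


def partition(rows, train_frac=0.7, seed=0):
    """Shuffle the rows and split them into a training and a validation set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, x):
    """Predict the label of the training row whose feature is closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, rows):
    """Share of rows whose label is predicted correctly from the training set."""
    hits = sum(1 for x, y in rows if predict_1nn(train, x) == y)
    return hits / len(rows)


def make_synthetic(n=400, noise=0.3, seed=1):
    """Hypothetical data: label 1 ("profit grows") if x > 0, flipped with probability `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that a memorizing model will also memorize
        rows.append((x, y))
    return rows


rows = make_synthetic()
train, valid = partition(rows)

# Scoring on the data used to build the model hides overfitting.
train_acc = accuracy(train, train)
# Scoring on held-out data gives an honest estimate.
valid_acc = accuracy(train, valid)

print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

In this sketch the training accuracy is perfect because each training row is its own nearest neighbor, while the validation accuracy is noticeably lower. A study that reports only the first figure would overstate how well the model predicts.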
Keywords/Search Tags: data mining, population analysis, overfitting, data partition, profit growth