Data Modeling Analysis Based On Decision Tree Induction And Its Applications In Railway Transportation

Posted on: 2008-07-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Lv
GTID: 1228330467967491
Subject: Transportation planning and management
Abstract/Summary:
With the reform and development of China's railway information systems, the railway department has built a series of information systems, including the Ticketing and Reservation System (TRS), Transport Management Information System (TMIS), Dispatching Management Information System (DMIS), Automatic Train Identification System (ATIS), Train Operation Safety Safeguard (TOSS), and Finance and Accounting Management System (FAMES). These systems have accumulated large volumes of rich data, and extracting valuable decision information from them has become an urgent need for railway decision-making departments and a focus of railway information reform. Rapidly developing data mining technologies provide effective tools for this analysis. Taking the demands of railway passenger transport decision support as the study background, this paper analyzes the characteristics of train ticket data, adopts decision tree induction and time-series analysis as the basic analysis techniques, and takes into account the limitations of current data mining techniques on imbalanced datasets. It presents in-depth research and extensive application experiments on building efficient data analysis models for the ticket dataset in TRS.
First, considering how ticket data are collected by the live transaction systems, this paper focuses data preprocessing on concept hierarchies, data reduction, data standardization, data discretization, attribute construction, and dimension reduction, in order to reduce the noise that raw ticket data introduce into the analysis.
Building on a detailed study of decision tree induction, this paper focuses on decision tree algorithms such as ID3, SPRINT, and SLIQ.
Applying decision tree induction to ticket data reveals a limitation in handling imbalanced main classes. This paper proposes the Key Degree (KD) measure to improve how decision tree leaf nodes are labeled, with the aim of removing the unfair competition between majority and minority classes for the leaf node label. The resulting algorithm is suited to processing very large imbalanced ticket datasets and extracts instructive rules that combine the advantages of prediction and statistics, so it can support decision-makers' multi-level requirements for predictive analysis.
When time-series analysis is applied to ticket data, the resulting data analysis models usually take the form of function expressions. Such models often lack explainability and do not express the related factors clearly, so this paper presents a new method that applies the improved decision tree algorithm to constructed datasets. The attributes of the constructed datasets are extracted from the data sequences using classical time-series analysis methods, which capture their evolutionary characteristics. This method is good at identifying the key factors that influence the time series, and the resulting data analysis model is highly usable. Its validity and reasonableness are demonstrated by application analyses of railway passenger transport.
Extracting static and evolutionary data characteristics with decision trees and time-series analysis is a useful approach to ticket data analysis, but fully solving the problem of making railway passenger transport intelligent requires building a data modeling and analysis system.
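
The abstract does not give the definition of Key Degree, so the following Python sketch only illustrates the general idea behind it: a leaf is labeled by how over-represented a class is inside the leaf relative to its share of the whole training set, rather than by raw majority. The function names, the ratio used, and the toy data are all assumptions made for illustration and are not the dissertation's formula.

```python
from collections import Counter

def majority_label(leaf_labels):
    """Conventional rule: the most frequent class in the leaf wins."""
    return Counter(leaf_labels).most_common(1)[0][0]

def prior_adjusted_label(leaf_labels, all_labels):
    """Hypothetical alternative (NOT the dissertation's Key Degree):
    pick the class whose share inside the leaf is largest relative
    to its share in the full training set."""
    leaf_counts = Counter(leaf_labels)
    total_counts = Counter(all_labels)
    n_leaf, n_total = len(leaf_labels), len(all_labels)

    def ratio(cls):
        leaf_share = leaf_counts[cls] / n_leaf
        prior_share = total_counts[cls] / n_total
        return leaf_share / prior_share

    return max(leaf_counts, key=ratio)

# Toy data: 'common' makes up 90% of the training set, 'rare' 10%.
all_labels = ["common"] * 90 + ["rare"] * 10
# A leaf holding 6 'common' and 4 'rare' records.
leaf = ["common"] * 6 + ["rare"] * 4

print(majority_label(leaf))
print(prior_adjusted_label(leaf, all_labels))
```

Under raw majority voting the minority class can rarely win a leaf, even where it is strongly concentrated; dividing by the class prior is one simple way to give every class a comparable chance.
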
This paper makes an initial study of the framework of a data modeling and analysis system, presents its service, logical, and physical frameworks, and argues that building a useful data modeling and analysis system is the key to making China's railway information system intelligent. The system provides guidelines and services for ticket data analysis.
The main contents of this paper are as follows:
1. This paper presents the Key Degree measure to improve decision tree leaf node labeling, giving every main class an equal right to compete for the leaf node label and addressing the analysis of imbalanced datasets in practical applications.
2. Exploiting the strength of time-series analysis in extracting the evolutionary features of data sequences, this paper uses time-series analysis as a method of attribute construction, addressing the poor explainability and weak related-factor analysis of current time-series methods.
3. This study effectively explores the application of both decision tree and time-series analysis techniques and lays favorable groundwork for further research on data analysis in TRS. The improved methods can build efficient data analysis models that help decision-makers understand the railway transportation situation and obtain multi-aspect, multi-level analyses of train ticket data.
4. This paper presents the system framework of a railway data modeling and analysis system, which aims to provide an open and effective path to making the current railway passenger transport information system intelligent.
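
The attribute-construction idea in item 2 can be pictured with a minimal Python sketch: each data sequence is summarized into a few named attributes, and the resulting rows could then be given to a decision tree learner. The attribute names, the tolerance value, and the ticket counts below are hypothetical choices for illustration; the abstract does not say which time-series features the dissertation actually extracts.

```python
from statistics import mean

def trend_slope(series):
    """Least-squares slope of the values against the time index 0..n-1.
    Requires at least two observations."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def build_attributes(series, flat_tolerance=0.5):
    """Turn one sequence into a row of attributes (hypothetical schema)."""
    slope = trend_slope(series)
    if slope > flat_tolerance:
        trend = "rising"
    elif slope < -flat_tolerance:
        trend = "falling"
    else:
        trend = "flat"
    return {
        "mean_level": mean(series),
        "slope": slope,
        "trend": trend,
        "peak_position": series.index(max(series)),
    }

# Made-up daily ticket counts for two imaginary trains.
sequences = {
    "train_A": [100, 110, 120, 130, 140],
    "train_B": [200, 195, 205, 195, 200],
}
rows = {name: build_attributes(values) for name, values in sequences.items()}

for name, row in rows.items():
    print(name, row)
```

Because each row carries readable attributes such as a trend label, rules learned from them name the influencing factors directly, which a fitted function expression does not.
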
Keywords/Search Tags: Data Mining, Classification, Decision Tree, Time-series, Railway Passenger Transport, China Ticketing and Reservation System (TRS), Decision Support