
Research On Data Mining Methods Based On Rough Set And Swarm Intelligence

Posted on: 2008-05-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Y Pan
Full Text: PDF
GTID: 1118360242460147
Subject: Computer application technology
Abstract/Summary:
Over the last two decades, remarkable advances in science and technology have changed the way we view the world and do business, and our economy and society have benefited greatly from these breakthroughs. Improvements in technology have made it possible to capture and analyze enormous amounts of data, and researchers keep exploring effective ways to extract valuable information from these massive and complex data sets. Data mining is an analytical process developed to explore large amounts of data by combining knowledge from statistics, databases, machine learning, and other fields, and to summarize the data into useful information.

Among the various data mining methods, rough set theory is a relatively new but powerful intelligent technique for analyzing incomplete data sets drawn from large databases. It has extensive applications in areas including science, business and marketing. One of the primary methodologies of rough set theory is attribute reduction, the process of discovering reduced sets of attributes. Rough set attribute reduction and classification have been studied in a variety of data mining applications over the last several years. Removing redundant attributes greatly improves the clarity of the potential knowledge in the system, reduces the time complexity of rule discovery, and improves discovery efficiency.

Swarm intelligence is a new field that has attracted the attention of researchers from various disciplines. It has become a frontier of artificial intelligence and of interdisciplinary subjects such as economics, sociology and biology. Particle swarm optimization (PSO) is a recently developed population-based stochastic optimization technique that has been applied successfully in many research and application areas. It has attracted researchers all over the world and become a hot research topic in evolutionary computation.

Time series data analysis has been widely used in various industries.
In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. Analyzing time series to discover patterns in the data and to forecast the future is important to industry and business.

To address the above issues, and supported by the National Natural Science Foundation projects "Statistical Relations Study Research" (60573073) and "The Non-standard Knowledge Processing Elementary Theory and the Core Engineering Research: Non-standard Knowledge Mathematics Basic" (6049632), this paper investigated various data mining methods based on rough sets and swarm intelligence. The major contributions are:

1. Gave an overview of data mining and rough set theory: the origin, methods and steps of data mining, and the origin, advantages and current research progress of rough set theory.
2. Provided a detailed description of attribute reduction based on rough sets, and compared and analyzed several algorithms used in rough set attribute reduction. We proposed an improved method based on axiomatic evolution, which improves on existing attribute reduction based on the genetic algorithm. Because the characteristic length is not determined in advance, we used an approach other than the crossover and mutation of the genetic algorithm. This method effectively speeds up convergence. Our experiments show that the algorithm works well on most of the test data sets.
3. Introduced several swarm intelligence methods, including ant colony optimization, ant clustering and particle swarm optimization. The research focused on particle swarm optimization and proposed two improvement strategies. Experiments showed the adaptability of traditional particle swarm optimization as well as of the two newly proposed methods.
We pointed out the importance of parameter control in particle swarm optimization and provided a method for selecting appropriate parameters based on differential evolution; experiments validated that it gives satisfactory results.
4. Analyzed the properties of the binary PSO algorithm and identified the problems that arise when it is applied to attribute reduction. To some extent the problems are caused by the algorithm itself: it uses velocity to control the probability that a particle mutates, and this probability is somewhat correlated with the initial velocity. There is also no selection control in the method: the updated particle replaces the current particle regardless of whether its evaluation-function value is better or worse. To fix these problems, this paper proposed a binary particle swarm optimization method based on simulated annealing and weak population mutation. We applied this method to casing damage forecasting in an oil field. The database contains 62 related attributes, and using all of them for forecasting is impractical in terms of both computation time and the prevention work that follows a forecast. With our new algorithm the attributes are reduced from the original 62 to 12, which makes forecasting with these 12 attributes as inputs feasible.
5. Discussed time series data mining and the major methods of time series forecasting. Among the time series methods, we focused on artificial neural network models, especially the BP neural network model. This paper proposed a neural particle swarm algorithm that treats a neural network as one of the particles in the swarm. The optimized particle produced by the algorithm was used as the forecasting model for casing damage and was back-tested against the trend in the 1980s and 1990s. The accuracy rate is 84.7%, significantly better than the 25% rate obtained with other methods. Saving a single oil well can save hundreds of thousands of dollars.
The method therefore has great business potential.

Overall, this paper studied data mining methods based on rough set theory and particle swarm optimization, and proposed an attribute reduction algorithm based on power set evolution, a binary particle swarm optimization algorithm based on annealing selection and weak population mutation, a parameter control strategy based on the differential evolution algorithm, and a time series forecasting method based on neural particle swarm. We also showed their potential applications through experiments. In the future, we plan to further improve these methods, to design a complete and highly efficient time series data mining system, and to apply it to real applications to demonstrate its business value.
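
To make the combination of rough set attribute reduction and binary PSO more concrete, the following is a minimal sketch in Python. It is not the dissertation's algorithm: it uses the standard binary PSO update (a sigmoid of the velocity gives the probability that a bit is set to 1) and omits the simulated annealing and weak population mutation steps. The decision table `ROWS`, the function names, the weighting `alpha` in the fitness, and all parameter values are assumptions chosen for illustration. The fitness rewards a high rough set dependency degree (the fraction of objects whose equivalence class under the selected attributes has a single decision value) and a small attribute subset.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy decision table: (condition attribute values, decision).
ROWS = [
    ((0, 0, 1, 0), 0),
    ((0, 1, 1, 0), 0),
    ((1, 0, 0, 1), 1),
    ((1, 1, 0, 1), 1),
    ((0, 0, 0, 1), 1),
    ((1, 1, 1, 0), 0),
]

def dependency(rows, mask):
    """Rough set dependency degree of the decision on the attributes where mask is 1."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for cond, dec in rows:
        # Objects with equal values on the selected attributes are indiscernible.
        key = tuple(v for v, keep in zip(cond, mask) if keep)
        decisions[key].add(dec)
        sizes[key] += 1
    # Positive region: classes whose members all share one decision value.
    positive = sum(n for key, n in sizes.items() if len(decisions[key]) == 1)
    return positive / len(rows)

def fitness(rows, mask, alpha=0.9):
    """Assumed fitness: weighted sum of dependency degree and subset smallness."""
    n = len(mask)
    return alpha * dependency(rows, mask) + (1 - alpha) * (n - sum(mask)) / n

def binary_pso(rows, n_attrs, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO over attribute masks; returns (best mask, best fitness)."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(n_particles)]
    vs = [[0.0] * n_attrs for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(rows, x) for x in xs]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_attrs):
                v = (w * vs[i][j]
                     + c1 * rng.random() * (pbest[i][j] - xs[i][j])
                     + c2 * rng.random() * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-v_max, min(v_max, v))
                # The velocity only sets the probability that the bit becomes 1.
                prob = 1.0 / (1.0 + math.exp(-vs[i][j]))
                # The new position always replaces the old one (no selection step).
                xs[i][j] = 1 if rng.random() < prob else 0
            f = fitness(rows, xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit

if __name__ == "__main__":
    best, best_fit = binary_pso(ROWS, n_attrs=4)
    print("selected attribute mask:", best, "fitness:", best_fit)
```

The unconditional position update in the inner loop is the behavior the abstract criticizes. A selection step, such as the annealing-based acceptance the paper proposes, would go at that point, but its details are not given here and are not reproduced.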
Keywords/Search Tags: data mining, rough set, swarm intelligence, particle swarm optimization, time series