
Research And Implementation Of Data Mining Algorithms Based On SSAS

Posted on: 2009-08-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Guo
GTID: 2178360242480335 | Subject: Software engineering
Abstract/Summary:
With the continuous development of database technology and the widespread use of database management systems, the amount of data stored in databases is growing rapidly. These large volumes of data contain much important information that can support sound decision-making. Current database systems, however, can only store and retrieve data: what users obtain is only part of the data itself, whereas the more important information lies in the characteristics of the data and in descriptions and forecasts of its trends, and such information has great reference value for decision-making. The demands on data-processing technology are therefore rising: deeper processing is needed to obtain the overall features of the data and forecasts of its trends. Data mining is the discovery of interesting knowledge from large data sets (which may be incomplete, noisy, uncertain, and stored in various forms); this knowledge is implicit, previously unknown, and potentially valuable for decision-making. The extracted knowledge can be expressed as concepts, rules, regularities, and patterns. Data mining is thus a new interdisciplinary field that draws on machine learning, pattern recognition, statistics, databases, artificial intelligence, mathematics, and visualization, and it is an emerging research area with broad applications. This thesis discusses the basic process and the main tasks of data mining and studies the entire data mining process: data integration, data cleaning, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. It examines association rule mining and decision tree construction in depth. Chapter II then studies algorithms for discovering maximal frequent itemsets.
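The notion of a maximal frequent itemset can be illustrated with a short Python sketch, assuming transactions are given as sets of item names and the minimum support is an absolute count. It walks a set-enumeration tree depth first (each node extends its itemset only with items that follow it in a fixed order, so every itemset is visited once), counts support by scanning the transactions, and keeps the frequent itemsets that have no frequent proper superset. It is a naive baseline for illustration only: it is not FMCUBE or Max-Miner, which avoid most of this enumeration, and it does not read supports from an SSAS data cube. The function names are hypothetical.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_count):
    """Return frequent itemsets (as frozensets) that have no frequent proper superset."""
    transactions = [frozenset(t) for t in transactions]
    # Anti-monotonicity: an infrequent item cannot occur in any frequent itemset.
    items = sorted({i for t in transactions for i in t})
    items = [i for i in items if support(frozenset([i]), transactions) >= min_count]
    frequent = []

    def expand(head, tail):
        # `head` is the itemset at this tree node; `tail` lists the items that may extend it.
        for idx, item in enumerate(tail):
            candidate = head | {item}
            if support(candidate, transactions) >= min_count:
                frequent.append(candidate)
                # Children only use items after `item`, so each itemset is generated once;
                # infrequent candidates are not expanded, since their supersets are infrequent.
                expand(candidate, tail[idx + 1:])

    expand(frozenset(), items)
    # A frequent itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    for itemset in maximal_frequent_itemsets(baskets, min_count=3):
        print(sorted(itemset))
```

On this toy data with a minimum count of 3, the frequent itemsets are {a}, {b}, {c}, {a, b}, {a, c}, and {b, c}; only the three pairs are maximal, and every frequent itemset is a subset of one of them.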
Using the Microsoft SQL Server Analysis Services (SSAS) environment, we improve and implement FMCUBE, a data-cube-based algorithm that describes the itemset space with a set-enumeration tree. FMCUBE significantly improves the efficiency of discovery. Identifying frequent itemsets is the key, and the most computationally intensive, step in association rule mining; since every frequent itemset is a subset of some maximal frequent itemset, the maximal frequent itemsets determine the complete set of frequent itemsets. FMCUBE therefore provides an effective and fast method for finding maximal frequent itemsets in data mining applications. Chapter II first introduces association rules and explains the classical Apriori algorithm, and then presents the FMCUBE algorithm for maximal frequent itemsets. Unlike the entity-relationship model of relational databases, a data warehouse uses a multidimensional data model that organizes data as data cubes, and a multidimensional data cube stores statistical aggregates. On a data cube, an itemset is a combination of members from different dimensions, and its support is the corresponding measure value, so algorithms that discover frequent itemsets on a data cube generally obtain support counts from the cube. Other researchers have proposed algorithms that combine the data cube with Apriori for frequent itemset mining. We implement the Max-Miner and FMCUBE algorithms in C#, use SQL Server 2005 Analysis Services to generate the data cube, and access the cube through ADOMD.NET and MDX.

Chapter III designs a decision tree algorithm based on minimum error pruning; the experimental results show that it runs faster than the Microsoft algorithm. The chapter first analyzes the classic decision tree algorithms ID3 and C4.5. The training set consists of records that all have the same structure, each made up of a number of attribute-value pairs, one of which denotes the record's class. The problem is to construct a decision tree that correctly predicts the value of the class attribute from the values of the non-class attributes. The main advantages and disadvantages of the two algorithms are then analyzed, followed by a detailed description of the whole tree-generation process. Construction has two phases. First, the tree is grown: initially all data are at the root node, and the data are then partitioned recursively. Second, the tree is pruned, which can remove branches caused by noise or outliers. A node is not split further when all of its data belong to the same class, or when no attribute remains that can be used for splitting. The main pruning methods are then studied in detail: pessimistic error pruning (PEP), minimum error pruning (MEP), cost-complexity pruning (CCP), and error-based pruning (EBP), together with the strengths and weaknesses of each. Finally, the Microsoft Decision Trees algorithm is described, and decision trees are built through SSAS on the CollegePlans and MovieClick example databases.

In the fourth part, building on this study of pruning and following the principle of minimum error pruning, an optimized ID3 algorithm is proposed. Its greatest advantage is that it chooses the node-generation scheme with the lowest error rate according to the characteristics and attributes of the data, which both improves the system's efficiency and reduces the error rate. Based on these pruning methods, we implement the optimized decision tree algorithm in C#, using the TreeView control of Microsoft Visual Studio 2005 to build and display the tree.
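The two construction phases and the MEP criterion can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's optimized ID3 and not the Microsoft Decision Trees algorithm: records are assumed to be dicts of categorical attribute values, the tree is grown with ID3 information gain, growth stops when a node is pure or no attributes remain, and the tree is then pruned bottom-up with minimum error pruning in the Niblett-Bratko form. There, a leaf holding n records, n_c of them in its majority class, with k classes in the whole problem, has expected error (n - n_c + k - 1) / (n + k), and a subtree is collapsed to a leaf when that static error is no larger than the record-weighted error of its children. The `Node` class and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log2
from typing import Optional


@dataclass
class Node:
    counts: Counter                  # class distribution of the training records at this node
    attribute: Optional[str] = None  # splitting attribute; None means the node is a leaf
    children: dict = field(default_factory=dict)  # attribute value -> child Node

    @property
    def label(self):
        return self.counts.most_common(1)[0][0]  # majority class


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def information_gain(records, attribute, target):
    base = entropy(Counter(r[target] for r in records))
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(Counter(r[target] for r in subset))
    return base - remainder


def build_id3(records, attributes, target):
    node = Node(Counter(r[target] for r in records))
    # Stop when all records share one class or no attribute is left to split on.
    if len(node.counts) == 1 or not attributes:
        return node
    best = max(attributes, key=lambda a: information_gain(records, a, target))
    node.attribute = best
    rest = [a for a in attributes if a != best]
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        node.children[value] = build_id3(subset, rest, target)
    return node


def static_error(node, k):
    """Laplace-based expected error of replacing `node` by a leaf (Niblett-Bratko MEP)."""
    n = sum(node.counts.values())
    n_c = node.counts.most_common(1)[0][1]
    return (n - n_c + k - 1) / (n + k)


def prune_mep(node, k):
    """Prune bottom-up in place; return the node's expected error after pruning."""
    e_static = static_error(node, k)
    if not node.children:
        return e_static
    n = sum(node.counts.values())
    e_backed = sum(sum(child.counts.values()) / n * prune_mep(child, k)
                   for child in node.children.values())
    if e_static <= e_backed:
        node.attribute, node.children = None, {}  # collapse the subtree into a leaf
        return e_static
    return e_backed


def classify(node, record):
    while node.children:
        child = node.children.get(record[node.attribute])
        if child is None:  # unseen value: fall back to this node's majority class
            break
        node = child
    return node.label


if __name__ == "__main__":
    data = [
        {"outlook": "sunny", "windy": "no", "play": "no"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "rain", "windy": "no", "play": "yes"},
        {"outlook": "rain", "windy": "yes", "play": "no"},
        {"outlook": "overcast", "windy": "no", "play": "yes"},
        {"outlook": "overcast", "windy": "yes", "play": "yes"},
    ]
    tree = build_id3(data, ["outlook", "windy"], "play")
    prune_mep(tree, k=2)
    print(tree.attribute, classify(tree, {"outlook": "rain", "windy": "yes"}))
```

The comparison `e_static <= e_backed` prefers the smaller tree when the two estimates tie. Because k enters the estimate, how aggressively this sketch prunes depends on the number of classes.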
Finally, the resulting decision tree algorithm is more efficient than Microsoft Decision Trees. The thesis closes with a summary and a discussion of future directions for data mining. Its results, especially those on maximal frequent itemsets and decision trees, have both theoretical and practical value for further research.
Keywords/Search Tags: Data Mining, Association Rule Mining, Data Cube, Decision Tree