Research On Distributed Data Mining Platform Based On RMI

Posted on: 2008-02-04 | Degree: Master | Type: Thesis
Country: China | Candidate: H Q Gu | Full Text: PDF
GTID: 2178360215491309 | Subject: Management Science and Engineering
Abstract/Summary:
Data mining, also called knowledge discovery in databases, is the process of extracting implicit information and knowledge from large volumes of data. Traditional data mining algorithms operate on data concentrated on a single machine, and all of the computation also runs on that machine. With the growth of networks, especially the Internet, the amount of available data is increasing steadily and is distributed across the nodes of the network. At the same time, as the target data of a mining algorithm grows larger, more computing units are needed to complete the mining process, and running the algorithm on a single machine is no longer feasible. Because traditional data mining cannot solve these two problems, distributed data mining techniques have emerged.

Distributed data mining (DDM) is knowledge discovery from distributed data sources using distributed computing units. It has two aspects: mining data sources that are distributed across the nodes of a network, and mining with distributed computing units. After comparing distributed technologies such as Agent, Grid Computing, RMI, and CORBA, we find that Agent and Grid Computing are still immature: although they have very good prospects, they are too difficult to use. RMI is a mature technology that is easy to use and runs across platforms, so we choose RMI to implement distributed data mining algorithms.

By implementing the FP-Tree and ID3 algorithms as distributed data mining algorithms, this thesis explores how to realize distributed data mining with RMI. The work covers the following. First, it studies RMI technology and exposes the data processing and task execution of traditional data mining algorithms as RMI services. Second, it studies traditional data mining algorithms, including Apriori, FP-Tree, ID3, and C4.5, and implements FP-Tree and ID3 in Java. Third, it proposes the DFP and DID3 algorithms, both of which distribute the data as well as the computation. Finally, it implements a prototype of a distributed data mining platform.

Experiments show that the DFP and DID3 algorithms perform better than their traditional counterparts. The thesis also presents a general approach to designing distributed data mining algorithms.
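The abstract does not spell out how DFP or DID3 work, so the following is only a minimal sketch of the general pattern it describes: data processing is exposed as an RMI service on each node, and a coordinator merges the partial results. It uses the first pass of frequent-itemset mining (counting item support) as the example task. All names here (`DistributedCountSketch`, `ItemCountService`, `LocalItemCounter`, `mergeCounts`, `frequentItems`) are hypothetical and are not taken from the thesis.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class DistributedCountSketch {

    /** Hypothetical remote service: a node counts item support in its own local partition. */
    public interface ItemCountService extends Remote {
        Map<String, Integer> countItems() throws RemoteException;
    }

    /** Node-side implementation. The transactions never leave the node; only counts do. */
    public static class LocalItemCounter implements ItemCountService {
        private final List<List<String>> transactions;

        public LocalItemCounter(List<List<String>> transactions) {
            this.transactions = transactions;
        }

        @Override
        public Map<String, Integer> countItems() {
            Map<String, Integer> counts = new HashMap<>();
            for (List<String> transaction : transactions) {
                // Support counts each item at most once per transaction.
                for (String item : new HashSet<>(transaction)) {
                    counts.merge(item, 1, Integer::sum);
                }
            }
            return counts;
        }
    }

    /** Sums the per-node counts and keeps items whose global support reaches minSupport. */
    public static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partials,
                                                   int minSupport) {
        Map<String, Integer> global = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        global.values().removeIf(count -> count < minSupport);
        return global;
    }

    /** Coordinator side: calls every node (local object or RMI stub) and merges the results. */
    public static Map<String, Integer> frequentItems(List<ItemCountService> nodes,
                                                     int minSupport) throws RemoteException {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (ItemCountService node : nodes) {
            partials.add(node.countItems());
        }
        return mergeCounts(partials, minSupport);
    }
}
```

In a real deployment, each node would export its `LocalItemCounter` with `UnicastRemoteObject.exportObject` and bind the resulting stub in an RMI registry, and the coordinator would look the stubs up and pass them to `frequentItems`. This sketch calls the nodes one after another; a version that distributes the computation as well would issue the calls concurrently.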
Keywords/Search Tags: Distributed Data Mining, Data Mining Platform, RMI Technology, DFP, DID3