Research And Implementation Of Big Data Mining Service Process Engine Based On JBPM

Posted on: 2018-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Gao
GTID: 2428330542490117
Subject: Computer technology
Abstract/Summary:
In the new century, with the rapid development of the Internet, information technology, and high-performance computing, the operational systems used across industries have grown in both computing power and scale, producing explosive, exponential growth in data. This growth brings problems of its own. With ever more data sources and data volumes measured in petabytes, it is increasingly hard to extract valuable information and knowledge, which in turn makes it harder to improve work efficiency, make business decisions, and carry out scientific analysis.

Traditional applications and information systems show obvious deficiencies in handling today's massive data and in mining useful information from it. On the one hand, traditional software systems do not account for the changing, process-oriented data mining requirements that arise in different functional modules and at different stages of work, which greatly increases the cost of development and debugging. On the other hand, traditional single-machine computing systems can no longer cope with such large amounts of data. A distributed data mining engine that is easy to configure and reuse has therefore become an urgent practical need.

The emergence of cloud computing offers a solution to the problems that arise in the data mining process, including mass storage and efficient data mining algorithms. Hadoop is an open-source cloud computing platform from Apache that provides efficient, reliable storage and computation and is well suited to processing massive data. JBPM is a lightweight workflow management system based on J2EE. Building on JBPM and Hadoop, this thesis studies and aims to implement a distributed data mining process engine for handling massive data. Through a visual process design interface, the engine lets users efficiently design data analysis and data mining processes and import them into JBPM. Combined with the algorithm libraries of Mahout on Hadoop, the engine makes distributed data mining quick and easy to carry out. The whole process is abstract, visual, and automatically executed, and so provides a high-performance, cost-effective data mining service that enables users unfamiliar with the principles of distributed systems and data mining to obtain knowledge and value from massive data.
Keywords/Search Tags: data mining, workflow, process engine, JBPM, Hadoop