Design And Realization Of An Online Data Mining System Based On Hadoop

Posted on: 2017-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yuan
GTID: 2308330485984522
Subject: Computer software and theory
Abstract/Summary:
The continuous development of distributed data storage and processing techniques makes it possible to extract value from massive data sets at low cost. Data mining techniques have matured to the point where they meet the needs of a wide range of application scenarios. However, poorly integrated systems and a high barrier to entry make them difficult to use. In this thesis, we designed an online data mining system in which users construct and validate data mining tasks by dragging visual components. The goal is to lower the barrier to using data mining techniques and to make setting up data mining tasks more efficient.

We studied the general data mining process and its technology stack in a big data environment. On the basis of Hadoop and its related service components, the logical modules of the data mining process, including data transformation, data modeling, and data evaluation, are encapsulated as operators. A data mining process is then built and evaluated by connecting operators in series to form a data mining workflow. The main contributions and innovations of our system are as follows:

1) Integration of operators
Operators should be implemented on a unified abstraction that is scalable and configurable, so that they can be combined flexibly to meet the requirements of different data mining tasks. We use the Hive table as the data model of an operator and implement operators for data input and output, data transformation, data modeling, and performance evaluation. We also provide a well-designed inheritance hierarchy for extending the system with further operators.

2) Implementation of workflow
The workflow is the description of a data mining process, and the system has to handle the description, execution, and control of workflows. We designed a set of workflow services consisting of a workflow decomposition service, a data management service, and an operator execution service, making it possible to build a data mining process without any coding.

We have completed testing of the system. The results show that the online data mining system meets its expected requirements and reduces the time needed to build and verify a data mining process.
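
The operator and workflow ideas described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class names (`Operator`, `InputOperator`, `TransformOperator`, `Workflow`) and the table name `raw_events` are hypothetical, and table names are passed as plain strings as a stand-in for Hive tables, so nothing here talks to Hadoop or Hive. The sketch only shows a shared operator abstraction, an inheritance hierarchy that can be extended, and series-connected execution.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Hypothetical base class: one step of a data mining workflow."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params  # per-operator configuration

    @abstractmethod
    def run(self, input_table):
        """Consume one table name and return the name of the table produced."""


class InputOperator(Operator):
    def run(self, input_table):
        # Entry point: ignores upstream input and exposes the configured source table.
        return self.params["table"]


class TransformOperator(Operator):
    def run(self, input_table):
        # Stand-in for a step that would read one table and write a new one.
        return f"{input_table}__{self.name}"


class Workflow:
    """Series-connected operators, executed in order."""

    def __init__(self, operators):
        self.operators = list(operators)

    def execute(self):
        table = None
        trace = []
        for op in self.operators:
            # The output table of each operator is the input of the next one.
            table = op.run(table)
            trace.append((op.name, table))
        return trace


flow = Workflow([
    InputOperator("input", table="raw_events"),
    TransformOperator("normalize"),
    TransformOperator("model"),
])
trace = flow.execute()
print(trace)
```

Using a table name as the only value passed between operators is what allows any operator to follow any other in this sketch; a new operator type only needs to subclass `Operator` and implement `run`.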
Keywords/Search Tags: Data Mining, Hadoop, Machine Learning, Distributed Computing