
Design And Implementation Of Commercial Data Mining System DataView

Posted on: 2020-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: F G Huang
GTID: 2428330602452532
Subject: Computer Science and Technology
Abstract/Summary:
Computer science is developing rapidly. Not only have cutting-edge models and algorithms appeared, but their application has also been accelerating. With the help of existing data mining frameworks, data mining algorithms can be applied quickly to practical problems. In most situations, however, users cannot use these models effectively. According to the CRISP-DM process model, one reason is insufficient data understanding: the statistical characteristics of the dataset itself are ignored. Many off-the-shelf tools offer some data exploration capabilities for the data understanding phase, but they are not used effectively. First, most of these tools implement only data description or data exploration and do not integrate them into a coherent series of analyses, which makes them difficult to use. Second, without efficient representation methods it is impossible to obtain useful information intuitively from complicated analysis results, even when the user has enough background knowledge to use the tools.

This thesis aims to build a commercial data mining system, DataView, which gives users a friendly, easy-to-use graphical interface during the data understanding phase and thereby improves the efficiency of their data mining process. To extend the system's data exploration capability, the work focuses on the functional requirements of the data exploration subsystem and selects goodness-of-fit tests and time series analysis as its tools. Accordingly, the data exploration subsystem integrates distribution model recommendation and an ARIMA model order test function. The subsystem can recommend the most suitable distribution model for continuous data and provides time series analysis tools for data that may have a temporal property.

The thesis combines project practice, literature retrieval, and investigation to study the system requirements in depth: it surveys the functions provided by similar software tools, consults the data mining literature, analyzes existing theories and models, and proposes system design schemes. The problems solved in the course of this work are the design of the DataView system, parameter estimation for the distribution models, algorithmic optimization of the goodness-of-fit test module, and the design and implementation of the ARIMA model identification algorithm. A solution is also proposed for the scenarios in which the goodness-of-fit test is applicable. The system is built with the React framework, runs on the Node.js platform, uses ECharts as its graphical display plugin, and relies on Spark as its computation backend. The extended computation functions are integrated in the system frontend. To test the system's functionality, robustness, and related properties, a test framework is integrated into the project and targeted test cases are written. The test results show that the DataView system works well and meets its requirements.

The base design and implementation of the DataView system were completed prior to this thesis; within the data exploration subsystem, this work completes the goodness-of-fit test and the algorithm design for the time series model order test in order to extend DataView's functionality. The thesis closes with a summary of the work done during system implementation and an outlook. With its friendly user interface and the functions provided by its subsystems, the system lets users test candidate models before the modeling phase, which eases the difficulty of data understanding in the initial stage and improves both the ease of use of data mining tools and the efficiency of modeling. The DataView system can still be improved, and the thesis briefly outlines ideas for improving its functions on the basis of the completed work.
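The distribution model recommendation mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which candidate distributions, estimators, or goodness-of-fit statistic DataView uses, so everything below is an assumption made for illustration: three candidate families (normal, exponential, uniform), maximum-likelihood parameter estimates, and ranking by the Kolmogorov-Smirnov statistic. The function names are hypothetical and are not taken from the thesis.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(x, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ks_statistic(sorted_data, cdf):
    """Largest gap between the empirical CDF and a fitted CDF."""
    n = len(sorted_data)
    d = 0.0
    for i, x in enumerate(sorted_data):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def recommend_distribution(data):
    """Rank candidate families by KS statistic (smaller means a closer fit).

    Hypothetical helper: fits each candidate by maximum likelihood and
    returns a list of (name, ks_statistic) pairs, best fit first.
    """
    if not data:
        raise ValueError("data must not be empty")
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    scores = {}

    # Normal: the MLE is the sample mean and the (biased) sample variance.
    variance = sum((x - mean) ** 2 for x in xs) / n
    if variance > 0:
        sigma = math.sqrt(variance)
        scores["normal"] = ks_statistic(xs, lambda x: normal_cdf(x, mean, sigma))

    # Exponential: only defined for non-negative data; the MLE rate is 1/mean.
    if xs[0] >= 0 and mean > 0:
        rate = 1.0 / mean
        scores["exponential"] = ks_statistic(xs, lambda x: exponential_cdf(x, rate))

    # Uniform: the MLE endpoints are the sample minimum and maximum.
    lo, hi = xs[0], xs[-1]
    if hi > lo:
        scores["uniform"] = ks_statistic(xs, lambda x: (x - lo) / (hi - lo))

    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(10.0, 2.0) for _ in range(2000)]
    for name, statistic in recommend_distribution(sample):
        print(f"{name}: KS statistic = {statistic:.4f}")
```

Because the parameters are estimated from the same data that is being tested, the standard Kolmogorov-Smirnov p-values do not apply directly, so this sketch only ranks candidates by the statistic and does not report significance. A production system would also need more candidate families and a check that even the best-ranked candidate fits acceptably.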
Keywords/Search Tags:Data Mining, DataView System, Data Exploration