A Distributed Storage And Computing Platform Based On Bayesian Algorithm

Posted on: 2019-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: H L Xie
Full Text: PDF
GTID: 2428330593451096
Subject: Software engineering
Abstract/Summary:
Data mining has become an indispensable standard tool for data processing, text categorization and data research in the era of big data. Achieving the goals of data mining usually requires a combination of data crawling, data storage and data analysis. In practice, however, each of these stages has its own difficulties. Data crawling: the systems that make up the current Internet offer no uniform source of data, and data sets are scattered and poorly consolidated. Data storage: storing large amounts of data in a single form on a single machine carries a risk of data loss. Data analysis: crawled data commonly contains dirty data.

The purpose of this project is to propose a general solution for the traditional data mining process, and to provide a platform or framework that integrates data crawling, data storage and data classification so as to reduce the risks in these three stages as far as possible. For data crawling, where the pain points are the lack of a uniform information source and the scattering of data sets, this project selects the WebCollector crawler framework to keep the crawled data as close to real time as possible. For data storage, it uses a distributed network storage system built on a Codis + Redis cluster to store data in real time, which provides security and reliability while improving the efficiency of data access. For data analysis, the data is passed through a Bayesian classifier, which addresses the problem of excessive dirty data and maximizes the purity and usability of the data.

The evaluation of the experimental prototype shows that the proposed framework functionally fulfills the requirements of data mining. Compared with the traditional data mining model, the framework achieves low performance overhead and low resource consumption, meeting expectations.
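
The Bayesian filtering step described above could look roughly like the following minimal sketch. The abstract does not give the details of the classifier, so the multinomial naive Bayes model, the Laplace smoothing, the whitespace tokenization, the class name `NaiveBayesFilter`, the labels `clean` and `dirty`, and the toy training sentences are all illustrative assumptions rather than the author's implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocab = set()                        # all words seen in training

    def train(self, text, label):
        words = text.lower().split()              # assumed: whitespace tokens
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for illustration only).
nb = NaiveBayesFilter()
nb.train("redis cluster stores crawled pages reliably", "clean")
nb.train("bayesian classifier labels crawled text", "clean")
nb.train("click here win free prize now", "dirty")
nb.train("free prize click now", "dirty")

print(nb.classify("free prize now"))
print(nb.classify("redis cluster stores text"))
```

In a pipeline like the one the abstract outlines, records labeled `dirty` would be dropped or quarantined before the remaining data is used; where exactly this step sits relative to storage is not stated in the abstract.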
Keywords/Search Tags: Data mining, Web crawlers, Distributed storage, Bayesian theory, Data classification