Research On Mass Data Processing And Data Mining Key Technologies

Posted on:2016-05-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Liu

GTID:1318330542474116

Subject:Computer application technology

Abstract/Summary:

Data mining is the process of using algorithms to uncover hidden information in large volumes of data. The rapid development of information technology and the Internet has generated massive amounts of data, and storing and processing these data effectively and mining the knowledge hidden in them is an important task. This thesis studies preprocessing, storage, and deduplication methods for large-scale data and, on that basis, applies data mining techniques to extract the knowledge implied in the data. The work covers the following aspects:

1. To address the problems posed by massive data, and building on data preprocessing technology, the thesis compares data compression, incremental backup, and data deduplication, focuses on deduplication, and proposes a K-Means-based duplicate data deletion and adaptive optimization method. First, a consistent hashing algorithm is used in the distributed storage system and combined with the Bloom filter structure used in a single query system, which improves the efficiency of distributed index lookup. Second, by improving the Rabin-fingerprint-based partitioning algorithm and applying a suffix-adaptive data block optimization method, the block selection method gains better adaptability and data transmission performance. In addition, a K-Means-based duplicate data deletion method is proposed that accurately identifies duplicate data and improves the efficiency of duplicate detection and deletion.

2. A clustering algorithm is applied to the data after preprocessing and redundancy elimination, and the Feature Weighting and Non-negative Matrix Factorization Multi-view Clustering (FWNMF-MC) algorithm is proposed. FWNMF-MC takes feature weights and high dimensionality into account during multi-view clustering: it automatically assigns different weights according to the characteristics of each feature and the importance of each view in the clustering process. The feature matrix is factorized into a basis matrix and a coefficient matrix, whose product helps map the high-dimensional space to a low-dimensional space. At the same time, the consistency among views in the low-dimensional space is maximized so that the clustering structure found in each view is used efficiently. Experiments show that, compared with existing algorithms, FWNMF-MC achieves better clustering results and is suitable for handling massive data.

3. Association rule mining is applied to the data after preprocessing and redundancy elimination, and Association Rules Mining based on Particle Swarm Optimization (ARM-PSO) is proposed. ARM-PSO follows the particle swarm optimization strategy: it first searches for the optimal threshold of each particle, and the data are then converted to binary values in order to find suitable minimum support and confidence thresholds. The experimental results show that ARM-PSO can quickly and objectively determine appropriate minimum support and confidence values, that it yields high-quality association rules while maintaining mining efficiency, and that it is suitable for association rule mining on massive data sets.

Keywords/Search Tags:

data mining, data deduplication, multi-view clustering, particle swarm optimization, association rule

Related items

1	Research On Association Rules Mining Algorithm Based On Particle Swarm Optimization
2	The Particle Swarm Optimization And Its Application
3	Research On Mining Algorithm Of Association Rule And Its Application For Biological Data
4	Research Of Association Rules Mining And Clustering Analysis On Energy Consumption Monitoring Data
5	The Research Of Web Data Mining Based On Particle Swarm Optimization (PSO)
6	Research On Spark-based Association Rule Mining Algorithms
7	Research Of Fuzzy Clustering Based On Particle Swarm Optimization Algorithm
8	Research Of Association Rules Data Mining Based On Improved Particle Swarm Optimization Algorithm
9	Fuzzy Association Rules Extraction Based On Particle Swarm Optimization And Its Implementation In Parallelization
10	Research And Application Of Data Mining Technology In Tax Administration