Design And Implementation Of Feature Extraction System For Large-Scale Structured Data

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: X F Li
GTID: 2428330575955114
Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of cloud computing, big data, the Internet of Things, artificial intelligence and enterprise informatization, the amount of data and services generated by individuals, enterprises and governments has exploded. This brings both opportunities and challenges. The scale of data is now unprecedentedly large, data dimensionality is growing rapidly, data types are becoming more complicated, and mining the data is becoming correspondingly harder. To mine massive data effectively and provide clean data for search, recommendation and forecasting, feature engineering is needed first. For this reason, extracting information from massive raw datasets has become a significant part of data mining and model training.

Feature engineering is the process of using domain knowledge of a dataset to create features that make machine learning algorithms work better. It can extract features from high-dimensional data, filter out redundant features, and handle various feature types and numerical transformations. These methods help avoid the curse of dimensionality, speed up modeling and reduce its space complexity. A feature engineering system is normally a subsystem of a machine learning platform and is the basis for training models with machine learning algorithms.

This thesis focuses on constructing a feature engineering system for large-scale structured data. The system uses a distributed architecture, is deployed with Docker images and Kubernetes, and serves the public security system inside a public security bureau in China. It provides three feature engineering functions: feature extraction, feature selection and feature generation. The main contributions of the system are as follows:
· The system handles a variety of structured data formats (tsv, csv, xml, etc.) and data types (numeric, label, sequential, character). It can also discretize character features and bin continuous features.
· It processes massive data and features efficiently through distributed extraction and transformation.
· It implements a filter-based feature selection approach to remove redundant features.
· It provides categorical feature encoding and mathematical operations for converting discrete features into continuous ones and for generating new features.

At present, the system has been deployed inside the machine learning platform, where it provides stable feature extraction, feature selection and feature generation services within a public security bureau.
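
The three kinds of operation named in the contributions (binning of continuous features, filter-based feature selection, and categorical feature encoding) can be illustrated with a minimal single-machine sketch. The abstract does not say which binning scheme, filter criterion or encoding the system uses, so equal-width binning, a variance threshold and one-hot encoding are assumptions chosen for illustration. The function names are hypothetical, and the distributed execution described in the thesis is not modeled here.

```python
from statistics import pvariance


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1].

    Assumption: equal-width bins; the thesis may use another scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread, so everything goes in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def variance_filter(columns, threshold=0.0):
    """Filter-style selection: keep columns whose variance exceeds threshold.

    Assumption: variance is the filter criterion; `columns` maps a
    hypothetical feature name to its list of numeric values.
    """
    return [name for name, vals in columns.items() if pvariance(vals) > threshold]


def one_hot(labels):
    """Encode a categorical column as 0/1 vectors, categories in sorted order."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    return [
        [1 if index[label] == i else 0 for i in range(len(categories))]
        for label in labels
    ]


if __name__ == "__main__":
    print(equal_width_bins([1.0, 2.0, 9.0, 10.0], 3))
    print(variance_filter({"age": [20, 30, 40], "const": [1, 1, 1]}))
    print(one_hot(["b", "a", "b"]))
```

A filter method such as this one scores each feature independently of any model, which is what makes it cheap to apply to many columns at once. In a distributed setting the same per-column statistics would be computed in parallel over partitions of the data.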
Keywords/Search Tags: Distributed Machine Learning, Feature Engineering, Feature Extraction, Feature Selection, Feature Generation