A Research Of Feature Analysis Based On Statistics And Big Data

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Xu
Full Text: PDF
GTID: 2348330542998820
Subject: Information and Communication Engineering
Abstract/Summary:
As the internet drives the accumulation of big data, upgrading internet services relies on highly efficient data processing. Machine learning is the best way to address this need. However, to date, machine learning models remain highly dependent on relevant data features. Feature analysis is therefore the key to improving the generalization ability of machine learning models in the context of big data.

The theory of statistical feature analysis contains at least two aspects: first, feature preprocessing, and second, feature selection. In terms of information theory, feature preprocessing increases the reliability of information by augmenting dimensions, while feature selection increases the efficiency of information by reducing dimensions. The significance of research on feature analysis lies in at least three aspects: first, reducing the cost of machine learning to improve the quality of internet services; second, addressing livelihood issues such as the difficulty of obtaining adequate medical care; third, leveraging the regularities of human behavior to support sustainable development.

Four aspects are studied in this paper. On feature preprocessing, this paper surveys cutting-edge techniques and algorithms in seven fields: feature capturing, feature transforming, outlier detection, missing value processing, time series processing, spatial data processing, and imbalanced data processing. This survey is practical for data processors and data managers. On feature selection, this paper explores in depth the computational complexity, use cases, merits, and demerits of four solutions: relevance-based filters, Lasso-based sparse selection, ensemble models, and neural networks. This analysis is useful for data mining practitioners.

Beyond these theoretical investigations, this paper applies feature analysis to the prediction of diseases and the prediction of human actions. For the prediction of chronic kidney disease, this paper raises recall to 99.63 percent by combining mean imputation, dummy encoding, feature sorting, and model stacking, using digital records of urine tests and physical checkups. The cross-validation results of the experiment demonstrate the feasibility of predicting disease with feature analysis methods. For the prediction of human activities, this paper studies multiple factors related to predicting the activities of GitHub users. It finds that the required window of historical data is two months, that predictability is linearly correlated with the entropy of activation, and that predictability is 93 percent on average.
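As an illustration only, two of the preprocessing steps named in the abstract, mean imputation and dummy encoding, can be sketched with the Python standard library. The helper names (`mean_impute`, `dummy_encode`) and the toy records are hypothetical and are not taken from the thesis; the thesis's actual implementation and data are not shown on this page.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def dummy_encode(values):
    """Turn a categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical toy records (assumed for illustration, not data from the thesis).
blood_pressure = [80, None, 70, 90]            # one missing measurement
appetite = ["good", "poor", "good", "good"]    # categorical feature

imputed = mean_impute(blood_pressure)
encoded = dummy_encode(appetite)
```

Here the missing reading is filled with the mean of the three observed values, and the categorical column becomes one indicator column per category. Feature sorting and model stacking, the other two steps mentioned, would operate on the resulting numeric columns.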
Keywords/Search Tags:feature selection, big data, machine learning, predictability