
Optimal Subset Of Distributed Estimation

Posted on: 2022-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2507306554451774
Subject: Statistics
Abstract/Summary:
In recent years, the term "big data" has come up more and more often. Large volume and a wide variety of data types are two defining characteristics of the big data era. Under these conditions, data can no longer be stored and analyzed on a single machine. Distributed processing has attracted growing attention because of its parallelism, robustness, and flexibility. In statistics, divide-and-conquer algorithms have been introduced for a range of analysis tasks. This thesis studies two problems in distributed processing: the optimal subset for distributed interval estimation and the optimal subset for distributed hypothesis testing. The aim is to enable machine learning and statistical inference on large-scale data. A statistical model is established, simulation experiments are carried out in R, and the estimation performance of the proposed distributed estimation algorithm is verified through data analysis.

The first chapter is the introduction. It covers the source and significance of the topic, the current state of research in China and abroad, the research work of this thesis, and its motivation and innovations.

The second chapter presents preliminary knowledge. It first reviews distributed theory and briefly describes the process of distributed estimation. It then introduces the interval estimation problem for the distributed linear regression model, along with two distributed estimation algorithms: one-step average estimation and one-step median estimation. Finally, it introduces the hypothesis testing theory for the distributed linear regression model.

The third chapter studies the optimal subset problem for interval estimation. It proposes the LIC criterion for optimal subset selection in distributed interval estimation and gives two related properties that ensure the feasibility of the algorithm. Simulations and a real data analysis compare the estimation performance of five algorithms to verify the effectiveness of the proposed method.

The fourth chapter introduces the optimal subset problem for distributed hypothesis testing and proposes the PPC criterion for optimal subset selection in distributed hypothesis testing. Simulations and a real data analysis are carried out. The interval estimation algorithm is also applied to this problem, and the estimation performance of nine algorithms is compared to verify the effectiveness of the intersection method.

The fifth chapter gives the research conclusions and outlook. It summarizes the conclusions of this thesis and lists unresolved problems as directions for future research.
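
As a rough illustration of the two distributed estimators named in the second chapter, the sketch below splits a simulated linear regression data set across several "machines", fits ordinary least squares on each block, and then combines the local estimates by averaging and by taking the coordinate-wise median. This is only a minimal sketch under stated assumptions: the abstract does not give the thesis's exact definitions of one-step average and one-step median estimation, the thesis itself uses R rather than Python, and the function names (`local_ols`, `distributed_estimates`) and simulation settings are hypothetical. The LIC and PPC subset-selection criteria are not reproduced here because the abstract does not define them.

```python
import numpy as np


def local_ols(X, y):
    """Ordinary least squares fit on one machine's block of data."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def distributed_estimates(X, y, n_machines):
    """Combine per-machine OLS estimates by mean and by coordinate-wise median.

    Assumption: "one-step average" and "one-step median" are taken here to
    mean the average and the coordinate-wise median of the local estimates.
    """
    X_blocks = np.array_split(X, n_machines)
    y_blocks = np.array_split(y, n_machines)
    local = np.vstack([local_ols(Xk, yk) for Xk, yk in zip(X_blocks, y_blocks)])
    return local.mean(axis=0), np.median(local, axis=0)


# Simulated data (hypothetical settings, chosen only for illustration).
rng = np.random.default_rng(0)
n, p, n_machines = 10_000, 3, 10
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(scale=0.1, size=n)

beta_avg, beta_med = distributed_estimates(X, y, n_machines)
print("average estimate:", beta_avg)
print("median estimate: ", beta_med)
```

The median combination is usually chosen when some machines may hold contaminated or atypical data, since a few poor local estimates affect the median less than the mean.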
Keywords/Search Tags: distributed estimation, distributed regression, optimal subset, distributed data, distributed hypothesis testing