Big Data Analysis And Excessive Data Mining Risks: A Study On Assessment Of All The Resultant Economic Value Concerned

Posted on:2018-04-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Liu

GTID:1318330542966900

Subject:Statistics

Abstract/Summary:

Information is the foundation of sound decisions. Its quality, its quantity and the way it is technically processed directly determine how well it can serve as the cornerstone of decision-making. In the information age, internet technology has raised productivity and developed rapidly, so that more and more information can be recorded, stored and transmitted. Once information can be recorded, stored and transmitted, it becomes data in the modern sense. An era of extensive, varied, variable and promptly transmitted information is arriving: the "big data era".

The big data era has changed the traditional methods, thinking and paradigms of data analysis. It offers statisticians a feast of new ideas and a golden opportunity to realize their ambitions. This is reflected in three ways. (1) The scope of data has expanded. On the one hand, sample data are being replaced by data on the whole population. On the other hand, data types have extended from structured data to semi-structured and unstructured data. The traditional, mature techniques of data analysis and processing were designed for structured data and do not work well on semi-structured and unstructured data. How should this problem be handled? Can semi-structured and unstructured data be converted into structured data, or should new methods and techniques be created specifically for them? This calls not only for new methods and techniques but also for new thinking and philosophy. (2) When facing massive data, especially data without consistency or a uniform structure, how can they be made suitable for statistical research paradigms? Moreover, the statistical analysis of data streams is a new topic for which no appropriate methods yet exist; it requires new statistical tools, and statistical science must become more flexible. (3) The industrialization of big data, or the marketing of statistical products, will completely change the dependent status that statistics holds in practice. The value created by statisticians will become more direct and explicit. To achieve this goal, statistical workers need not only new statistical thinking but also confidence and hard work.

However, in the big data era we must pay attention to the problem of inefficient data mining and, at the same time, must not ignore the problems arising from excessive data mining. The concept of big data was introduced in 2012 and drew great attention worldwide. Many scholars and practitioners are engaged in related research and product development, and many research results have been produced. Nevertheless, big data analysis and big data industrialization are new fields that need further study and still contain many gaps to be filled.

This dissertation first summarizes the basic concepts and development status of big data, drawing on methods from information science, information economics and technical analysis. By defining the concept of big data, it explains, from a statistical perspective, the relationship between information and big data, the characteristics of big data, the characteristics and challenges of the big data era, and the differences, links and mutual influences between traditional statistics and big data analysis. Next, it discusses the process of value creation in big data analysis and its measurement, as well as the technology risk, model risk and decision risk in big data analysis. These risks involve three aspects: data security and conversion, model specification, and human factors. Then, on the basis of the analysis of risk factors and guided by risk management theory, it presents risk prevention measures for the process of big data analysis. Finally, it presents a case study drawn from the subprime crisis, using historical analysis to show how big data were analyzed in the credit rating process and what risks arose.

The dissertation has seven chapters: Introduction; Information, structured data and big data; The economic value of big data; Data mining risks in the context of big data; Risk prevention measures for excessive data mining; A case analysis: credit rating in the subprime crisis; Conclusion and outlook.

The major research content is as follows.

1. The economic value of big data and the risk of excessive data mining. In the big data era, the nature of data as a public good or quasi-public good becomes more and more evident. Through a full discussion of data analysis and of the externalities that arise when data are used, the dissertation demonstrates that the value of data and information comprises private value and social value. Whichever value is considered, it is determined by the depth and breadth of the data analysis. Once data analysis methods are abused, or a statistical conclusion is treated as final, the risk of excessive data mining arises.

2. The causes of excessive data mining. Excessive data mining challenges information security in two ways. First, if people use more advanced techniques and models to reveal real conditions that run counter to basic social rules, the normal operation of society is affected, even though what is revealed is true. Second, information that is inconsistent with the facts is called noise information, and revealing noise information also counts as excessive data mining. This occurs at two levels: creating noise information subjectively and creating it non-subjectively. Different kinds of excessive data mining have different causes and motives, and analyzing those causes and motives is the basis for preventing the risk.

3. Complexity in model and technique analysis and the risk of excessive data mining. Advocating science is a basic idea of human society. In practice, however, this advocacy has evolved into a pursuit of complicated models and techniques. Complicated models and techniques require a higher level of abstraction, and this abstraction easily causes problems such as contradicting the facts, choosing unrepresentative sample data, and model errors. To some extent, such technical analysis leads to the risk of excessive data mining in the form of the creation of noise information.

4. Opportunistic motives and the risk of excessive data mining. The information market is an incomplete market. This gives providers with good reputations an opportunity to obtain profits. In specific contexts, opportunistic motives can slip into moral hazard. In particular, when people want to dig out information that serves their own purposes, they use specialized analytical techniques and models, and the moral hazard then turns into the risk of excessive data mining.

5. Excessive data mining and decision risk. Under existing hierarchical arrangements, analysts and policy makers are often different people. They differ not only in interests but also in knowledge. Driven by the pursuit of maximum benefit, rational analysts tend to pursue complexity in models and techniques. Whether analysts use complicated models to confirm what policy makers already think or take a different approach, excessive data mining occurs. Once policy makers make decisions on the basis of such analysis, the risk of excessive data mining turns into decision risk.

On the basis of the above analysis, the dissertation draws the following conclusions.

First, the content of big data is information, but big data has been given a richer and still-evolving meaning. Big data is a whole process that deals with all types of data and comprises collecting, processing, converting, storing, transmitting and analyzing data, building algorithms, applying results, and providing products based on data and their processed derivatives. This whole process changes not only traditional data analysis but also our work and life. Big data have all the properties of information, and the value of data equals the value of information. Statistics and big data science are related in technique and resonate in thinking. Big data science has become more powerful than statistics in exploring socio-economic phenomena and laws. From a technical point of view, big data science is rooted in information science. The key to applying big data well is therefore the improvement and advancement of information science and technology, rather than reliance on statistics alone.

Second, big data is the whole process of developing, transferring and applying information resources. Consequently, the economic value of big data is the value added over the whole process, from information development to application. Many unique characteristics of information make its value difficult to assess. If information value is divided into special value and general value, however, it becomes possible to evaluate and measure it. The evaluation of special information value is a private evaluation suited to one particular decision. It does not cover the whole of the information value; it captures only the part realized in that specific decision-making project. For this reason, a general evaluation of information value is proposed, in which information value consists of two parts: private value and the subsequent external economic value. In addition, combining and decomposing data makes it easier to find potential relations among complicated data, which helps to explore laws and to realize value more fully.

Third, the key to data analysis is finding new information in complicated data, enhancing people’s understanding of things, and making reasonable decisions. Big data bring a significant growth in available information, but uncertainties and risks remain. Data analysis faces two kinds of risk: the risk of excessive data mining and the risk of inefficient data mining. Inefficient data mining means that valuable information, or the relations and laws in the data, cannot be found. Excessive data mining means that wrong or false information may be found, or that such information may be used by people with bad intentions. Inefficient data mining leads to lost opportunities, while excessive data mining leads to direct losses from misjudgment. Excessive data mining has many causes, but the question of how to convert unstructured data into structured data equivalently is the root of the various risks mentioned above.

Finally, since the subprime crisis people have paid more attention to the failings of the rating agencies and have reconsidered the objectivity and impartiality of credit ratings. A review of the agencies’ actual work suggests that the rating methodologies and procedures appear logical and the quantitative analysis accurate. In fact, however, a large amount of unstructured data is used in the rating process, so the specific rating factors, their weights and other important information cannot be known. As a result, the entire rating process lacked transparency and objectivity, and the rating results lacked credibility. This is a typical case of excessive mining of unstructured data.
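
The idea that excessive data mining can turn up wrong or false information can be illustrated with a small simulation. The sketch below is not taken from the dissertation; it is a minimal illustration, assuming that "noise information" can be modelled as random features that have no real relation to a target variable. The names `pearson` and `spurious_hits`, the sample sizes, and the threshold 0.361 (approximately the two-sided 5% critical value of the correlation coefficient for 30 observations) are hypothetical choices made for this example.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spurious_hits(n_obs=30, n_features=200, threshold=0.361, seed=0):
    """Screen pure-noise features against a pure-noise target.

    Every feature is generated independently of the target, so any
    feature whose |r| exceeds the threshold is a false discovery.
    The threshold is an assumed, approximate 5% critical value.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = []
    for j in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(feature, target)
        if abs(r) > threshold:
            hits.append((j, r))
    return hits


n_features = 200
hits = spurious_hits(n_features=n_features)
print(f"{len(hits)} of {n_features} pure-noise features pass the screen")
```

Because each feature has roughly a 5% chance of passing the screen by chance alone, searching over many candidate features can be expected to produce some "findings" even though no real relation exists. Reporting only the features that pass, without accounting for how many were tried, is one simple form of the misjudgment risk the abstract attributes to excessive data mining.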

Keywords/Search Tags:

Big Data, Statistics, Information Value, Data Type, Data Analysis, Excessive Data Mining Risk
