Study On The Interpretability In Classification Problem Of Educational Data Mining

Posted on:2018-12-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Mu

Full Text:PDF

GTID:1368330566455739

Subject:Computer Science and Technology

Abstract/Summary:

Educational Data Mining (EDM) applies machine learning and other techniques to problems in educational research and practice, drawing on computer science, education, psychology, and other fields. Because its results mainly serve teachers, students, and education decision makers, EDM demands highly interpretable machine learning models. On the one hand, EDM users, who are generally not information technology professionals, may refuse to use a model that is hard to understand. On the other hand, education decision makers are usually accountable for their decisions, so they may be unwilling to rely on a model unless it offers reasonable evidence. Over the past decades, the interpretability of machine learning models has attracted much attention and produced many achievements. Nevertheless, several issues remain, such as the lack of unified interpretability standards across different models and insufficient consideration of human cognitive factors. Research on interpretability in EDM, and even in data mining generally, still has many shortcomings. In particular, current studies focus mainly on interpretability in the model building stage and neglect the other stages of the data mining life cycle. To address these problems, this paper presents an interpretability research scheme that covers the whole data mining life cycle. Since classification models are the most widely used models in EDM and in data mining generally, this paper focuses on the classification problem and covers five aspects:

(1) A systematic study of interpretation issues in data mining. After summarizing the objects and objectives of interpretation research at each stage of the data mining life cycle, this paper proposes a scheme involving six stages of data mining and focuses on interpretation issues in four main stages: data understanding, data preparation, model building, and testing and evaluation. Based on the classification problem in EDM, it then studies the interpretability of EDM using the proposed scheme.

(2) A process for improving the interpretability of the raw data set. In the data understanding stage, improving interpretability essentially means improving the interpretability of the raw data set. The proposed process therefore combines many methods to enhance insight into the data, including dimensionality reduction, visualization, clustering analysis, Markov models, and feature selection. In particular, two feature selection methods based on differences between features are proposed to help people quickly capture the important information in a data set.

(3) A two-stage data preparation method. In the data preparation stage, improving interpretability essentially means improving the interpretability of the data set to be modeled. Raw data in EDM classification problems is usually unbalanced, but existing feature selection algorithms do not take this characteristic into account. A two-stage data preparation (TSDP) method is therefore proposed to handle feature selection under class imbalance and to construct a data set to be modeled that yields both high prediction accuracy and good understandability.

(4) A method for explaining SVM classification models. Drawing on cognitive psychology, a framework is proposed for studying the interpretability of black-box models. Under this framework, a method for interpreting SVM classification models is presented, based on the exemplar theory and the availability heuristic from cognitive psychology. Because the method simulates the process of human cognition, its interpretations are easy to accept. Experimental results also show that the method is more stable and accurate than other black-box interpretation algorithms.

(5) A research framework for cross-model evaluation of interpretability. A new idea is proposed for comparing the interpretability of different models: using a machine learning algorithm to evaluate model interpretability. First, different types of models are transformed into graphs and several features are extracted. Next, data related to model interpretability are collected through experiments. Finally, a machine learning algorithm is used to train an evaluation model that compares interpretability across models. Experimental results show that the evaluation model assesses model interpretability accurately and generalizes well.

The interpretability research in this paper spans many stages of the data mining life cycle, which remedies a shortcoming of existing research. Although the research object is limited to EDM, many of the methods could be applied to other fields. The results of this paper can provide valuable clues for research in data mining and education.
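
The SVM explanation method summarized above is described only at a high level. As a rough, hypothetical illustration of the exemplar idea (not the method proposed in this paper), the sketch below explains a kernel SVM prediction by listing the support vectors, treated as stored exemplars, that contribute most toward the predicted side of the decision value f(x) = sum_i alpha_i y_i K(x_i, x) + b. It assumes an RBF kernel and already-trained dual coefficients; the names rbf_kernel and explain_by_exemplars are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def explain_by_exemplars(
    x: Vector,
    support_vectors: List[Vector],
    dual_coefs: List[float],  # alpha_i * y_i for each support vector
    bias: float,
    k: int = 2,
    gamma: float = 0.5,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Hypothetical exemplar-style explanation (not the dissertation's method).

    Returns the predicted label (+1 or -1) and up to k (index, contribution)
    pairs for the support vectors that push hardest toward that label.
    """
    # Each support vector's additive share of the decision value f(x).
    contributions = [
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ]
    decision = sum(contributions) + bias
    label = 1 if decision >= 0 else -1
    # Keep only exemplars that support the predicted label, strongest first.
    supporting = [(i, c) for i, c in enumerate(contributions) if c * label > 0]
    supporting.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return label, supporting[:k]
```

For example, with support vectors at (0, 0), (1, 1), (3, 3), (4, 4), dual coefficients -1, -0.5, 0.5, 1 and bias 0, the query (3.5, 3.5) is labeled +1 and is explained chiefly by the exemplars at (4, 4) and (3, 3).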
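
The internals of TSDP are not described in the abstract. The sketch below shows only one generic way to combine two stages for feature selection on unbalanced data: random undersampling of the majority class, followed by ranking features with the Fisher score. Both choices, and the function names undersample and fisher_scores, are assumptions for illustration, not the TSDP method itself.

```python
import random
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def undersample(
    rows: List[Row], labels: List[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Stage 1 (assumed): randomly keep, for every class, as many rows as the
    smallest class has, so all classes end up equally represented."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Row]] = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows: List[Row] = []
    out_labels: List[int] = []
    for y, group in sorted(by_class.items()):
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def fisher_scores(rows: List[Row], labels: List[int]) -> List[float]:
    """Stage 2 (assumed): Fisher score per feature, i.e. the size-weighted
    spread of class means divided by the size-weighted within-class variance."""
    classes = sorted(set(labels))
    scores: List[float] = []
    for j in range(len(rows[0])):
        overall = mean(r[j] for r in rows)
        between = within = 0.0
        for c in classes:
            vals = [r[j] for r, y in zip(rows, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class variance: perfectly separated or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores
```

Balancing before scoring keeps the majority class from dominating the class-mean and variance estimates; a higher score marks a feature whose values separate the classes more cleanly.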
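
The cross-model evaluation framework converts models into graphs and extracts features from them, but the abstract does not say which features are used. As a hypothetical example, a decision tree stored as an adjacency list yields simple structural features such as node count, leaf count, depth, and average branching factor, which could serve as inputs to a learned interpretability evaluator. The function name graph_features and the feature set are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List


def graph_features(adjacency: Dict[str, List[str]], root: str) -> Dict[str, float]:
    """Hypothetical structural features of a model expressed as a rooted tree."""
    nodes = set(adjacency)
    for children in adjacency.values():
        nodes.update(children)
    n_edges = sum(len(children) for children in adjacency.values())
    # A node with no outgoing edges is a leaf (e.g. a class prediction).
    leaves = [n for n in nodes if not adjacency.get(n)]
    # Breadth-first search from the root; in a tree, BFS distance is node depth.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in adjacency.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    internal = len(nodes) - len(leaves)
    return {
        "n_nodes": len(nodes),
        "n_edges": n_edges,
        "n_leaves": len(leaves),
        "depth": max(depth.values()),
        "mean_branching": n_edges / internal if internal else 0.0,
    }
```

For a tree whose root splits into nodes a and b, with a having two leaves and b one leaf, this gives 6 nodes, 5 edges, 3 leaves, depth 2, and a mean branching factor of 5/3.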

Keywords/Search Tags:

Educational Data Mining, Interpretability Research, Classification, Process for Data Mining, Cognitive Psychology

Related items

1	Study On Interpretability In Data Mining Process
2	A PROV-based Process Analysis Method For Improving Interpretability Of Data Mining Results
3	Research On Computer Educational Data Mining Based On Cognitive Diagnosis Methods
4	The Application And Studying Of Data Mining In Educational Administration System Of Middle School
5	Recognition And Research Of Students With Abnormal Psychology Based On Educational Big Data
6	The Application Of Data Mining In Educational Administration System Of Digital Campus
7	Data Mining Technology Applied Research In The Educational Management System
8	Application Research Of Educational Data Mining Technology In University Computer Foundation
9	Study On Several Typical Data Mining Methods And Their Applications
10	Data mining via support vector machines: Scalability, applicability, and interpretability