
Feature Selection On Preference Database And CP-nets Structure Learning

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2428330623474899
Subject: Engineering
Abstract/Summary:
Learning user preferences, for example in the form of Conditional Preference networks (CP-nets), has become a core issue in artificial intelligence research. Most current research learns the structure of CP-nets from randomly selected examples or peer-to-peer queries. To evaluate the optimal performance of the learning algorithm and to learn the CP-nets structure more accurately, this thesis takes the user preferences stored in a preference database as its research object and uses the mRMR feature selection algorithm to learn the CP-nets structure. Furthermore, the mRMCR algorithm is adopted to compute the minimal common redundancy, which avoids the calculation of conditional mutual information and yields an acyclic CP-nets structure. Finally, based on the MapReduce framework, an information-theoretic method is used to learn the CP-nets structure from large-scale data. The main contributions of this thesis are as follows.

Firstly, a CP-nets structure is obtained using the mRMR feature selection algorithm. A method for computing mutual information and conditional mutual information on a preference database is established through the minimal Redundancy Maximal Relevance (mRMR) algorithm. Mutual information is treated as the relevance between an attribute and its feasible parent, and conditional mutual information is treated as the redundancy among the attributes in the feasible parent set. From these, the objective function of minimal redundancy and maximal relevance is constructed. The parent set of an attribute is shown to be a subset of attributes that have little redundancy among themselves and a strong influence on the preference over the child attribute. The effectiveness of the algorithm is verified on a film recommendation data set. The results show that mRMR-based feature selection can effectively capture the causal relationships between variables, extract the parent set of each attribute, and thereby obtain the structure of CP-nets.

Secondly, the structure of acyclic CP-nets is obtained by the mRMCR algorithm. Many traditional CP-nets structure learning methods exist, but the structures they produce are not acyclic CP-nets. So that the graphical model includes relevant features, excludes irrelevant ones, and controls the use of redundant ones, the maximal relevance minimal common redundancy (mRMCR) algorithm is designed. The algorithm avoids the calculation of conditional mutual information, makes relevance and redundancy comparable, effectively measures the degree of dependence between variables, determines the causal relationships between variables, and then learns the acyclic structure of CP-nets. The experimental results show that, compared with other algorithms, the mRMCR algorithm can learn the optimal acyclic CP-nets.

Finally, to address the data storage problem and the weak scalability of the algorithm in large-scale data environments, a parallel algorithm based on the MapReduce framework is proposed: the Parallel Kullback-Leibler divergence combining GINI index (P-KLIC) algorithm. The algorithm integrates the Gini index and information gain to construct a scoring function, and the topological structure N of the CP-nets is obtained in parallel from the preference database. In addition, an information gain indicator incorporating the Gini index is used to qualitatively represent the amount of related information between the attributes of the CP-nets structure. A score function on the preference database is established following the score-and-search approach, and the candidate parent structures are scored and searched in parallel. Then, based on the line-order space search, the local optimum of each node is computed in order to obtain the global optimum. The experimental results show that the P-KLIC algorithm is feasible and scalable for parallel learning of CP-nets structure on large-scale data.

In summary, combining the three parts above, a method is obtained for learning CP-nets structure from a preference database using feature selection. The learned CP-nets N maintains a high similarity with the original model N0, which makes the learned model more reliable.
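The mRMR-based parent selection described in the first contribution can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: it assumes discrete attribute columns, uses the standard greedy mRMR difference criterion (relevance minus average redundancy), and measures redundancy with plain mutual information rather than the conditional mutual information used in the thesis. The function names `mutual_information` and `mrmr_parents`, the cap `k`, and the rule of stopping once the score is no longer positive are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed value pairs: p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_parents(data, target, k):
    """Greedily select up to k candidate parents of `target`.

    `data` maps attribute names to equally long lists of discrete values.
    Each step picks the attribute with the highest
    relevance(attribute, target) - mean redundancy(attribute, selected).
    Stopping when the best score is <= 0 is an assumption of this sketch.
    """
    candidates = [a for a in data if a != target]
    relevance = {a: mutual_information(data[a], data[target]) for a in candidates}
    selected = []

    def score(a):
        if not selected:
            return relevance[a]
        redundancy = sum(
            mutual_information(data[a], data[s]) for s in selected
        ) / len(selected)
        return relevance[a] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        if score(best) <= 0:
            break  # nothing left adds relevance beyond its redundancy
        selected.append(best)
        candidates.remove(best)
    return selected
```

Calling `mrmr_parents(data, "rating", 2)` on a table of discrete attribute columns returns the names of the selected candidate parents of the hypothetical `rating` attribute; an attribute that merely duplicates an already selected parent is rejected because its redundancy cancels its relevance.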
Keywords/Search Tags: CP-nets structure learning, preference database, feature selection, parallelization, feasible parent