| The era of Big Data has arrived, and the volume of data is growing at an unprecedented rate, which poses a huge challenge to massive data mining and knowledge discovery. Information systems change in real time as new information continues to arrive. Over a long period, this produces an information system that can no longer be stored or processed on a single computer node. At the same time, a massive inflow of data inevitably contains missing values, which turns the system into an incomplete information system. Computing the upper and lower approximation sets is an essential step when using any rough set model for data mining tasks such as attribute reduction and rule extraction. The semi-monolayer covering rough set, an extension of the classical rough set, performs well on set-valued information systems and incomplete information systems. However, its computation is still inefficient, or even infeasible, for dynamic set-valued information systems and large set-valued decision information systems. This paper therefore investigates these problems based on the semi-monolayer covering rough set. To adapt the semi-monolayer covering rough set to set-valued information systems whose attribute set changes dynamically, it is studied in combination with the idea of incremental learning. Incremental updating methods are proposed for each of the two cases, adding and deleting attribute sets. In addition, the results obtained by the incremental method are shown to be consistent with those of the static method. An incremental algorithm for updating the approximations is then developed based on these methods. Finally, a series of comparative experiments is conducted to verify the effects of dataset size and the ratio of attribute changes on the incremental algorithm. The experimental results show that, compared with the static algorithm, the computational efficiency of the incremental algorithm is improved by an average of 8.5 times and 12 times when adding and removing attributes, respectively. To adapt the semi-monolayer covering rough set to large distributed set-valued decision systems, it is studied in combination with parallel computing techniques. The calculation methods for each information cell in a distributed environment are given first. Based on the existing granularity semi-monolayer covering rough set model, the whole parallel computation process is divided into two steps: the reliable cell part and the controversial cell part of the approximation set. Both parts must be judged based on the corresponding decisions. Because the decisions of the controversial cells are inaccurate, the intersection and union of the decisions of their related reliable cells are used as substitutes. The above process is proved in the parallelization theory. Finally, the Spark framework, which is based on in-memory computing, is chosen to implement the semi-monolayer covering parallelization theory, and the performance of the algorithm is evaluated with the Speedup, Scaleup and Sizeup metrics. The experimental results show that the larger the dataset is, the closer the slopes of the Speedup and Sizeup curves are to 1 and the closer the Scaleup curve is to 1, which means that the parallelization of the proposed semi-monolayer covering rough set performs better. There are altogether 16 figures, 16 tables and 96 references in this paper. |
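
The three parallel-performance metrics named in the abstract can be sketched as follows. This is a minimal illustration that assumes the definitions commonly used for parallel data mining algorithms; the abstract does not state its exact formulas, so the function names, parameter names, and timing values below are hypothetical and are not results from the paper.

```python
def speedup(t_one_node: float, t_p_nodes: float) -> float:
    """Speedup: same dataset, 1 node vs. p nodes. Ideal value is p,
    so the ideal curve plotted against p has slope 1."""
    return t_one_node / t_p_nodes


def scaleup(t_base: float, t_scaled: float) -> float:
    """Scaleup: time for data D on 1 node divided by time for
    p-times-larger data on p nodes. Ideal value is 1."""
    return t_base / t_scaled


def sizeup(t_base_data: float, t_m_times_data: float) -> float:
    """Sizeup: fixed number of nodes, time for m-times-larger data
    divided by time for the base data. Ideal (linear) value is m."""
    return t_m_times_data / t_base_data


# Placeholder timings in seconds, invented purely for illustration.
print(speedup(120.0, 30.0))   # 1 node vs. 4 nodes on the same data
print(scaleup(100.0, 125.0))  # D on 1 node vs. 4*D on 4 nodes
print(sizeup(50.0, 200.0))    # D vs. 4*D on the same cluster
```

Under these assumed definitions, Speedup and Sizeup curves with slopes near 1 and a Scaleup curve near 1 indicate that the parallel algorithm scales close to the linear ideal, which matches how the abstract interprets its experimental curves.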