Font Size: a A A

A Reseach For Neighborhood Hypergraph Based Classification Algorithm For Incomplete Information System

Posted on:2017-03-12Degree:MasterType:Thesis
Country:ChinaCandidate:J ShiFull Text:PDF
GTID:2348330533450150Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The classification problem of incomplete information system is a hot issue in intelligent information processing. Due to the existence of missing value in incomplete information system, it is ineffective by using traditional machine learning method to deal with. In order to prove the accuracy of incomplete information system, many solutions have been proposed from different perspective.(1)Turn the incomplete information system into a complete information system, then classify the system by using the method which is used for complete information system.(2)Classify the incomplete information system directly. Among these different data analysis theories and methods, rough sets are the most frequently used. There are some extension models in rough set to deal with incomplete information system, such as tolerance relation, limited tolerance relation, non-symmetric similarity relation, etc.Hypergraph is a new intelligent method for machine learning, However, it is hard to process the incomplete information system by the traditional hypergraph, which dues to two reasons:(1) The conventional hypergraph can only deal with the discrete data, and it still need to discretize the continuous data.(2) The existing methods are unsuitable to deal with incomplete information system for the sake of missing values in incomplete information system. To improve the problems mentioned above, we introduce the neighborhood rough set. The core concepts of rough set theory are approximations. Using the concepts of lower and upper approximations, knowledge hidden in information systems may be discovered and expressed in the form of decision rules. In other words, certain rules can be induced directly from the lower approximation, and possible rules can be derived from the upper approximation. So the study of approximation space has been developed widely. Once we apply rough set theory into hypergraph, we can supervise the hyper-edge replacement process and improve the generalization ability of hyper-edges as well. The neighborhood-based rough set can solve the problem that classic rough set theory can not deal with the continuous data. So in this paper, we employ hypergraph model and rough set theory to build a neighborhood hypergraph model. After that, we propose a classification algorithm for incomplete information systems based on neighborhood hypergraph. This algorithm is composed of the following three steps. Firstly, we initialize the hypergraph. Second, we classify the training set by neighborhood hypergraph. Third, under the guidance of rough set, we replace the poor hyper-edges. The second and third steps are iterative process, which will stop as far as the accuracy reaches a specific threshold. After that, we can obtain a good classifier. However, the running time is extremely long when we apply the algorithm on large data set. So we parallel the original algorithm by combining Spark programming paradigm. Spark is more suitable than other parallel frameworks when it comes to the iterative machine learning algorithm. By paralleling the method we proposed, it decreases the time complex and improves the efficiency on large data set.The neighborhood hypergraph based classification algorithm for incomplete information system is tested on 15 data sets from UCI machine learning repository. Furthermore, it is compared with some existing methods, such as C4.5, SVM, Navie Bayes, KNN, etc. The experimental results show that the proposed algorithm has better performance via Precision, Recall, AUC and F-measure. Moreover, we also proved that by paralleling the algorithm in this paper, it can not only reduce the execute time but also keep its validity.
Keywords/Search Tags:Neighborhood Rough set, Hypergraph, Incomplete information system Classification, Parallel, Spark
PDF Full Text Request
Related items