Granulation Modeling Approaches And Its Applications For Multi-feature Integration

Posted on: 2018-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: S P Xu
GTID: 2348330536477618
Subject: Computer application technology
Abstract/Summary:
With the rapid development of internet technology, especially the widespread adoption in recent years of new services such as cloud computing, the Internet of Things, and social networks, the scale of the data generated by human society is growing at an unprecedented rate. Mining effective information from large and complex data, guided by application requirements, has become a main driver of the development of modern science and technology. However, modern data has typical characteristics such as diversity of descriptions, universality of sources, complexity of structures, and rapid growth. Compared with traditional data mining and knowledge discovery, exploring how to obtain deep information and hidden knowledge from huge, complex, heterogeneous data is therefore a long-term and arduous task. Nowadays, "divide and conquer" is a widely accepted strategy in many large-scale and complex data processing approaches: appropriate sampling and hierarchical techniques first split the complex data in a reasonable way, and an efficient learning mechanism is then established to deal with each corresponding part of the data. This strategy reflects the multi-granularity cognitive ability of human beings when faced with complex problems. Nevertheless, the diversity of granularities brings serious challenges to the study of multi-granularity, such as high computational complexity, an insufficient grasp of the characteristics of specified targets, and a lack of fusion and dynamic learning abilities. From these points of view, this dissertation aims to explore efficient multi-granularity based information granulation techniques and then develop new multi-granularity based modeling and knowledge acquisition approaches from the following three aspects: information granulation in label-induced multi-feature spaces, classifier design in parameter-induced multi-feature spaces, and fusion learning approaches in multi-feature spaces. The main contributions and innovations of this dissertation are as follows:

(1) Feature space transformation strategy and rough data analysis in single-label classification. Most studies of rough sets are based on the original feature space and do not take into consideration the distinct characteristics of samples in different classes, although these characteristics may contribute much to the accuracy of rules. To solve this problem, we have proposed a multi-feature space transformation strategy that reflects the characteristics of different decision classes. Furthermore, definitions of the approximation quality and conditional entropy of a decision system in multi-feature space have also been introduced. Experimental results show the effectiveness of the proposed transformation strategy for reducing the uncertainty of the decision system and improving classification performance.

(2) Feature space transformation strategy and rough data analysis in multi-label classification. Since different labels may have distinct characteristics of their own, constructing a label-specific feature space is necessary for multi-label learning. However, constructing the label-specific feature space may increase the dimension of the feature space and leave a large amount of redundant information in it. To alleviate this problem, using the idea of approximate reduction based on fuzzy rough sets, we have developed two multi-label learning approaches with dimension reduction of the label-specific feature space, i.e., FRS-LIFT and FRS-SS-LIFT. In particular, building on FRS-LIFT, FRS-SS-LIFT effectively reduces the time consumption of dimension reduction through sample selection. Experimental results validate the feasibility and efficiency of the proposed approaches for improving the predictive performance of multi-label learning systems.

(3) Collaborative classification approaches in parameterized feature spaces. In neighborhood rough sets, as the size of the information granules increases, the neighborhood classifier based on the majority voting rule tends to misjudge the classes of unknown samples. To remedy this deficiency, using the idea of collaborative representation, we have presented a neighborhood collaborative classifier, i.e., NCC. NCC determines the class of an unknown sample by collaborative representation in the neighborhood space, and it takes the class with the minimal reconstruction error for the unknown sample as the predicted category. Experimental results not only validate the effectiveness of the proposed approach for improving the classification performance of the neighborhood classifier with larger information granules, but also show that NCC is an effective means of reducing the time consumption of traditional CRC.

(4) Prediction approaches for protein structural classes via a multi-feature space fusion strategy. Our focus is on the prediction of protein secondary structure classes in bioinformatics. We have extracted the features of protein sequences from the two perspectives of PseAAC and PsePSSM, and then fused them serially. With the goal of minimizing the k-nearest neighbor error rate, we have presented an approach for predicting protein secondary structure classes that uses a heuristic search strategy to decrease the k-nearest neighbor error rate. Experimental results demonstrate the effectiveness of the proposed approach for improving the prediction accuracy of protein secondary structure classes.
Keywords/Search Tags: Multi-granularity, Information Granulation, Feature Integration, Rough Data Analysis, Multi-label Learning, Collaborative Representation, Protein Structure Prediction
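
The neighborhood collaborative classifier (NCC) described in contribution (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the Euclidean neighborhood of radius `delta`, the ridge-regularized least-squares coding with weight `lam`, the fallback to the single nearest sample when the neighborhood is empty, and the function name `ncc_predict` are all assumptions made for illustration. The sketch follows only what the abstract states: represent the unknown sample collaboratively over its neighborhood, then pick the class with the minimal reconstruction error.

```python
import numpy as np

def ncc_predict(X_train, y_train, x, delta=0.5, lam=0.01):
    """Hypothetical NCC sketch: collaborative representation inside a neighborhood."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Neighborhood granule: training samples within distance delta of x (assumed Euclidean).
    dists = np.linalg.norm(X_train - x, axis=1)
    mask = dists <= delta
    if not mask.any():
        # Assumed fallback: use the single nearest sample if the granule is empty.
        mask[np.argmin(dists)] = True

    A = X_train[mask]        # neighborhood samples, shape (m, d)
    labels = y_train[mask]

    # Collaborative coding: minimize ||x - A^T w||^2 + lam * ||w||^2 over w.
    G = A @ A.T + lam * np.eye(A.shape[0])
    w = np.linalg.solve(G, A @ x)

    # Predict the class whose samples reconstruct x with the smallest error.
    best_class, best_err = None, np.inf
    for c in np.unique(labels):
        sel = labels == c
        err = np.linalg.norm(x - A[sel].T @ w[sel])
        if err < best_err:
            best_class, best_err = c, err
    return best_class

X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
print(ncc_predict(X, y, [0.8, 0.2], delta=2.0))
```

Because the coding step only involves the samples inside the neighborhood rather than the whole training set, the linear system is smaller than in a classifier that codes over all training samples, which is consistent with the abstract's remark about reduced time consumption relative to traditional CRC.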