
Applications Research Of Supervised Intelligent Clustering And Classification Technologies

Posted on: 2018-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W L Hang
Full Text: PDF
GTID: 1318330518975310
Subject: Light Industry Information Technology
Abstract/Summary:
In recent decades, machine learning has achieved remarkable success in many knowledge engineering tasks, such as clustering, classification, and regression. Classification and clustering are two important research topics in machine learning and are widely used in text classification, semantic analysis, image recognition, and other real-world applications. However, with the prevalence of multimedia technologies, more and more new application scenarios are emerging. Compared with traditional scenarios, these newly emerged scenarios often share a problem: collected samples, or labeled samples, are usually very scarce, because the production process is highly confidential or because high costs keep yields low. As a result, little data is available to traditional learning methods, and classical classification and clustering techniques tend to generalize poorly when modeling such insufficient data. This dissertation therefore focuses on insufficient data and missing labels in newly emerged scenarios, and improves classical classification and clustering techniques into intelligent methods that address these problems. The main goal is to improve the generalization capacity of classical machine learning methods, from both the classification side and the clustering side, so that they fit the new applications. The details are as follows.

(1) Sections 2 and 3 form the first part, which studies intelligent supervised clustering technology and its applications. First, most current clustering methods must preset the number of clusters or other user-specific parameters, and they run inefficiently on large datasets. To address this, we study clustering through the metaphor of gravitational kinematics, based on Central Force Optimization (CFO). Unlike the global synchronization of CFO, the proposed algorithm, G-Sync, simulates the partial synchronization phenomenon. By introducing the Davies-Bouldin Index (DBI), G-Sync can determine clusters of arbitrary size, shape, and density without presetting the number of clusters. The algorithm is further extended to large datasets as the scalable S-G-Sync algorithm, which is based on fast kernel density estimation (FastKDE). Second, in Section 3, we develop new clustering algorithms that leverage useful information from a source domain to guide clustering when traditional methods cannot deal effectively with insufficient data in the target domain. Assuming the source and target domains have similar distributions, we propose a clustering algorithm called transfer affinity propagation (TAP) for insufficient-data scenarios; it obtains an appropriate number of clusters and a high-quality clustering result. The basic idea of TAP is to modify the update rules of the two message propagations used in affinity propagation (AP) by leveraging statistical properties and geometric structure together. With the corresponding factor graph, TAP can be applied to AP-like transfer learning: it abstracts knowledge from the source domains through these two mechanisms to enhance learning in the target domain even when the data in the current scene are inadequate.

(2) Sections 4 and 5 form the second part, which mainly studies intelligent supervised classification technology and its applications. First, Section 4 addresses semi-supervised learning. Semi-supervised learning methods conventionally use abundant unlabeled samples together with a few labeled samples. However, the unlabeled samples are usually used under assumptions, such as the cluster and manifold assumptions, and performance degrades when those assumptions do not hold. To tackle this issue, we propose using the reliable hidden features embedded in both the labeled and the unlabeled samples. By introducing an orthonormal projection matrix, we first transform both the unlabeled and labeled samples into a shared hidden subspace to determine the connections between the samples, and then use the hidden features, the raw features, and zero vectors to develop a novel feature augmentation strategy. Because the proposed method takes the correlation between labeled and unlabeled samples into account, the generalization performance of the classifier can be improved significantly. Second, in Section 5, to improve the generalization capability of a classification model when few samples are available, we propose a novel selective transfer classification learning method (CSTL) based on classification-error-based consensus regularization (CCR). Traditional transfer learning methods use abundant labeled data in the source domain to build an accurate classifier for a target domain with scarce labeled data. However, most current transfer learning methods assume that all the source data are relevant to the target domain, which may induce a negative learning effect when the assumption fails, as it does in many practical scenarios. To tackle this issue, the key is to identify the correlated source data and the corresponding weights. By keeping the distributions of the classification errors of the source and target domains consistent, we first propose CCR, which can guarantee the performance improvement of the target classifier. Based on this approach, CSTL is then developed to autonomously and quickly choose the correlated source data and their weights, exploiting the transferred knowledge by minimizing the leave-one-out cross-validation error despite scarce target training data. The advantages of CSTL are demonstrated through a series of experiments.
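
The abstract states that G-Sync relies on the Davies-Bouldin Index (DBI) so that the number of clusters need not be preset. As a minimal sketch of that selection idea only, and not of G-Sync itself, the following computes the standard DBI (lower is better) for several candidate partitions and keeps the one with the smallest value. The toy data, the function name `davies_bouldin`, and the `candidates` dictionary are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Standard Davies-Bouldin Index of a partition (lower is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("DBI needs at least two clusters")
    # Centroid of each cluster.
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    # Pairwise centroid distances; the diagonal is set to infinity so that
    # a cluster is never compared with itself.
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    # Worst-case similarity per cluster, averaged over clusters.
    return ratios.max(axis=1).mean()


# Hypothetical toy data: two well-separated square groups of four points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]])

# Hypothetical candidate partitions with different numbers of clusters.
candidates = {
    "two clusters": [0, 0, 0, 0, 1, 1, 1, 1],
    "four clusters": [0, 0, 1, 1, 2, 2, 3, 3],
    "two mixed clusters": [0, 1, 0, 1, 0, 1, 0, 1],
}

scores = {name: davies_bouldin(X, lab) for name, lab in candidates.items()}
best = min(scores, key=scores.get)
```

Here `best` names the partition with the lowest DBI, so the number of clusters falls out of the comparison and is not fixed in advance. The sketch assumes distinct cluster centroids; coincident centroids would make the ratio undefined.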
Keywords/Search Tags: Insufficient data, Central force optimization, Affinity propagation, Factor graph, Transfer learning, Semi-supervised learning, Negative transfer, Classification-error-based consensus regularization