Research And Application Of Some Machine Learning Algorithms

Posted on: 2013-01-29; Degree: Doctor; Type: Dissertation
Country: China; Candidate: L Sun; Full Text: PDF
GTID: 1118330371482885; Subject: Computer application technology
Abstract/Summary:
Machine learning is an important branch of artificial intelligence. It is an interdisciplinary field that draws on mathematics, statistics, information theory, philosophy, and other areas. Its main objective is to have computers imitate human learning activities so that computer systems acquire the ability to learn. One important direction of machine learning research is application-oriented research. First, appropriate representation methods should be designed for different application problems. Second, to address those problems, the machine learning algorithms should be analyzed both theoretically and empirically. Finally, different machine learning algorithms should be designed for different application problems. This thesis is an exploratory study of machine learning algorithms for different application problems, conducted as follows.
(1) Theoretical and empirical study of particle swarm optimization algorithms. Optimization is the problem of selecting the optimum solutions from a finite or infinite set of candidate solutions under predefined constraints. Many optimization algorithms, such as the genetic algorithm (GA), artificial neural networks (ANN), artificial immune systems (AIS), and ant colony optimization (ACO), have already been applied successfully to many optimization problems. Among these algorithms, particle swarm optimization (PSO) is a swarm-based algorithm. It is simple in concept, easy to implement, and fast in searching. PSO has attracted considerable research attention, and many variants have been proposed. Although PSO has become a popular research topic and much progress has been made since its introduction, there is still considerable room to improve its performance through deeper empirical and theoretical studies.
As for PSO design, some issues remain unanswered that, if suitably addressed, can lead to more robust and efficient PSOs.
The first part of this thesis covers the following aspects. On the theoretical side, we first proposed and proved that PSO can be modeled as an absorbing Markov sequence. Second, we studied the transfer probability of a single particle in order to obtain a better understanding of PSO. From this study, we found that the probability of a single particle "belonging" to the optimal region is associated with two probabilities. The first is the probability that the particle falls into the optimal region given that its support set covers the optimal region; the second is the probability that the optimal region "belongs" to the support set of the particle. Generally, the larger the support set of a particle, the higher the probability that the optimal region "belongs" to it. Based on the general convergence proofs for random search algorithms by Solis and Wets, we provide two sufficient conditions for PSO to converge to the global optimal region with probability one. The first sufficient condition is that the support set of the whole swarm covers the entire solution space during the PSO execution process. The second is that the support set of the whole swarm covers the area that is "closer" to the optimal region during the PSO execution process.
On the empirical and application side, we proposed a stochastic PSO model. Depending on the properties of the stochastic region, we derived two stochastic PSO algorithms, StPSO-C and StPSO-G. In StPSO-C, the stochastic region is described by a Cauchy distribution, and the algorithm satisfies the first sufficient condition. In StPSO-G, the stochastic region is described by a Gaussian distribution, and the algorithm satisfies the second sufficient condition. Thus both algorithms can converge to the optimal solution.
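The abstract does not give the update rules of StPSO-C or StPSO-G, so the following is only a minimal sketch of the general idea of sampling a particle's next position from a Cauchy or Gaussian "stochastic region". Everything specific here is an assumption made for illustration: the function names, the choice to centre the region on the midpoint of the personal and global best, the `scale` parameter, the clamping to a box, and the sphere benchmark. It should not be read as the thesis's algorithm.

```python
import math
import random


def sphere(x):
    """Illustrative benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def sample_step(rng, kind, scale):
    """Draw one perturbation from the chosen distribution (assumed form)."""
    if kind == "cauchy":
        # Inverse-CDF sampling of a Cauchy variate: heavy tails, long jumps.
        return scale * math.tan(math.pi * (rng.random() - 0.5))
    if kind == "gaussian":
        # Light tails: samples stay close to the centre of the region.
        return rng.gauss(0.0, scale)
    raise ValueError(f"unknown kind: {kind}")


def stochastic_pso(objective, dim, kind, n_particles=20, iters=200,
                   bound=5.0, scale=0.5, seed=0):
    """Hypothetical stochastic PSO: resample each particle around its bests."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(-bound, bound) for _ in range(dim)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_val = [objective(p) for p in swarm]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best objective value after each iteration

    for _ in range(iters):
        for i in range(n_particles):
            candidate = []
            for d in range(dim):
                # Assumed region centre: midpoint of personal and global best.
                center = 0.5 * (pbest[i][d] + gbest[d])
                value = center + sample_step(rng, kind, scale)
                candidate.append(max(-bound, min(bound, value)))
            val = objective(candidate)
            # Greedy acceptance: keep the candidate only if it improves.
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = candidate, val
                if val < gbest_val:
                    gbest, gbest_val = candidate[:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    for kind in ("cauchy", "gaussian"):
        _, best, _ = stochastic_pso(sphere, dim=5, kind=kind)
        print(kind, best)
```

Because candidates are accepted only when they improve, the best value recorded in `history` never increases. The Cauchy variant occasionally proposes very long jumps, while the Gaussian variant concentrates its samples near the current bests, which is the intuition behind the trade-off between global search and convergence speed.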
To solve high-dimensional optimization problems, a cooperative particle swarm optimization algorithm was also proposed. The performance of the proposed algorithms is tested on fifteen moderate-dimension benchmark functions and two large-scale benchmark functions. The simulation results indicate that both algorithms can converge to the optimal solution with probability one. The comparison shows that StPSO-C has better global search ability, while StPSO-G converges faster. This suggests applying StPSO-C when the quality of the final result matters most, StPSO-G when convergence speed matters most, and cooperative PSO when solving large-scale optimization problems.
(2) Classification of power quality disturbances using GA and SVM. Power quality is a set of boundaries that allows electrical systems to function in their intended manner without loss of performance. It is very important for power utilities and their customers. In a real-world power system, when a power quality disturbance occurs, the disturbance waveforms should be detected and classified automatically within a short time so that appropriate mitigating action can be taken. Developing efficient methods for automated detection and classification of power quality disturbances has therefore attracted attention from the power engineering community. One issue in constructing a classifier is the selection of features. Irrelevant or noisy features can increase the complexity of the classification problem and should therefore be discarded by the feature selection process.
The second part of this thesis covers the following aspects. To extract features from the monitored power signals, the wavelet transform is adopted. To eliminate irrelevant or noisy features from the feature vectors, we propose using a genetic algorithm (GA).
Specifically, we design a chromosome representation that encodes the feature selection scheme and a Bhattacharyya-distance-based objective function that evaluates the performance of that scheme. The GA then applies its evolutionary search mechanism to find the selection scheme that retains the most discriminative features. To classify the refined features, we adopt a support vector machine (SVM). The performance of the proposed algorithm is evaluated on five power quality disturbance signals that occur frequently in power systems. The simulation results indicate that, with the GA-based feature selection approach, the number of features provided to the SVM was reduced from 392 to 27, while the classification accuracy did not deteriorate much.
(3) Support vector description of clusters for content-based image annotation. Prevalent image search engines such as Google and Yahoo! rely mainly on the textual descriptions of images contained in filenames or keywords. They do not consider the content of images and cannot search images without annotations. This creates the need for content-based image retrieval (CBIR). Most CBIR systems still suffer from the "semantic gap problem". One natural way to mitigate this problem is to assign tags to images. Methods that use low-level features such as color, texture, and shape to perform annotation are called content-based image annotation (CBIA). To bridge the gap between low-level visual content and high-level semantics, substantial machine learning techniques are required. Among these techniques, support vector clustering (SVC) is a recently developed algorithm inspired by the support vector machine. SVC has advantages over other algorithms in its ability to delineate cluster boundaries of irregular shapes and to deal with outliers by employing a soft margin constant.
In real-world image systems, images are organized irregularly. Because SVC can delineate cluster boundaries of irregular shapes, it provides an opportunity to develop unified models that describe irregularly organized images.
The third part of this thesis covers the following aspects. A support vector based algorithm is proposed for automatic image annotation. Images are represented by color and texture features. The algorithm has two major components, the training process and the annotating process. In the training process, clusters of images are used as training instances. Each cluster is manually annotated with a set of semantic words and assigned to a concept. For each cluster, two one-cluster SVC models are trained, one on the color signatures and one on the texture signatures. In the annotating process, the probability that a test image was generated by each one-cluster SVC model is computed. To annotate the image, relevant words are selected based on the fusion of these probabilities. The performance of the proposed algorithm is tested on the Corel60K benchmark. The simulation results indicate that the proposed algorithm achieves higher accuracy at different recall levels than the other algorithms compared.
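The annotating step can be illustrated with a small sketch. The abstract does not state how the color and texture probabilities are fused or how words are ranked, so the product-rule fusion, the summation of cluster scores per word, the alphabetical tie-break, and all names and numbers below are assumptions chosen only to make the idea concrete. The per-cluster probabilities are taken as given; no SVC model is trained here.

```python
from collections import defaultdict


def fuse(p_color, p_texture):
    """Product-rule fusion of the two per-cluster probabilities (assumed)."""
    return p_color * p_texture


def annotate(cluster_words, color_probs, texture_probs, top_k=3):
    """Rank annotation words for one test image.

    cluster_words: cluster name -> words manually assigned to that cluster
    color_probs / texture_probs: cluster name -> probability that the test
        image was generated by that cluster's color / texture model
    """
    word_scores = defaultdict(float)
    for cluster, words in cluster_words.items():
        score = fuse(color_probs[cluster], texture_probs[cluster])
        for word in words:
            # A word shared by several clusters accumulates their scores.
            word_scores[word] += score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical clusters and probabilities, for illustration only.
    cluster_words = {
        "beach": ["sea", "sand", "sky"],
        "forest": ["tree", "sky"],
        "city": ["building", "street"],
    }
    color_probs = {"beach": 0.6, "forest": 0.3, "city": 0.1}
    texture_probs = {"beach": 0.5, "forest": 0.4, "city": 0.1}
    print(annotate(cluster_words, color_probs, texture_probs))
    # prints ['sky', 'sand', 'sea']
```

In this toy input, "sky" ranks first because it appears in both the "beach" and "forest" clusters and so collects both fused scores. Other fusion rules, such as a weighted sum, would fit the same interface.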
Keywords/Search Tags: Particle swarm optimization, power quality disturbance, genetic algorithm, support vector machine, content based image annotation, support vector clustering