As an important and representative area of artificial intelligence, machine learning focuses on building algorithmic models that can identify patterns and relationships in data. It can be divided into supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning. In recent years, reinforcement learning has become one of the most active fields and is a key technology for artificial intelligence to reach and surpass human-level performance. Reinforcement learning has been applied to many problems, such as autonomous driving, computer games, and natural language processing. Improvements to reinforcement learning algorithms therefore not only directly affect people's lives but also have a far-reaching impact on the future development of artificial intelligence. Meanwhile, because unsupervised learning avoids the high cost of labelling, it has become an important tool for analyzing big data, especially in online shopping and community discovery. Both reinforcement learning and unsupervised learning work without labels. In unsupervised learning, we usually need to adjust the sample structure, and this adjustment is usually an iterative process rather than a one-step process; in other words, it is a sequential decision-making process. Reinforcement learning is an effective tool for sequential decision-making. It is therefore reasonable both to improve existing reinforcement learning methods and to use reinforcement learning to improve unsupervised learning. Based on the above analysis, the main contents and contributions of this dissertation are as follows.

1. For continuous control, most existing reinforcement learning methods consider only the one-step transition in the environment. However, in continuous control, the recognizable information is usually hidden in the sequence of transitions, so these methods are not effective enough. To address this issue, the convolutional deterministic policy is proposed to take the states in the sequence of transitions into account. Inspired by the convolutional neural networks used in natural language processing, the convolutional deterministic policy uses convolutional neural networks to extract the recognizable information from the states in the sequence of transitions. The policy is then updated not only with the recognizable information from the state in the one-step transition but also with the recognizable information in its previous states. As a result, the convolutional deterministic policy enables the agent to take better actions (an illustrative sketch follows this list).

2. The multi-step actor-critic framework is proposed by combining the convolutional deterministic policy with n-step temporal difference learning. In this framework, using the rewards over every n steps, n-step temporal difference learning can accurately estimate the cumulative reward; using the recognizable information from the states in the sequence of transitions, the convolutional deterministic policy can effectively select the action that maximizes the estimated cumulative reward (a sketch of the n-step target follows this list). Continuous control can thus be performed well by the multi-step actor-critic framework. The theoretical analysis and the experiments show that, with the help of the convolutional deterministic policy or the multi-step actor-critic framework, existing reinforcement learning methods can perform continuous control better.

3. To cluster non-spherical data with high noise levels well, a new similarity measure, called the kernelized rank-order distance, is proposed by combining the rank-order distance with a Gaussian kernel (a sketch follows this list). By reducing the noise in the neighboring information of samples, the kernelized rank-order distance improves the existing rank-order distance to tolerate high noise, so the structures of non-spherical data with high noise levels can be well captured. The kernelized rank-order distance then strengthens these captured structures with the Gaussian kernel, so that samples in the same cluster are closer to each other and can more easily be clustered correctly. The experiments show that the kernelized rank-order distance can effectively improve existing methods for discovering non-spherical clusters with high noise levels. Furthermore, this similarity measure also enables reinforcement learning methods to achieve better performance in unsupervised learning.

4. A new denoising method, denoising by a new noise index and reinforcement learning, is proposed by combining reinforcement learning with unsupervised denoising. First, it detects each noisy sample by its density and by the distance between the sample and the center of its neighbors: a noisy sample usually has a low density, and most of its neighbors lie on the same side of it, so the center of its neighbors is far from it, especially for high-noise data (a sketch of such a noise score follows this list). Second, for each detected noisy sample, the method models its movement as a Markov decision process and stores the experience of this movement. Finally, the method learns a policy that iteratively moves each detected noisy sample toward its nearby high-density region by learning from this experience with reinforcement learning. The learned experience can effectively help the movement adapt to the high noise in real-world cases. The experiments show that this denoising method denoises high-noise data better than existing denoising methods.
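To make the idea of the convolutional deterministic policy concrete, the following is a minimal PyTorch-style sketch rather than the dissertation's actual architecture: it assumes the agent keeps a short window of recent states and applies a 1D convolution over that window before producing a deterministic action. The class name, history length, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvDeterministicPolicy(nn.Module):
    """Illustrative policy that convolves over a window of recent states.

    Hypothetical layer sizes; the dissertation's exact architecture may differ.
    """
    def __init__(self, state_dim, action_dim, history_len=4, channels=32):
        super().__init__()
        # 1D convolution over the time axis of the stacked state history,
        # so the action depends on the sequence of transitions, not one state.
        self.conv = nn.Conv1d(state_dim, channels, kernel_size=history_len)
        self.head = nn.Linear(channels, action_dim)

    def forward(self, state_history):
        # state_history: (batch, history_len, state_dim)
        x = state_history.transpose(1, 2)             # (batch, state_dim, history_len)
        feat = torch.relu(self.conv(x)).squeeze(-1)   # (batch, channels)
        return torch.tanh(self.head(feat))            # deterministic action in [-1, 1]
```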
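The critic in the multi-step actor-critic framework can then be trained toward an n-step temporal-difference target. The sketch below is a hedged illustration under the same assumptions, not necessarily the dissertation's exact bootstrapping: it sums the discounted rewards of the next n steps and bootstraps with the critic's value of the state reached after n steps, evaluated at the action chosen by the policy.

```python
import torch

def n_step_td_target(rewards, next_state_history, policy, critic, gamma=0.99):
    """Illustrative n-step TD target:
    sum_{i=0}^{n-1} gamma^i * r_{t+i}  +  gamma^n * Q(s_{t+n}, mu(s_{t+n})).

    rewards            : the n rewards collected from step t to step t+n-1
    next_state_history : the state window at step t+n (input of the policy above)
    """
    n = len(rewards)
    target = sum((gamma ** i) * r for i, r in enumerate(rewards))
    with torch.no_grad():
        action = policy(next_state_history)            # deterministic action mu(s_{t+n})
        target = target + (gamma ** n) * critic(next_state_history, action)
    return target
```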
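For the kernelized rank-order distance, the sketch below shows one plausible composition of the two ingredients named above: the classical rank-order distance computed from neighbor rankings, passed through a Gaussian kernel. The dissertation's exact formula, in particular how it reduces noise in the neighboring information, may differ; the helper names and the bandwidth sigma are illustrative assumptions.

```python
import numpy as np

def rank_order_distance(a, b, neighbors, rank):
    """Classical (symmetric) rank-order distance between samples a and b.

    neighbors[x] : indices of x's neighbors sorted by ascending distance
    rank[x][y]   : position of sample y in x's sorted neighbor list (>= 1 for y != x)
    """
    def asym(x, y):
        # Sum, over x's neighbors up to y's position, of their ranks in y's list.
        return sum(rank[y][f] for f in neighbors[x][: rank[x][y] + 1])
    return (asym(a, b) + asym(b, a)) / min(rank[a][b], rank[b][a])

def kernelized_rank_order_similarity(a, b, neighbors, rank, sigma=1.0):
    # Illustrative kernelization: pass the rank-order distance through a
    # Gaussian kernel so that samples sharing clean neighborhoods become
    # much closer to each other than samples linked only through noise.
    d = rank_order_distance(a, b, neighbors, rank)
    return np.exp(-d ** 2 / (2 * sigma ** 2))
```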
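Finally, the noise index used for detection in the denoising method can be illustrated as follows. This is a hypothetical formulation based only on the description above (low density combined with a large offset between a sample and the centroid of its k nearest neighbors); the concrete score, the value of k, and the scikit-learn-based helper are assumptions, not the dissertation's definition.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def noise_index(X, k=10):
    """Illustrative per-sample noise score: large when the sample has a low
    local density and lies far from the centroid of its k nearest neighbors."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nbrs.kneighbors(X)                   # column 0 is the sample itself
    mean_dist = dist[:, 1:].mean(axis=1)             # low density <=> large mean distance
    centroid = X[idx[:, 1:]].mean(axis=1)            # centroid of the k neighbors
    offset = np.linalg.norm(X - centroid, axis=1)    # how one-sided the neighborhood is
    return offset * mean_dist                        # one plausible combination of the two cues
```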
Overall, all of the above works contribute to reinforcement learning and to its application in unsupervised learning. The convolutional deterministic policy and the multi-step actor-critic framework effectively improve existing reinforcement learning methods for continuous control. The kernelized rank-order distance improves existing similarity measures for clustering. Denoising by a new noise index and reinforcement learning combines reinforcement learning with unsupervised denoising.