Deep neural networks have achieved great success in many artificial intelligence tasks, including image recognition, object detection, semantic segmentation, speech recognition, and natural language processing. The key factors behind this success lie in the model design and training algorithms of deep neural networks, which have received great attention from both academia and industry. However, existing deep networks still suffer from the following limitations. First, deep neural networks often contain a large number of computational operations and complex connections, among which there may exist many redundant modules or operations that not only increase the computational cost but also hamper the performance. Second, the architectures of deep networks are often very complex, and the resulting search space becomes extremely large; as a result, it is difficult to explore the entire space to find effective architectures. Third, deep neural networks often come with extremely high computational cost, making it hard to meet the requirements of real-world application scenarios given limited computational resources. Finally, the training of deep models highly depends on the quality of the training data: when data are divided into very small batches or are poorly sampled, the training procedure of deep networks can become unstable.

To address the above challenges, we make the following contributions in this paper:

1) To reduce the computational redundancy of deep networks, a neural architecture optimization method is proposed to detect and optimize the redundant modules/operations of any given architecture. Specifically, a neural architecture transformer model is presented that takes any architecture as input and replaces the redundant operations with more efficient counterparts, such as skip connections or null connections. In practice, the neural architecture transformer model is implemented with a graph convolutional network to capture the complex connection relationships inside architectures. The
proposed method is able to greatly improve the performance of both hand-crafted and automatically searched architectures.

2) An efficient neural architecture search algorithm is proposed to explore the large search space. Specifically, the proposed method exploits a curriculum learning scheme to search for promising architectures in a progressive manner. It is worth noting that increasing the number of nodes would make the search space grow much faster than increasing the number of operations, leading to a non-negligible gap between adjacent stages. Therefore, the proposed method gradually enlarges the search space by increasing the number of operations. Based on this, a curriculum search strategy is further proposed to construct a series of progressively growing search spaces and to perform architecture search on them. Extensive experiments demonstrate the superiority of the proposed method over existing methods.

3) To design effective architectures that satisfy various budgets of computational resources, a neural architecture generator is proposed that takes an arbitrary budget as input and automatically produces the Pareto optimal architecture for the target budget. To this end, the proposed method seeks to learn a Pareto frontier (i.e., the set of Pareto optimal architectures) over model performance and computational cost. In this sense, during inference it only needs to search on the learned frontier to find promising architectures. To learn this Pareto frontier, a neural architecture evaluator is further proposed that learns a Pareto dominance rule to determine whether one architecture is better than another. Extensive experiments show that the architectures produced by the proposed method consistently outperform those searched by existing methods under different budgets.

4) Regarding the training instability incurred by the sensitivity of deep networks to training data, a stable data normalization method and an effective training algorithm are
proposed. Specifically, a memorized batch normalization method is presented that takes multiple recent data batches as memory and computes statistics based on them. Relying on the proposed normalization method, a double forward propagation training algorithm is further proposed, in which an additional forward propagation is performed in each iteration to keep the statistics of each normalization layer up to date. Extensive experiments show that our method greatly improves the generalization performance of deep networks.
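The Pareto dominance rule mentioned in contribution 3 is learned by the neural architecture evaluator; as a point of reference, the non-learned rule itself can be sketched as follows, where each architecture is summarized by a hypothetical (accuracy, cost) pair chosen purely for illustration:

```python
def dominates(a, b):
    """Return True if architecture a Pareto-dominates b.

    Each architecture is summarized as an (accuracy, cost) tuple
    (a hypothetical representation, not the paper's learned evaluator):
    a dominates b if it is no worse in both objectives and strictly
    better in at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_frontier(archs):
    """Return the non-dominated subset of (accuracy, cost) pairs,
    i.e., the Pareto frontier over performance and computational cost."""
    return [a for a in archs if not any(dominates(b, a) for b in archs)]
```

Searching only on such a frontier is what lets the generator answer an arbitrary budget query at inference time: for a given cost budget, the best architecture is simply the frontier point with the highest accuracy whose cost fits the budget.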
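The idea behind memorized batch normalization in contribution 4 can be illustrated with a minimal sketch: statistics are computed over a memory of the K most recent mini-batches rather than the current batch alone, which stabilizes normalization when batches are very small. The class and parameter names below are assumptions for illustration, and the double forward propagation step (an extra pass that refreshes each layer's statistics) is omitted for brevity:

```python
from collections import deque
import numpy as np

class MemorizedBatchNorm:
    """Sketch of batch normalization with a memory of recent batches
    (hypothetical implementation, not the paper's code)."""

    def __init__(self, num_features, memory_size=4, eps=1e-5):
        # Memory of the most recent mini-batches; old batches are
        # evicted automatically once memory_size is exceeded.
        self.memory = deque(maxlen=memory_size)
        self.eps = eps
        # Learnable scale/shift; kept fixed here for simplicity.
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)

    def forward(self, x):
        # x: array of shape (batch_size, num_features)
        self.memory.append(x)
        # Pool the current batch with the memorized batches, so the
        # statistics are estimated from more samples than one batch.
        pooled = np.concatenate(list(self.memory), axis=0)
        mean = pooled.mean(axis=0)
        var = pooled.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

With a memory of size one this reduces to standard batch normalization; with a larger memory, the variance estimate no longer collapses when the current batch contains only a handful of samples.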