| In recent years, semantic segmentation has been applied in many fields. With the rapid development of deep learning, CNN-based semantic segmentation achieves strong performance, but the huge number of network parameters and computations limits deployment on hardware platforms, so designing lightweight networks has become a hotspot in semantic segmentation. However, manual design requires substantial resources, so researchers hope to automate the network design process. To this end, several works based on Neural Architecture Search (NAS) have been proposed; they use Knowledge Distillation (KD), which employs a well-performing large network to guide the training of a compact network, to improve the performance of the lightweight network. Some works further integrate KD and NAS, using the knowledge provided by KD to guide the search strategy toward a better-performing target network. However, these methods take only the output distribution of the large network as the knowledge, ignoring the influence of the architecture's structure within KD, so the search strategy often converges to a sub-optimal solution. To address this, we propose a novel structural knowledge distillation algorithm that leverages both the output distribution and the structural knowledge between the teacher network and the student network, and uses a gradient-based search strategy to obtain the optimal network. Specifically, we first model the structural knowledge in KD as the structural similarity between two architectures and then propose a novel stage-wise, fully connected, chain-structured search space. We design a multi-objective optimization function for the search strategy. First, the traditional knowledge distillation loss constrains the KL divergence between the output distributions of the teacher network and the student network. Second, we propose a transformation module that converts the network architectures into graphs and uses the Graph Edit Distance (GED) to compute the structural similarity loss. Third, we add a network latency constraint, which enables our method to be applied successfully on three types of hardware platforms. We conduct experiments on two datasets and find that the proposed method outperforms both traditional manually designed methods and NAS methods equipped with the traditional KD loss, indicating that the network found by our method not only has characteristics that make it better suited to KD but also meets the needs of customization and deployment on different types of hardware platforms. |
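The three-term objective described above can be sketched in code. The following is a minimal, self-contained illustration, not the paper's implementation: all function names and weights (`alpha`, `beta`, `gamma`) are hypothetical, and a simple edit distance over chain-structured operation sequences stands in for the full Graph Edit Distance on general architecture graphs.

```python
import math

def softmax(logits, t=1.0):
    # Convert logits to a probability distribution (t is a temperature).
    exps = [math.exp(x / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q); assumes both are valid distributions with q > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chain_edit_distance(ops_a, ops_b):
    # Naive edit distance between two chain-structured architectures,
    # each encoded as a sequence of operation names. This is a stand-in
    # for GED, which additionally handles edge edits on general graphs.
    m, n = len(ops_a), len(ops_b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ops_a[i - 1] == ops_b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a node
                          d[i][j - 1] + 1,        # insert a node
                          d[i - 1][j - 1] + cost) # substitute an op
    return d[m][n]

def total_loss(teacher_logits, student_logits,
               teacher_ops, student_ops,
               student_latency_ms, latency_budget_ms,
               alpha=1.0, beta=0.1, gamma=0.01):
    # Hypothetical multi-objective loss: KD term + structural term +
    # latency penalty, combined with illustrative trade-off weights.
    kd = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    structural = chain_edit_distance(teacher_ops, student_ops)
    latency = max(0.0, student_latency_ms - latency_budget_ms)
    return alpha * kd + beta * structural + gamma * latency
```

In a real gradient-based search, the KD term would be computed per-pixel over segmentation maps and the structural and latency terms would need differentiable relaxations; this sketch only shows how the three objectives compose into one scalar.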