
Research On Novel Activation Functions In Convolutional Neural Networks

Posted on: 2022-05-12    Degree: Master    Type: Thesis
Country: China    Candidate: M Zhu    Full Text: PDF
GTID: 2518306539492064    Subject: Computer Science and Technology
Abstract/Summary:
Faster and better architectures of Convolutional Neural Networks (CNNs) have always been a research hotspot. However the architectures change, the activation function remains indispensable. The Rectified Linear Unit (ReLU) is widely used in most CNNs, and in recent years a series of monotonic activation functions have been proposed to replace it. However, the performance improvements of these monotonic activation functions tend to be inconsistent across different datasets and CNNs. Softmax is also widely used in CNNs, but because it contains an exponential operation, it can easily suffer computational overflow when large positive numbers are input.

First, this thesis proposes two novel non-monotonic activation functions, the Power Function Linear Unit (PFLU) and the faster PFLU (FPFLU), to address the inconsistent improvements of most monotonic activation functions across datasets and CNNs. The negative part of PFLU is non-monotonic and moves closer to zero as the negative input decreases, which maintains the sparsity of the negative part while introducing negative activation values and non-zero derivative values. The positive part of PFLU does not use the identity mapping but approaches it as the positive input increases, which brings stronger non-linearity to the positive part. Different from PFLU, FPFLU uses the identity mapping in its positive part; similar to PFLU, FPFLU is non-monotonic in its negative part.

Next, this thesis proposes a computationally safer HardSoftmax to solve the overflow problem of Softmax, and further proposes parallel selective kernel (PSK) attention based on HardSoftmax. Different from selective kernel (SK) attention, which places the extraction and transformation of global features after feature fusion, PSK attention places them in a separate branch, in parallel with the multiple branches that use different kernel sizes. Meanwhile, the transformation of global features uses group convolution to reduce the number of parameters and multiply-adds (MAdds). Finally, the branches with different kernel sizes are fused using HardSoftmax attention guided by the information in these branches.

A wide range of image classification experiments shows that PFLU tends to work better than current state-of-the-art non-monotonic activation functions, and that FPFLU runs faster than most non-monotonic activation functions. Experiments also show that simply replacing Softmax with HardSoftmax maintains or improves the performance of the original attention, and HardSoftmax runs faster than Softmax in the experiments of this thesis. PSK attention matches or outperforms SK attention with fewer parameters and MAdds. Some of the results have been published in SCI-indexed and Chinese core journals.
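The abstract does not give the closed-form definitions of PFLU and FPFLU. The PyTorch sketch below only illustrates activations with the described shape (a non-monotonic negative part that decays toward zero, and a positive part that approaches the identity mapping); the particular formula x*(1 + x/sqrt(1 + x^2))/2 and the names pflu_sketch and fpflu_sketch are assumptions, not the thesis definitions.

    import torch

    def pflu_sketch(x: torch.Tensor) -> torch.Tensor:
        # Assumed smooth form: the negative part dips below zero and then
        # decays toward zero as x -> -inf; the positive part approaches the
        # identity mapping as x -> +inf, matching the shape described above.
        return x * (1.0 + x / torch.sqrt(1.0 + x * x)) / 2.0

    def fpflu_sketch(x: torch.Tensor) -> torch.Tensor:
        # Assumed faster variant: exact identity mapping for x >= 0 and the
        # same non-monotonic curve for x < 0.
        return torch.where(x >= 0, x, pflu_sketch(x))

    print(pflu_sketch(torch.linspace(-4.0, 4.0, 9)))

Evaluating the sketch on a small grid, as in the last line, makes the described behavior visible: the values for large negative inputs stay close to zero, while the values for large positive inputs stay close to the input itself.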
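Likewise, the HardSoftmax formula is not spelled out in this abstract. The sketch below only demonstrates why a naive Softmax overflows for large positive inputs and shows one assumed exponential-free normalization (ReLU followed by sum normalization) as a stand-in; hard_softmax_sketch is a hypothetical name and may differ from the thesis definition.

    import torch

    def naive_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # Direct exponentiation overflows to inf/nan for large positive inputs.
        e = torch.exp(x)
        return e / e.sum(dim=dim, keepdim=True)

    def hard_softmax_sketch(x: torch.Tensor, dim: int = -1,
                            eps: float = 1e-6) -> torch.Tensor:
        # Assumed exponential-free normalization: clamp negatives to zero and
        # divide by the sum, so large positive inputs cannot overflow.
        p = torch.relu(x)
        return p / (p.sum(dim=dim, keepdim=True) + eps)

    x = torch.tensor([1000.0, 0.0, -1000.0])
    print(naive_softmax(x))        # contains nan: exp(1000) overflows to inf
    print(hard_softmax_sketch(x))  # roughly [1., 0., 0.], all values finite

The point of the example is only the numerical behavior: any normalization that avoids the exponential sidesteps the overflow the abstract describes.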
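The PSK module itself is only described at a high level: a parallel branch extracts and transforms global features with a group convolution and produces per-branch weights that fuse convolution branches with different kernel sizes. A minimal sketch consistent with that description might look as follows; the two kernel sizes (3 and 5), the group count, and the reuse of the ReLU-based normalization from the previous sketch are all assumptions, not the thesis design.

    import torch
    import torch.nn as nn

    class PSKSketch(nn.Module):
        # Minimal sketch of a parallel selective-kernel style block (assumed
        # design): two conv branches with different kernel sizes plus one
        # parallel branch that turns global features into fusion weights.
        def __init__(self, channels: int, groups: int = 8):
            super().__init__()
            self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.branch5 = nn.Conv2d(channels, channels, 5, padding=2, bias=False)
            self.pool = nn.AdaptiveAvgPool2d(1)
            # Group convolution keeps the parameters and MAdds of the
            # global-feature transformation small.
            self.fc = nn.Conv2d(channels, 2 * channels, 1, groups=groups, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b3, b5 = self.branch3(x), self.branch5(x)
            # Global features are taken from the input in a parallel branch,
            # not from the fused output of the kernel branches.
            g = self.fc(self.pool(x))                    # (N, 2C, 1, 1)
            w = g.view(x.size(0), 2, x.size(1), 1, 1)
            w = torch.relu(w)
            w = w / (w.sum(dim=1, keepdim=True) + 1e-6)  # HardSoftmax-style weights
            return w[:, 0] * b3 + w[:, 1] * b5

    y = PSKSketch(64)(torch.randn(2, 64, 32, 32))        # (2, 64, 32, 32)

In this sketch the channel count must be divisible by the group count for the grouped 1x1 convolution; the weights are normalized across the two branches so each output position is a convex-like combination of the 3x3 and 5x5 responses.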
Keywords/Search Tags:CNNs, activation function, PFLU, FPFLU, HardSoftmax, PSK attention