
Research On Novel Activation Functions In Convolutional Neural Networks

Posted on: 2022-05-12    Degree: Master    Type: Thesis
Country: China    Candidate: M Zhu    Full Text: PDF
GTID: 2518306539492064    Subject: Computer Science and Technology
Abstract/Summary:
Faster and better architectures of Convolutional Neural Networks (CNNs) have always been a research hotspot. However the architectures change, the activation function remains indispensable. The Rectified Linear Unit (ReLU) is widely used in most CNNs, and in recent years a series of monotonic activation functions have been proposed to replace it. However, the performance improvements of these monotonic activation functions tend to be inconsistent across different datasets and CNNs. Softmax is also widely used in CNNs, but because it contains an exponential operation, it can easily suffer computational overflow when large positive numbers are input.

First, this thesis proposes two novel non-monotonic activation functions, the Power Function Linear Unit (PFLU) and the faster PFLU (FPFLU), to address the inconsistent improvements of most monotonic activation functions across datasets and CNNs. The negative part of PFLU is non-monotonic and moves closer to zero as the negative input decreases, which maintains the sparsity of the negative part while introducing negative activation values and non-zero derivative values. The positive part of PFLU does not use the identity mapping but approaches it as the positive input increases, which brings stronger non-linearity to the positive part. Different from PFLU, FPFLU uses the identity mapping in its positive part; similar to PFLU, FPFLU is non-monotonic in its negative part.

Next, this thesis proposes a computationally safer HardSoftmax to solve the overflow problem of Softmax, and further proposes parallel selective kernel (PSK) attention based on HardSoftmax. Different from selective kernel (SK) attention, which places the extraction and transformation of global features after feature fusion, PSK attention places them in a separate branch, in parallel with the multiple branches that use different kernel sizes. Meanwhile, the transformation of global features uses group convolution to reduce the number of parameters and multiply-adds (MAdds). Finally, the branches with different kernel sizes are fused using HardSoftmax attention guided by the information in these branches.

A wide range of image classification experiments shows that PFLU tends to work better than current state-of-the-art non-monotonic activation functions, and that FPFLU runs faster than most non-monotonic activation functions. Experiments also show that simply replacing Softmax with HardSoftmax maintains or improves the performance of the original attention, and HardSoftmax runs faster than Softmax in the experiments of this thesis. PSK attention matches or outperforms SK attention with fewer parameters and MAdds. Some of the results have been published in SCI-indexed and Chinese core journals.
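The abstract does not give the closed-form definitions of PFLU and FPFLU. The PyTorch sketch below only illustrates activations with the described shape (a non-monotonic negative part that decays toward zero, and a positive part that approaches the identity mapping); the particular formula x*(1 + x/sqrt(1 + x^2))/2 and the names pflu_sketch and fpflu_sketch are assumptions, not the thesis definitions.

    import torch

    def pflu_sketch(x: torch.Tensor) -> torch.Tensor:
        # Assumed smooth form: the negative part dips below zero and then
        # decays toward zero as x -> -inf; the positive part approaches the
        # identity mapping as x -> +inf, matching the shape described above.
        return x * (1.0 + x / torch.sqrt(1.0 + x * x)) / 2.0

    def fpflu_sketch(x: torch.Tensor) -> torch.Tensor:
        # Assumed faster variant: exact identity mapping for x >= 0 and the
        # same non-monotonic curve for x < 0.
        return torch.where(x >= 0, x, pflu_sketch(x))

    print(pflu_sketch(torch.linspace(-4.0, 4.0, 9)))

Evaluating the sketch on a small grid, as in the last line, makes the described behavior visible: the values for large negative inputs stay close to zero, while the values for large positive inputs stay close to the input itself.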
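Likewise, the HardSoftmax formula is not spelled out in this abstract. The sketch below only demonstrates why a naive Softmax overflows for large positive inputs and shows one assumed exponential-free normalization (ReLU followed by sum normalization) as a stand-in; hard_softmax_sketch is a hypothetical name and may differ from the thesis definition.

    import torch

    def naive_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # Direct exponentiation overflows to inf/nan for large positive inputs.
        e = torch.exp(x)
        return e / e.sum(dim=dim, keepdim=True)

    def hard_softmax_sketch(x: torch.Tensor, dim: int = -1,
                            eps: float = 1e-6) -> torch.Tensor:
        # Assumed exponential-free normalization: clamp negatives to zero and
        # divide by the sum, so large positive inputs cannot overflow.
        p = torch.relu(x)
        return p / (p.sum(dim=dim, keepdim=True) + eps)

    x = torch.tensor([1000.0, 0.0, -1000.0])
    print(naive_softmax(x))        # contains nan: exp(1000) overflows to inf
    print(hard_softmax_sketch(x))  # roughly [1., 0., 0.], all values finite

The point of the example is only the numerical behavior: any normalization that avoids the exponential sidesteps the overflow the abstract describes.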
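The PSK module itself is only described at a high level: a parallel branch extracts and transforms global features with a group convolution and produces per-branch weights that fuse convolution branches with different kernel sizes. A minimal sketch consistent with that description might look as follows; the two kernel sizes (3 and 5), the group count, and the reuse of the ReLU-based normalization from the previous sketch are all assumptions, not the thesis design.

    import torch
    import torch.nn as nn

    class PSKSketch(nn.Module):
        # Minimal sketch of a parallel selective-kernel style block (assumed
        # design): two conv branches with different kernel sizes plus one
        # parallel branch that turns global features into fusion weights.
        def __init__(self, channels: int, groups: int = 8):
            super().__init__()
            self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.branch5 = nn.Conv2d(channels, channels, 5, padding=2, bias=False)
            self.pool = nn.AdaptiveAvgPool2d(1)
            # Group convolution keeps the parameters and MAdds of the
            # global-feature transformation small.
            self.fc = nn.Conv2d(channels, 2 * channels, 1, groups=groups, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b3, b5 = self.branch3(x), self.branch5(x)
            # Global features are taken from the input in a parallel branch,
            # not from the fused output of the kernel branches.
            g = self.fc(self.pool(x))                    # (N, 2C, 1, 1)
            w = g.view(x.size(0), 2, x.size(1), 1, 1)
            w = torch.relu(w)
            w = w / (w.sum(dim=1, keepdim=True) + 1e-6)  # HardSoftmax-style weights
            return w[:, 0] * b3 + w[:, 1] * b5

    y = PSKSketch(64)(torch.randn(2, 64, 32, 32))        # (2, 64, 32, 32)

In this sketch the channel count must be divisible by the group count for the grouped 1x1 convolution; the weights are normalized across the two branches so each output position is a convex-like combination of the 3x3 and 5x5 responses.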
Keywords/Search Tags:CNNs, activation function, PFLU, FPFLU, HardSoftmax, PSK attention