ActTensor: Activation Functions for TensorFlow

What is it?

ActTensor is a Python package that provides state-of-the-art activation functions and makes them easy and fast to use in deep learning projects.

Why not use tf.keras.activations?

As you may know, TensorFlow ships with only a handful of built-in activation functions and, most importantly, does not include many newly introduced ones. Writing your own takes time and effort; this package provides most of the widely used, and even state-of-the-art, activation functions ready to use in your models.

Requirements

Install the required dependencies by running the following command:

  • conda env create -f environment.yml

Where to get it?

The source code is currently hosted on GitHub at: https://github.com/pouyaardehkhani/ActTensor

Binary installers for the latest released version are available at the Python Package Index (PyPI)

# PyPI
pip install ActTensor-tf
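
To quickly verify the installation, import the package and apply one of its layers to a small tensor. This is a minimal sanity check, assuming the ReLU layer shown later in this README:

import tensorflow as tf
from ActTensor_tf import ReLU

# Negative entries should map to 0, positive entries pass through unchanged.
x = tf.constant([[-1.0, 0.0, 2.0]])
print(ReLU()(x))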

License

MIT

How to use?

import tensorflow as tf
import numpy as np
from ActTensor_tf import ReLU # name of the layer

Functional API

inputs = tf.keras.layers.Input(shape=(28,28))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(128)(x)
# use the desired ActTensor_tf activation class here
x = ReLU()(x)
output = tf.keras.layers.Dense(10, activation='softmax')(x)

model = tf.keras.models.Model(inputs=inputs, outputs=output)

Sequential API

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128),
                                    # use the desired ActTensor_tf activation class here
                                    ReLU(),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
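
Either model can then be compiled and trained with the usual Keras workflow. The snippet below continues from the model defined above and is only a minimal sketch; MNIST is chosen purely because it matches the (28, 28) input shape used in these examples:

# Load MNIST, scale pixel values to [0, 1], then compile and train the model defined above.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))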

NOTE:

The functional forms of the activation layers are also available, but they may be exposed under different names; see the table below for the mapping between class and function names.

from ActTensor_tf import relu
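
The function form can be applied directly to a tensor or NumPy array. A minimal sketch, assuming relu returns the element-wise activation of its input:

import numpy as np
from ActTensor_tf import relu

# Element-wise ReLU: negative entries become 0, positive entries are unchanged.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(relu(x))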

Activations

The following classes and functions are available in ActTensor_tf:

| Activation Name | Class Name | Function Name |
| --- | --- | --- |
| SoftShrink | SoftShrink | softSHRINK |
| HardShrink | HardShrink | hard_shrink |
| GLU | GLU | - |
| Bilinear | Bilinear | - |
| ReGLU | ReGLU | - |
| GeGLU | GeGLU | - |
| SwiGLU | SwiGLU | - |
| SeGLU | SeGLU | - |
| ReLU | ReLU | relu |
| Identity | Identity | identity |
| Step | Step | step |
| Sigmoid | Sigmoid | sigmoid |
| HardSigmoid | HardSigmoid | hard_sigmoid |
| LogSigmoid | LogSigmoid | log_sigmoid |
| SiLU | SiLU | silu |
| PLinear | ParametricLinear | parametric_linear |
| Piecewise-Linear | PiecewiseLinear | piecewise_linear |
| Complementary Log-Log | CLL | cll |
| Bipolar | Bipolar | bipolar |
| Bipolar-Sigmoid | BipolarSigmoid | bipolar_sigmoid |
| Tanh | Tanh | tanh |
| TanhShrink | TanhShrink | tanhshrink |
| LeCun's Tanh | LeCunTanh | leCun_tanh |
| HardTanh | HardTanh | hard_tanh |
| TanhExp | TanhExp | tanh_exp |
| Absolute | ABS | Abs |
| Squared-ReLU | SquaredReLU | squared_relu |
| P-ReLU | ParametricReLU | Parametric_ReLU |
| R-ReLU | RandomizedReLU | Randomized_ReLU |
| LeakyReLU | LeakyReLU | leaky_ReLU |
| ReLU6 | ReLU6 | relu6 |
| Mod-ReLU | ModReLU | Mod_ReLU |
| Cosine-ReLU | CosReLU | Cos_ReLU |
| Sin-ReLU | SinReLU | Sin_ReLU |
| Probit | Probit | probit |
| Cos | Cos | Cosine |
| Gaussian | Gaussian | gaussian |
| Multiquadratic | Multiquadratic | Multi_quadratic |
| Inverse-Multiquadratic | InvMultiquadratic | Inv_Multi_quadratic |
| SoftPlus | SoftPlus | softPlus |
| Mish | Mish | mish |
| SMish | Smish | smish |
| P-SMish | ParametricSmish | Parametric_Smish |
| Swish | Swish | swish |
| ESwish | ESwish | eswish |
| HardSwish | HardSwish | hardSwish |
| GCU | GCU | gcu |
| CoLU | CoLU | colu |
| PELU | PELU | pelu |
| SELU | SELU | selu |
| CELU | CELU | celu |
| ArcTan | ArcTan | arcTan |
| Shifted-SoftPlus | ShiftedSoftPlus | Shifted_SoftPlus |
| Softmax | Softmax | softmax |
| Logit | Logit | logit |
| GELU | GELU | gelu |
| Softsign | Softsign | softsign |
| ELiSH | ELiSH | elish |
| HardELiSH | HardELiSH | hardELiSH |
| Serf | Serf | serf |
| ELU | ELU | elu |
| Phish | Phish | phish |
| QReLU | QReLU | qrelu |
| m-QReLU | MQReLU | mqrelu |
| FReLU | FReLU | frelu |
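
Each activation is exposed as a Keras-style layer class and, where a function name is listed, as a plain function as well. A short sketch of using both forms, assuming the names in the table map directly to importable symbols and that the class takes no required constructor arguments (as with ReLU in the examples above):

import tensorflow as tf
from ActTensor_tf import Mish, mish  # class form and function form, as listed in the table

x = tf.constant([[-1.0, 0.0, 2.0]])
y_layer = Mish()(x)  # use the class as a layer inside a Keras model
y_func = mish(x)     # or call the function form directly on a tensor

The table below summarizes typical use cases, pros, and cons for each activation: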

| Activation Name | Use Case | Pros | Cons | Example Usage in Known Network |
| --- | --- | --- | --- | --- |
| SoftShrink | Denoising autoencoders | Good for noise reduction | Limited usage scenarios | Used in image denoising autoencoders |
| HardShrink | Denoising autoencoders | Effective noise removal | Limited usage scenarios | Used in image denoising autoencoders |
| GLU | Gated networks | Helps with learning complex functions | Requires additional gating mechanism | Gated Linear Units in NLP models like ELMo |
| Bilinear | Bilinear interpolation | Efficient image processing | Not used for non-image data | Bilinear interpolation in super-resolution networks |
| ReGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| GeGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| SwiGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| SeGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| ReLU | General purpose | Simple, efficient, avoids vanishing gradients | Dying ReLU problem | Used in almost all CNN architectures like VGG, ResNet |
| Identity | Linear networks | Retains input values | No non-linearity | Identity mapping in residual networks |
| Step | Binary classification | Simple thresholding | Non-differentiable | Used in simple binary classifiers |
| Sigmoid | Binary classification, output layers | Smooth gradient, probabilistic interpretation | Vanishing gradient problem | Output layer in binary classification networks |
| HardSigmoid | Low-power devices | Simple and efficient | Non-differentiable | Mobile networks for power efficiency |
| LogSigmoid | Binary classification, probabilistic outputs | Stabilizes training | Vanishing gradient problem | Binary classification in networks |
| SiLU | Advanced networks | Combines ReLU and Sigmoid benefits | Computationally expensive | Used in Swish-activated networks |
| PLinear | Customizable linear transformation | Flexibility | Requires parameter tuning | Custom layers in experimental networks |
| Piecewise-Linear | Customizable piecewise transformations | Flexibility | Requires parameter tuning | Custom layers in experimental networks |
| Complementary Log-Log | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Output layers in certain probabilistic models |
| Bipolar | Binary classification | Simple bipolar output | Non-differentiable | Binary classification networks |
| Bipolar-Sigmoid | Binary classification | Combines benefits of Sigmoid and Bipolar | Vanishing gradient problem | Binary classification networks |
| Tanh | Hidden layers | Zero-centered output, smooth gradient | Vanishing gradient problem | RNNs and LSTMs like in original LSTM paper |
| TanhShrink | Denoising autoencoders | Combines Tanh with shrinkage | Limited usage scenarios | Used in denoising autoencoders |
| LeCun's Tanh | Hidden layers | Scaled Tanh for better performance | Vanishing gradient problem | Applied in LeNet-5 network |
| HardTanh | Low-power devices | Simple and efficient | Non-differentiable | Efficient models for mobile devices |
| TanhExp | Advanced networks | Combines Tanh and exponential benefits | Computationally expensive | Experimental deep networks |
| Absolute | Simple tasks | Easy to implement | Non-differentiable | Simple experimental networks |
| Squared-ReLU | Advanced networks | Combines ReLU and squaring benefits | Computationally expensive | Experimental networks with custom activations |
| P-ReLU | Customizable ReLU variant | Learnable parameters | Requires parameter tuning | Variants of ResNet |
| R-ReLU | Regularization | Reduces overfitting | Computationally expensive | Applied in CNNs for added regularization |
| LeakyReLU | General purpose | Prevents dying ReLU problem | Slightly more computationally expensive than ReLU | LeakyReLU in networks like YOLO |
| ReLU6 | Mobile networks | Bounded output | Dying ReLU problem | EfficientNet and MobileNet |
| Mod-ReLU | Advanced networks | Combines ReLU and modulation | Computationally expensive | Custom experimental networks |
| Cosine-ReLU | Advanced networks | Combines ReLU and cosine benefits | Computationally expensive | Custom experimental networks |
| Sin-ReLU | Advanced networks | Combines ReLU and sine benefits | Computationally expensive | Custom experimental networks |
| Probit | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Certain probabilistic models |
| Cos | Periodic tasks | Handles periodicity well | Non-differentiable | Networks dealing with periodic signals |
| Gaussian | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| Multiquadratic | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| Inverse-Multiquadratic | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| SoftPlus | Advanced networks | Smooth approximation to ReLU | Computationally expensive | Experimental networks |
| Mish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| SMish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| P-SMish | Customizable Mish variant | Learnable parameters | Requires parameter tuning | Experimental networks |
| Swish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | EfficientNet |
| ESwish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| HardSwish | Low-power devices | Simple and efficient | Non-differentiable | MobileNetV3 |
| GCU | Advanced networks | Gradient-controlled units | Computationally expensive | Experimental networks |
| CoLU | Advanced networks | Combines linear and unit step benefits | Computationally expensive | Experimental networks |
| PELU | Customizable ELU variant | Learnable parameters | Requires parameter tuning | Custom experimental networks |
| SELU | Self-normalizing networks | Maintains mean and variance | Requires careful initialization and architecture choices | Self-normalizing networks like in self-normalizing neural networks paper |
| CELU | Advanced networks | Continuously differentiable ELU | Computationally expensive | Experimental networks |
| ArcTan | Periodic tasks | Handles periodicity well | Non-differentiable | Networks dealing with periodic signals |
| Shifted-SoftPlus | Advanced networks | Smooth gradient | Computationally expensive | Experimental networks |
| Softmax | Output layer for multi-class classification | Converts logits to probabilities | Not suitable for hidden layers | Output layer in classification networks like AlexNet |
| Logit | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Certain probabilistic models |
| GELU | Advanced networks | Combines Gaussian and ReLU benefits | Computationally expensive | Transformer networks like BERT |
| Softsign | General purpose | Smooth approximation to sign function | Slower convergence | Applied in some RNN architectures |
| ELiSH | Advanced networks | Combines ELU and Swish benefits | Computationally expensive | Experimental networks |
| HardELiSH | Low-power devices | Simple and efficient | Non-differentiable | Efficient models for mobile devices |
| Serf | Advanced networks | Combines several benefits of other functions | Computationally expensive | Experimental networks |
| ELU | Deep networks | Smooth gradient, avoids dying ReLU problem | Computationally expensive | Deep CNNs like in ELU paper |
| Phish | Advanced networks | Combines several benefits of other functions | Computationally expensive | Experimental networks |
| QReLU | Quantized networks | Efficient in low-bit precision | Less flexible than regular ReLU | Efficient quantized networks |
| MQReLU | Quantized networks | Efficient in low-bit precision | Less flexible than regular ReLU | Efficient quantized networks |
| FReLU | Advanced networks | Combines ReLU and filter benefits | Computationally expensive | Experimental networks |
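
For example, based on the table above, one might pick HardSwish for a mobile-oriented model. A minimal sketch, assuming HardSwish can be constructed without arguments just like ReLU in the earlier examples:

import tensorflow as tf
from ActTensor_tf import HardSwish

mobile_model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                           tf.keras.layers.Dense(64),
                                           HardSwish(),  # drop-in replacement for the ReLU layer used earlier
                                           tf.keras.layers.Dense(10, activation='softmax')])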

Which activation functions does it support?

  1. Soft Shrink:

  2. Hard Shrink:

  3. GLU:

  4. Bilinear:

  5. ReGLU:

    ReGLU is an activation function which is a variant of GLU.

  6. GeGLU:

    GeGLU is an activation function which is a variant of GLU.

  7. SwiGLU:

    SwiGLU is an activation function which is a variant of GLU.

  8. SeGLU:

    SeGLU is an activation function which is a variant of GLU.

  9. ReLU:

  10. Identity:

    $f(x) = x$

  11. Step:

  12. Sigmoid:

  13. Hard Sigmoid:

  14. Log Sigmoid:

  15. SiLU:

  16. ParametricLinear:

    $f(x) = a*x$

  17. PiecewiseLinear:

    Choose some xmin and xmax as the "range". Everything less than xmin maps to 0, everything greater than xmax maps to 1, and anything in between is linearly interpolated: $f(x) = \frac{x - x_{min}}{x_{max} - x_{min}}$ for $x_{min} \leq x \leq x_{max}$.

  18. Complementary Log-Log (CLL):

  19. Bipolar:

  20. Bipolar Sigmoid:

  21. Tanh:

  22. Tanh Shrink:

  23. LeCunTanh:

  24. Hard Tanh:

  25. TanhExp:

  26. ABS:

  27. SquaredReLU:

  28. ParametricReLU (PReLU):

  29. RandomizedReLU (RReLU):

  30. LeakyReLU:

  31. ReLU6:

  32. ModReLU:

  33. CosReLU:

  34. SinReLU:

  35. Probit:

  36. Cosine:

  37. Gaussian:

  38. Multiquadratic:

    Choose some point $(x, y)$; the multiquadratic activation is defined relative to this chosen point.

  39. InvMultiquadratic:

  40. SoftPlus:

  41. Mish:

  42. Smish:

  43. ParametricSmish (PSmish):

  44. Swish:

  45. ESwish:

  46. Hard Swish:

  47. GCU:

  48. CoLU:

  49. PELU:

  50. SELU:

    $f(x) = \lambda x$ for $x > 0$ and $f(x) = \lambda \alpha (e^{x} - 1)$ for $x \leq 0$, where $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$

  51. CELU:

  52. ArcTan:

  53. ShiftedSoftPlus:

  54. Softmax:

  55. Logit:

  56. GELU:

  57. Softsign:

  58. ELiSH:

  59. Hard ELiSH:

  60. Serf:

  61. ELU:

  62. Phish:

  63. QReLU:

  64. modified QReLU (m-QReLU):

  65. FReLU:
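
As a quick textual reference, a few standard definitions for the most common of these functions are given below. These are the usual textbook forms, not necessarily the exact parameterization used by each ActTensor_tf layer:

$ReLU(x) = \max(0, x)$

$Sigmoid(x) = \frac{1}{1 + e^{-x}}$

$SoftPlus(x) = \ln(1 + e^{x})$

$SiLU(x) = x \cdot Sigmoid(x)$

$Mish(x) = x \cdot \tanh(SoftPlus(x))$

$GELU(x) = x \cdot \Phi(x)$, where $\Phi$ is the standard normal CDF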

Cite this repository

@software{Pouya_ActTensor_2022,
  author = {Pouya, Ardehkhani and Pegah, Ardehkhani},
  license = {MIT},
  month = {7},
  title = {{ActTensor}},
  url = {https://github.com/pouyaardehkhani/ActTensor},
  version = {1.0.0},
  year = {2022}
}