Implement More Activation Functions #107

Draft · wants to merge 16 commits into base: main

Conversation

@FrostyTheSouthernSnowman commented May 16, 2024

The README mentioned that more activation functions are on the roadmap, so I have gotten started. As of publishing, I have only added Threshold, but I plan to go down the list from the [PyTorch docs](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions). Since I'm just going down the list, I'll do HardTanh next, unless someone has one they want written first. I am willing to write documentation if that is needed. Otherwise, I am trying to keep everything as close to the PyTorch docs as possible. If anybody thinks I should follow the TensorFlow, JAX, or some other library's docs instead, I am totally up for that as well. Here's the current TODO list for what I plan to implement first:

  • threshold
  • hardtanh
  • hardswish
  • selu
  • celu
  • leaky_relu
  • gelu
  • softsign
  • softplus
  • softmin
  • hardsigmoid
  • mish
  • normalize

Once I get all of those done, I'll do the rest of the ones listed on the PyTorch docs, assuming my schedule allows for it. I am more than happy to switch out anything on the list.
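
For reference, the first two entries on the list are simple piecewise functions. A minimal scalar sketch of what they compute, following the PyTorch definitions (illustrative Python only, not the project's Mojo code):

```python
def threshold(x: float, thresh: float, value: float) -> float:
    # Pass x through unchanged when it is above the threshold, otherwise replace it with `value`.
    return x if x > thresh else value


def hardtanh(x: float, min_val: float = -1.0, max_val: float = 1.0) -> float:
    # Clamp x into the range [min_val, max_val].
    return min(max(x, min_val), max_val)


print(threshold(0.3, 0.5, 0.0))  # 0.0, since 0.3 <= 0.5
print(hardtanh(2.5))             # 1.0, clamped to the default upper bound
```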

Also, I've noticed that the current AttributeVector system leads to a lot of type conversions, which I feel could hinder performance. I don't plan to look into it right now, although I could if it becomes necessary.

There also seem to be a lot of overloads in the test file. Once I implement a few more activation functions, I'd like to see if I can group them into just a few basic categories that can be reused across all the different activation functions.

@andresnowak (Collaborator) commented May 19, 2024

For the activation functions, the tests for the backward pass and also the forward pass (apart from test_activations) would go in tests/mojo/test_mlops and tests/python/test_mlops_torch. I wouldn't say there are activation functions to prioritize, but the only necessary ones for now would be leaky_relu, gelu, and selu (of the ones mentioned, I think those are the most used). That said, I think (not sure) the non-approximated (non-tanh) version of gelu is complicated to implement, because we may have to think about how to divide some parts of the code base, so some cleanups will be necessary. So I think just those functions would be enough.
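
For context on the two GELU variants mentioned above: the exact form uses the Gaussian error function, while the common approximation replaces it with a tanh expression. A minimal scalar sketch of the two formulas (plain Python, just for illustration; not tied to this codebase):

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF (via erf).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation (what PyTorch computes with approximate="tanh").
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))
```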

@FrostyTheSouthernSnowman (Author)

Ok. Will work on leaky_relu and selu, and I'll do gelu last.
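
For reference, both of those are simple elementwise formulas. A rough scalar sketch following the PyTorch definitions (illustrative Python, not the project's Mojo implementation):

```python
import math


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # Identity for non-negative inputs, a small linear slope for negative ones.
    return x if x >= 0.0 else negative_slope * x


# Fixed SELU constants from Klambauer et al. (2017).
_SELU_ALPHA = 1.6732632423543772
_SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise.
    return _SELU_SCALE * (x if x > 0.0 else _SELU_ALPHA * (math.exp(x) - 1.0))
```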

@FrostyTheSouthernSnowman (Author)

Also, I was looking through the code and realized that I forgot to write the tests in test_mlops.mojo for the activation functions. I will write those for threshold, hard_tanh, and leaky_relu, as well as the tests in test_activations.

@andresnowak (Collaborator) commented May 22, 2024

There are also tests in tests/python/test_mlops_torch.mojo. Those are the three places: test_activations, test_mlops, and test_mlops_torch.
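
For context, the idea behind the torch comparison tests is to run the same input through the project's op and through torch.nn.functional and assert that the outputs (and gradients) match. A rough Python analogue of that check, using a hypothetical my_hardtanh stand-in rather than the project's actual test code:

```python
import torch
import torch.nn.functional as F


def my_hardtanh(x: torch.Tensor, min_val: float = -1.0, max_val: float = 1.0) -> torch.Tensor:
    # Hypothetical stand-in for the implementation under test.
    return torch.clamp(x, min_val, max_val)


def test_hardtanh_matches_torch() -> None:
    x = torch.randn(64, requires_grad=True)

    expected = F.hardtanh(x, min_val=-1.0, max_val=1.0)
    actual = my_hardtanh(x)
    torch.testing.assert_close(actual, expected)

    # The backward pass should match too: compare gradients of a simple scalar loss.
    expected.sum().backward()
    grad_expected = x.grad.clone()
    x.grad = None
    actual.sum().backward()
    torch.testing.assert_close(x.grad, grad_expected)
```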

@FrostyTheSouthernSnowman (Author)

The torch compatibility tests for threshold and hardtanh show there are some bugs. Are they important enough to be worth debugging, or should I just move on to gelu and selu and delete all the threshold and hardtanh code?

@andresnowak (Collaborator)

If there are errors when comparing with the torch version, then yeah, they should be fixed. But if you want, you can delete them and work on gelu and selu, or you can fix hardtanh and threshold.

@FrostyTheSouthernSnowman (Author)

Ok. I haven't ever heard of them being used anyway, so I'll just get rid of that code and work on gelu and selu.

@FrostyTheSouthernSnowman (Author)

Is there an op for elementwise min or max, like what is denoted mathematically by, say, min(0, x)? It would be useful in a few places and could simplify some implementations.

@andresnowak (Collaborator) commented Jun 7, 2024

Yeah, in autograd/ops/basics there is the max op, or in utils/tensorutils there is the reduce op (it only reduces over the whole tensor or over one dimension).

@FrostyTheSouthernSnowman (Author)

Doesn't the current max op get the max value in the tensor? I need something that gets the max or min between two values.

@andresnowak (Collaborator) commented Jun 7, 2024

That is part of Mojo: there is already a max op in Mojo's math module (it gets the max value, or min gets the min value, between two SIMD values).
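
To illustrate why a two-argument elementwise max/min helps here: several of the activations on the list are written directly in terms of max(0, x) and min(0, x) in the PyTorch docs. A scalar Python sketch of that style (equivalent to the branching versions above):

```python
def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # max(0, x) keeps the positive part; min(0, x) isolates the negative part.
    return max(0.0, x) + negative_slope * min(0.0, x)


def hardswish(x: float) -> float:
    # hardswish from the TODO list, built from the same clamp pattern.
    return x * min(max(0.0, x + 3.0), 6.0) / 6.0
```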

@FrostyTheSouthernSnowman (Author)

Thanks! I'll use that then.
