Question on Hermite Polynomials Coefficients #1

Open · zhengyu-yang opened this issue Aug 5, 2020 · 6 comments
@zhengyu-yang

Thank you for this interesting paper. However, as I went through the paper and code, there is something I am not sure about. Based on my understanding, the following are the normalized Hermite polynomials of degree 0 to 3 (i.e., h_0 to h_3):
$$h_0(x) = \frac{1}{\sqrt[4]{\pi}}, \quad h_1(x) = \frac{\sqrt{2}\,x}{\sqrt[4]{\pi}}, \quad h_2(x) = \frac{2x^2 - 1}{\sqrt{2}\,\sqrt[4]{\pi}}, \quad h_3(x) = \frac{2x^3 - 3x}{\sqrt{3}\,\sqrt[4]{\pi}}$$

However, after reading your code, I am a little confused. In the following section of your code, the coefficients inside the Hermite polynomials don't match:

```python
def hermite(self, x, k, num_pol=5):
    return (
        (torch.full_like(x, k[0].item()) * torch.ones_like(x)) +
        (torch.full_like(x, k[1].item()) * x) +
        (torch.full_like(x, k[2].item()) * (x**2 - 1) / 1.4142135623730951) +
        (torch.full_like(x, k[3].item()) * (x**3 - 3 * x) / 2.449489742783178))
```
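
For reference, the hard-coded constants are √2 ≈ 1.4142135623730951 and √6 ≈ 2.449489742783178, so each degree-n term above equals He_n(x)/√(n!), with He_n the probabilists' Hermite polynomial. A quick standalone sanity check (my own sketch, not from the repo) against NumPy's probabilists' Hermite module:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e  # probabilists' Hermite series He_n

x = np.linspace(-3.0, 3.0, 7)
for n, term in [(2, (x**2 - 1) / 1.4142135623730951),
                (3, (x**3 - 3 * x) / 2.449489742783178)]:
    coeffs = [0.0] * n + [1.0]  # coefficient vector selecting He_n
    ref = hermite_e.hermeval(x, coeffs) / math.sqrt(math.factorial(n))
    assert np.allclose(term, ref)  # each term matches He_n(x) / sqrt(n!)
```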

@zhengyu-yang (Author)

Also, when you use .item(), the gradient flow is cut off and the weights of the Hermite polynomials are no longer trainable.
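
A minimal standalone demonstration of the detaching behavior (my own sketch, not the repo's code):

```python
import torch

k = torch.tensor([0.5, 0.5], requires_grad=True)
x = torch.ones(3)

# .item() returns a plain Python float, so autograd cannot trace through it:
y_cut = torch.full_like(x, k[0].item()) * x
print(y_cut.requires_grad)  # False -> no gradient ever reaches k

# indexing the tensor directly keeps the computation in the autograd graph:
y_ok = k[0] * x
y_ok.sum().backward()
print(k.grad)  # tensor([3., 0.])
```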

@lokhande-vishnu (Owner)

Hi @elony314, thanks for your questions.

> Coefficients of Hermite Polynomial Activations

The n^{th} coefficient of the Hermite activation is computed by taking the inner product between the ReLU function and the n^{th} normalized Hermite polynomial. Our objective is to decompose the ReLU function in the Hermite polynomial basis. Please refer to Section 4 of arXiv:1711.00501 for the definition of the inner product and a nice primer on Hermite functions. The footnote on page 5 of arXiv:1711.00501 provides the precise coefficient values for different n, which is what we use in the code.
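
For concreteness, with h_n the normalized probabilists' Hermite polynomials and the Gaussian inner product of arXiv:1711.00501, the first few coefficients work out to (my own arithmetic, shown for illustration):

$$c_n = \mathbb{E}_{X \sim \mathcal{N}(0,1)}\big[\mathrm{ReLU}(X)\, h_n(X)\big], \qquad c_0 = \frac{1}{\sqrt{2\pi}}, \quad c_1 = \frac{1}{2}, \quad c_2 = \frac{1}{2\sqrt{\pi}}, \quad c_3 = 0$$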

> .item(), the gradient flow is cut off

Thanks for pointing this out. In our model, we set the coefficients of the Hermite polynomials as trainable parameters.

The code that is linked in the question was not used for training but to measure the time and cost of running the script for a single epoch; hence, we did not require gradients to be updated.

I have added a new "activations.py" file that was used for training and renamed the existing file to "activations_timing.py" to avoid confusion. This new "activations.py" file was already available in all the other directories, such as 1-deep_autoencoder, 2-supervised_setting and 4-WhyHermitesProvideFasterConvergence.

https://github.com/lokhande-vishnu/DeepHermites/blob/master/Code/3-semisupervised_setting/aws_costestimates/epoch_measurements/mnist/4hermites_v2l_1epochs_w/lib/activations.py
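
For illustration, a trainable version can look like the following sketch (assumptions mine: four polynomials, all-ones initialization; the repo's activations.py is authoritative):

```python
import math
import torch
import torch.nn as nn

class HermiteActivation(nn.Module):
    """Hermite activation with trainable coefficients (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        # one trainable weight per polynomial; the init here is arbitrary
        self.k = nn.Parameter(torch.ones(4))

    def forward(self, x):
        # normalized probabilists' Hermite polynomials h_0..h_3
        h = [torch.ones_like(x),
             x,
             (x**2 - 1) / math.sqrt(2),
             (x**3 - 3 * x) / math.sqrt(6)]
        # no .item() calls, so gradients flow into self.k
        return sum(self.k[i] * h_i for i, h_i in enumerate(h))

act = HermiteActivation()
act(torch.randn(2, 3)).sum().backward()
print(act.k.grad)  # non-None: the coefficients are trainable
```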

@zhengyu-yang (Author)

zhengyu-yang commented Aug 6, 2020

I am well aware of the weights (c_i) from arXiv:1711.00501, and I use the same values in my code. However, what I am referring to are the coefficients "inside" the Hermite polynomials (h_i).
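
That is, in the expansion below, my question is about the h_i, not the c_i:

$$\mathrm{ReLU}(x) \approx \sum_{i=0}^{n} c_i\, h_i(x)$$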

To be more specific, the following is what I have derived for the normalized Hermite polynomials of degree 0 to 3:
$$h_0(x) = \frac{1}{\sqrt[4]{\pi}}, \quad h_1(x) = \frac{\sqrt{2}\,x}{\sqrt[4]{\pi}}, \quad h_2(x) = \frac{2x^2 - 1}{\sqrt{2}\,\sqrt[4]{\pi}}, \quad h_3(x) = \frac{2x^3 - 3x}{\sqrt{3}\,\sqrt[4]{\pi}}$$

The following is what seems to be in your code:
$$h_0(x) = 1, \quad h_1(x) = x, \quad h_2(x) = \frac{x^2 - 1}{\sqrt{2}}, \quad h_3(x) = \frac{x^3 - 3x}{\sqrt{6}}$$

@lokhande-vishnu (Owner)

We use the probabilists' Hermite polynomials, whereas you have computed the physicists' Hermite polynomials.
Please see the section just above "Properties" on this page: https://en.wikipedia.org/wiki/Hermite_polynomials
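
For reference, the two standard definitions and their relation (from the Wikipedia page linked above):

$$\mathrm{He}_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2} \quad \text{(probabilists')}, \qquad H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2} \quad \text{(physicists')}$$

so that He_0 = 1, He_1 = x, He_2 = x^2 - 1, He_3 = x^3 - 3x, and the two families are connected by He_n(x) = 2^{-n/2} H_n(x/\sqrt{2}).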

@zhengyu-yang (Author)

Thanks a lot, I see. However, I think equation (1) in the paper is the physicists' Hermite polynomial, and I followed that one to derive the equations above.
[Screenshot of equation (1) from the paper]

@lokhande-vishnu (Owner)

Thanks for pointing this out. We will modify the equation into this in our next arXiv update:
[Image of the corrected equation]
