
Behavior mismatch between PyTorch GroupNorm and ggml_group_norm #803

Open
balisujohn opened this issue Apr 21, 2024 · 5 comments

@balisujohn (Contributor) commented Apr 21, 2024

Here are my two tester files, which run what should be the same operation on the same tensor with PyTorch and ggml. The same tensor is used in each case, saved as a literal in the tester file. It has dimensions 43 × 1024. This is using the CUDA backend.

PyTorch example:

print("after norm")
test_norm = torch.nn.GroupNorm(32,1024, eps=1e-06, affine=False).to('cuda')
print(test_norm(tensor))

https://github.com/balisujohn/ggml_pytorch_groupnorm_mismatch/blob/master/tester2.py
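For context, with affine=False PyTorch's GroupNorm(32, 1024) splits the 1024 channels into 32 groups of 32 channels each and normalizes every group independently. This is the standard GroupNorm definition, stated here for reference:

    y = (x - mean_g) / sqrt(var_g + eps),   eps = 1e-06

where mean_g and var_g are computed over all elements belonging to group g.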

GGML example:

struct ggml_tensor * result = ggml_group_norm(ctx0, model.a, 32);

https://github.com/balisujohn/ggml_pytorch_groupnorm_mismatch/blob/master/examples/simple/simple-backend.cpp

PyTorch output:

after norm
tensor([[[ 1.3370,  1.5613, -0.2031,  ...,  1.9385,  1.7235,  0.7601],
         [ 0.6938, -1.0381,  1.1787,  ...,  1.5918,  1.9686,  1.5100],
         [-0.9839, -0.0334,  0.4642,  ...,  1.8995,  2.0359,  0.2274],
         ...,
         [ 0.2085, -0.2089,  0.8808,  ...,  2.0186,  2.2248,  1.3418],
         [-0.0949,  0.0766,  0.6567,  ..., -1.3484, -1.6598, -1.5336],
         [-0.4434,  0.0802,  1.0636,  ..., -1.7527, -1.9895,  0.2185]]])

GGML output (DATA shows the first 3 and last 3 entries of the tensor in row-major order; a sketch of how such a dump can be read back follows the listing):

NAME:
node_0
TYPE
0
SHAPE:
43
1024
1
1
DATA:
1.38015
1.6132
-0.220296
-1.69287
-1.90835
0.101279
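
A minimal sketch of how the first and last entries can be read back from the result tensor, assuming `result` is the computed node and the graph has already been evaluated on the backend; this is an illustration, not the exact code from simple-backend.cpp:

    // copy the result from backend memory to host memory and print the
    // first 3 and last 3 entries (requires <stdio.h>, <stdlib.h>, "ggml-backend.h")
    const int64_t n = ggml_nelements(result);
    float * data = malloc(ggml_nbytes(result));
    ggml_backend_tensor_get(result, data, 0, ggml_nbytes(result));
    for (int64_t i = 0; i < 3; i++)     printf("%g\n", data[i]);
    for (int64_t i = n - 3; i < n; i++) printf("%g\n", data[i]);
    free(data);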

torch version: 2.1.0
ggml version: e1daebb (most recent as of the time of posting)
cuda version: 12.0

@slaren (Collaborator) commented Apr 21, 2024

Try this:

    ggml_tensor * a = ggml_reshape_3d(ctx0, model.a, model.a->ne[0], 1, model.a->ne[1]);
    struct ggml_tensor * result = ggml_group_norm(ctx0, a, 32);

ggml_group_norm normalizes over the first two dimensions (no idea why); you can work around that by moving the second dimension to the third.
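
To spell out why this works for the 43 × 1024 tensor in this issue (ne0 = 43, ne1 = 1024): the reshape moves the 1024 channels into ne2, so each of the 32 groups covers 32 channels, each spanning all 43 elements of ne0, which matches PyTorch's grouping. A minimal sketch of the full round trip, assuming the result should end up in the original 2-D layout (the reshape-back step is an assumption, not part of the snippet above):

    // move the channel dimension to ne2: [43, 1024] -> [43, 1, 1024]
    struct ggml_tensor * a      = ggml_reshape_3d(ctx0, model.a, model.a->ne[0], 1, model.a->ne[1]);
    // each of the 32 groups now spans ne2/32 = 32 channels
    struct ggml_tensor * normed = ggml_group_norm(ctx0, a, 32);
    // restore the original [43, 1024] layout
    struct ggml_tensor * result = ggml_reshape_2d(ctx0, normed, model.a->ne[0], model.a->ne[1]);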

@balisujohn (Contributor, Author)

OMG, it seems to work perfectly. I am so grateful, and your response time was amazing too!

@balisujohn (Contributor, Author) commented Apr 21, 2024

Would you be interested in a PR adding a comment explaining the normalization behavior in ggml.c?

@slaren (Collaborator) commented Apr 21, 2024

Sure, I think it would be good to document these things, but ultimately that's up to @ggerganov.

@ggerganov (Owner)

Yes, a comment in ggml.h would be useful.
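
For illustration, such a comment might look roughly like the following next to the declaration in ggml.h (the wording here is a sketch based on this thread, not the comment that was eventually merged):

    // normalizes over ne0*ne1 within each group; groups are formed by
    // partitioning ne2 into n_groups, so to match PyTorch's GroupNorm the
    // channel dimension must be ne2 (reshape the input if necessary)
    GGML_API struct ggml_tensor * ggml_group_norm(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            int                   n_groups);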
