
Behavior mismatch between PyTorch GroupNorm and ggml_group_norm #803

Open
balisujohn opened this issue Apr 21, 2024 · 5 comments

@balisujohn (Contributor) commented Apr 21, 2024

Here are my two tester files, which run what should be the same operation on the same tensor with PyTorch and ggml. The same tensor is used in each case, saved as a literal in the tester file. It has dimensions 43 × 1024. This is using the CUDA backend.

PyTorch example:

print("after norm")
test_norm = torch.nn.GroupNorm(32,1024, eps=1e-06, affine=False).to('cuda')
print(test_norm(tensor))

https://github.com/balisujohn/ggml_pytorch_groupnorm_mismatch/blob/master/tester2.py
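For context, with affine=False PyTorch's GroupNorm(32, 1024) splits the 1024 channels into 32 groups of 32 channels each and normalizes every group independently. This is the standard GroupNorm definition, stated here for reference:

    y = (x - mean_g) / sqrt(var_g + eps),   eps = 1e-06

where mean_g and var_g are computed over all elements belonging to group g.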

GGML example:

struct ggml_tensor * result = ggml_group_norm(ctx0, model.a, 32);

https://github.com/balisujohn/ggml_pytorch_groupnorm_mismatch/blob/master/examples/simple/simple-backend.cpp

PyTorch output:

after norm
tensor([[[ 1.3370,  1.5613, -0.2031,  ...,  1.9385,  1.7235,  0.7601],
         [ 0.6938, -1.0381,  1.1787,  ...,  1.5918,  1.9686,  1.5100],
         [-0.9839, -0.0334,  0.4642,  ...,  1.8995,  2.0359,  0.2274],
         ...,
         [ 0.2085, -0.2089,  0.8808,  ...,  2.0186,  2.2248,  1.3418],
         [-0.0949,  0.0766,  0.6567,  ..., -1.3484, -1.6598, -1.5336],
         [-0.4434,  0.0802,  1.0636,  ..., -1.7527, -1.9895,  0.2185]]])

GGML output (DATA shows the first 3 and last 3 entries of the tensor in row-major order; a sketch of how such a dump can be read back follows the listing):

NAME:
node_0
TYPE
0
SHAPE:
43
1024
1
1
DATA:
1.38015
1.6132
-0.220296
-1.69287
-1.90835
0.101279
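
A minimal sketch of how the first and last entries can be read back from the result tensor, assuming `result` is the computed node and the graph has already been evaluated on the backend; this is an illustration, not the exact code from simple-backend.cpp:

    // copy the result from backend memory to host memory and print the
    // first 3 and last 3 entries (requires <stdio.h>, <stdlib.h>, "ggml-backend.h")
    const int64_t n = ggml_nelements(result);
    float * data = malloc(ggml_nbytes(result));
    ggml_backend_tensor_get(result, data, 0, ggml_nbytes(result));
    for (int64_t i = 0; i < 3; i++)     printf("%g\n", data[i]);
    for (int64_t i = n - 3; i < n; i++) printf("%g\n", data[i]);
    free(data);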

torch version: 2.1.0
ggml version: e1daebb (most recent as of the time of posting)
cuda version: 12.0

@slaren (Collaborator) commented Apr 21, 2024

Try this:

    ggml_tensor * a = ggml_reshape_3d(ctx0, model.a, model.a->ne[0], 1, model.a->ne[1]);
    struct ggml_tensor * result = ggml_group_norm(ctx0, a, 32);

ggml_group_norm normalizes over the first two dimensions (no idea why); you can work around that by moving the second dimension to the third.
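
To spell out why this works for the 43 × 1024 tensor in this issue (ne0 = 43, ne1 = 1024): the reshape moves the 1024 channels into ne2, so each of the 32 groups covers 32 channels, each spanning all 43 elements of ne0, which matches PyTorch's grouping. A minimal sketch of the full round trip, assuming the result should end up in the original 2-D layout (the reshape-back step is an assumption, not part of the snippet above):

    // move the channel dimension to ne2: [43, 1024] -> [43, 1, 1024]
    struct ggml_tensor * a      = ggml_reshape_3d(ctx0, model.a, model.a->ne[0], 1, model.a->ne[1]);
    // each of the 32 groups now spans ne2/32 = 32 channels
    struct ggml_tensor * normed = ggml_group_norm(ctx0, a, 32);
    // restore the original [43, 1024] layout
    struct ggml_tensor * result = ggml_reshape_2d(ctx0, normed, model.a->ne[0], model.a->ne[1]);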

@balisujohn (Contributor, Author)

OMG, it seems to work perfectly. I am so grateful, and your response time was amazing too!

@balisujohn (Contributor, Author) commented Apr 21, 2024

Would you be interested in a PR adding a comment explaining the normalization behavior in ggml.c?

@slaren (Collaborator) commented Apr 21, 2024

Sure, I think it would be good to document these things, but ultimately that's up to @ggerganov.

@ggerganov (Owner)

Yes, a comment in ggml.h would be useful.
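
For illustration, such a comment might look roughly like the following next to the declaration in ggml.h (the wording here is a sketch based on this thread, not the comment that was eventually merged):

    // normalizes over ne0*ne1 within each group; groups are formed by
    // partitioning ne2 into n_groups, so to match PyTorch's GroupNorm the
    // channel dimension must be ne2 (reshape the input if necessary)
    GGML_API struct ggml_tensor * ggml_group_norm(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            int                   n_groups);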
