Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply_affine_to_boxes does not handle flipping properly, due to the left-closed and right-open nature of the boxes #7740

Open
function2-llx opened this issue May 4, 2024 · 5 comments

Comments

@function2-llx
Copy link
Contributor

function2-llx commented May 4, 2024

Describe the bug
apply_affine_to_boxes does not handle flipping properly, due to the left-closed and right-open nature of the boxes. That is to say, when a box is flipped, the left-closed and right-open coordinates become left-open and right closed, which should be maintained.

To be clarified, I understand that there's a flip_boxes function available, but I want to use this example for simplicity. Same issue exists for more complicated affine matrix, as long as containing flipping.

To Reproduce

import torch

from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes
from monai.data import MetaTensor
import monai.transforms as mt

def main():
    for flip_axis in range(3):
        flip = mt.Flip(spatial_axis=flip_axis)
        x = torch.zeros(1, 1, 1, 1)
        flipped: MetaTensor = flip(x)
        box = torch.tensor([[0, 0, 0, 1, 1, 1]])
        box_flipped = apply_affine_to_boxes(box, flipped.affine.inverse())
        print(box_flipped)

if __name__ == '__main__':
    main()

Expected behavior
Produce the same results as flip_boxes.

Actual Results

tensor([[-1,  0,  0,  0,  1,  1]])
tensor([[ 0, -1,  0,  1,  0,  1]])
tensor([[ 0,  0, -1,  1,  1,  0]])

Suggestion
My suggestion for the fix is to convert the box coordinates to be closed on the both sides before applying the affine matrix, and convert them back to the desired format after applying the affine.

Additional Context

If I understand correctly, the left-closed and right open nature is suggested here:

# TO_REMOVE = 0.0 if the bottom-right corner pixel/voxel is not included in the boxes,
# i.e., when xmin=1., xmax=2., we have w = 1.
# TO_REMOVE = 1.0 if the bottom-right corner pixel/voxel is included in the boxes,
# i.e., when xmin=1., xmax=2., we have w = 2.
# Currently, only `TO_REMOVE = 0.0` is supported
TO_REMOVE = 0.0 # xmax-xmin = w -TO_REMOVE.

Feature Request

This is not about the major topic of this issue, but by the way, I would like to mention that in the code above I calculate the inverse of the affine matrix. Currently, I only find this way works to apply arbitrary affine transform to boxes. It would be nice if apply_affine_to_boxes could do it for be internally by solving a linear equation for numerical stability.

@KumoLiu
Copy link
Contributor

KumoLiu commented May 6, 2024

Hi @function2-llx, I'm not exactly sure what outcome you're expecting. It appears the result is correct as the six digits represent coordinates in the XYZXYZ format.

Thanks.

@function2-llx
Copy link
Contributor Author

@KumoLiu Hello, the format is indeed correct, however, the values of coordinates are mismatched with flip_boxes. Let me do it again with this function:

import torch

from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes, flip_boxes
from monai.data import MetaTensor
import monai.transforms as mt

def main():
    for flip_axis in range(3):
        flip = mt.Flip(spatial_axis=flip_axis)
        x = torch.zeros(1, 1, 1, 1)
        flipped: MetaTensor = flip(x)
        box = torch.tensor([[0, 0, 0, 1, 1, 1]])
        box_flipped = apply_affine_to_boxes(box, flipped.affine.inverse())
        print('wrong:', box_flipped)
        box_flipped_correct = flip_boxes(box, x.shape[1:], flip_axis)
        print('correct:', box_flipped_correct)

if __name__ == '__main__':
    main()

The results are:

wrong: tensor([[-1,  0,  0,  0,  1,  1]])
correct: tensor([[0, 0, 0, 1, 1, 1]])
wrong: tensor([[ 0, -1,  0,  1,  0,  1]])
correct: tensor([[0, 0, 0, 1, 1, 1]])
wrong: tensor([[ 0,  0, -1,  1,  1,  0]])
correct: tensor([[0, 0, 0, 1, 1, 1]])

@KumoLiu
Copy link
Contributor

KumoLiu commented May 6, 2024

Hi @function2-llx, the difference you're seeing arises from the second argument you're passing into the flip_boxes function, which is the spatial_size of the image. The spatial_size parameter dictates the axis along which you intend to flip. For more details, please take a look at the code here. And if you set it to [0, 0, 0], you will get the same result with the apply_affine_to_boxes.

This comment in the discussion about the issue might also provide some helpful context.

Thanks.

@function2-llx
Copy link
Contributor Author

function2-llx commented May 6, 2024

The spatial_size parameter dictates the axis along which you intend to flip.

I'm sorry if I misunderstand anything, but isn't the spatial_size parameter literally indicating the size of the image that the boxes are on?

def flip_boxes(
boxes: NdarrayTensor, spatial_size: Sequence[int] | int, flip_axes: Sequence[int] | int | None = None
) -> NdarrayTensor:
"""
Flip boxes when the corresponding image is flipped
Args:
boxes: bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be ``StandardMode``
spatial_size: image spatial size.

@KumoLiu
Copy link
Contributor

KumoLiu commented May 6, 2024

Hi @function2-llx,

but isn't the spatial_size parameter literally indicating the size of the image that the boxes are on?

Yes, for this flip_boxes and for detection. However, boxes can only be boxes; geometric data can be without reference. So the flip_boxes here is only used to detection application.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants