Add pre and post processing steps to allow non float dtypes #2882

Open
wants to merge 9 commits into base: main

Conversation

ashnair1 (Contributor)

Changes

When using cross entropy loss with torchmetrics, masks need to be of type int, but kornia only considers float a valid data type.
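
For context, a minimal sketch of the constraint (the shapes are made up for the example; the actual pipeline goes through torchmetrics, but torch.nn.functional.cross_entropy shows the same dtype requirement):

import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 8, 8)         # (batch, classes, H, W) predictions
mask = torch.randint(0, 5, (2, 8, 8))    # segmentation mask holding class indices (int64)

# Cross entropy with class-index targets requires an integer (long) mask;
# a float mask, as kornia augmentations currently require, would instead be
# interpreted as class probabilities (and would need a different shape).
loss = F.cross_entropy(logits, mask)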

Type of change

  • 🔬 New feature (non-breaking change which adds functionality)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Did you update CHANGELOG in case of a major change?

@johnnv1 (Member) left a comment

Add some tests to ensure this too... since the random generators use float values, maybe the output will be float anyway. I'm not sure, but that's what I'm waiting for xD

@shijianjian (Member) left a comment

I do not think this is reasonable, as most methods consider only the 0-1 value range. If you want to support other formats like uint8 or uint16, I think preprocessing and postprocessing to 0-1 are needed.

@ashnair1 (Contributor, Author)

I do not think this is reasonable, as most methods consider only the 0-1 value range. If you want to support other formats like uint8 or uint16, I think preprocessing and postprocessing to 0-1 are needed.

Agreed. I have made it so inputs are converted to float before the operations and reverted to the original dtype once they're complete.
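
Roughly the pattern described, as a sketch (the helper name is made up and is not part of this PR's diff):

import torch

def _apply_with_float_cast(op, mask: torch.Tensor) -> torch.Tensor:
    # Run an op that only accepts floating tensors on an integer mask:
    # cast to float for the op, then cast the result back to the original dtype.
    orig_dtype = None
    if not torch.is_floating_point(mask):
        orig_dtype = mask.dtype
        mask = mask.float()
    out = op(mask)
    if orig_dtype is not None:
        out = out.to(orig_dtype)
    return out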

@ashnair1 ashnair1 marked this pull request as ready for review April 17, 2024 16:50
@ashnair1 ashnair1 changed the title Add int32 & int64 as valid data types Add pre and post processing steps to allow non float dtypes Apr 17, 2024
@ashnair1 (Contributor, Author)

ashnair1 commented May 1, 2024

@shijianjian @johnnv1 @edgarriba Any thoughts on this?

@shijianjian (Member) left a comment

I think this should not apply to the input type.

def test_forward_and_inverse(self, random_apply, device, dtype):
    inp = torch.randn(1, 3, 1000, 500, device=device, dtype=dtype)
    if dtype not in [torch.float32, torch.float64]:

We do not run int tests in our CI. You may restrict the int handling to masks only.

dtype = None
if not torch.is_floating_point(input):
    dtype = input.dtype
    input = input.float()

This does not make sense for the input. Our functions do not support the 0-255 range.

@ashnair1 (Contributor, Author)

ashnair1 commented May 6, 2024

The current issue is which dtype to cast to. If a mask with a non-float dtype is detected, it is converted to a float type and later reverted. The issue is that if it's converted to float, the slow float64 tests complain (Expected float, got double), and when it's converted to double, the slow float32 tests complain. Does it need to be aware of what dtype the input is and map accordingly?

Would appreciate some pointers here as I'm not sure how to go about this.
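
For reference, the kind of mismatch being described, using grid_sample as a stand-in for the internal warp (names and shapes here are illustrative only): hardcoding .float() yields a float32 mask that clashes with float64 parameters, while following the input's floating dtype avoids it.

import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 4, 4, dtype=torch.float64)              # e.g. the slow float64 tests
theta = torch.eye(2, 3, dtype=torch.float64).unsqueeze(0)       # identity transform
grid = F.affine_grid(theta, (1, 3, 4, 4), align_corners=False)  # float64 sampling grid

mask = torch.randint(0, 2, (1, 1, 4, 4))                        # int64 mask

# Hardcoded cast: float32 mask vs float64 grid -> "expected input and grid to have same dtype"
# F.grid_sample(mask.float(), grid, align_corners=False)

# dtype-aware cast: follow the input's floating dtype
warped = F.grid_sample(mask.to(inp.dtype), grid, align_corners=False)
print(warped.dtype)  # torch.float64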

@johnnv1 (Member)

johnnv1 commented May 18, 2024

Would appreciate some pointers here as I'm not sure how to go about this.

hey @ashnair1, sorry for the delay on this one...

I think the issue here is: the random generator creates the augmentation parameters on some device and with some dtype, then casts them to the input dtype (probably the same dtype as the image, since we always have the image)... here you are "hardcoding" the dtype for the other cases when you want to support non-floating dtypes; instead, you should cast it to the input dtype (or maybe the params dtype? not sure, cc @shijianjian)

@ashnair1 (Contributor, Author)

That does not make sense to me. Just so we're all on the same page regarding what we're trying to achieve here:

In a kornia augmentation, masks by default have to be float. This is because certain operations don't work on int tensors. But as mentioned above, masks are typically expected to be int tensors, and the fact that they are float here is a design convenience.

So in order to work with mask tensors of type int, we should pre-process them to float before the op (which only accepts floats) and post-process them back to int after the op.

However, the tests require the mask tensors to be the same dtype as the input. This means that if you want to test passing in a mask tensor of type int, you would (as you suggested) need to have the input be of type int as well, which in turn means we need to add pre- and post-processing steps for the input too, which @shijianjian had advised against (#2882 (review)) and which I agree with.

Ultimately, what I would like to see is that kornia ensures the input is float while allowing masks to be int or float, relying on the processing steps to ensure the mask tensor works with the ops.

@johnnv1 (Member)

johnnv1 commented May 20, 2024

That does not make sense to me. Just so we're all on the same page regarding what we're trying to achieve here:

In a kornia augmentation, masks by default have to be float. This is because certain operations don't work on int tensors. But as mentioned above, masks are typically expected to be int tensors, and the fact that they are float here is a design convenience.

So in order to work with mask tensors of type int, we should pre-process them to float before the op (which only accepts floats) and post-process them back to int after the op.

Yep, but we need to cast to the right floating type, which is the same floating type as the input data; otherwise it will mismatch and crash in some ops...

What I mean is to retrieve the data type from the input itself, using something like:

inp = in_args[self.transform_op.data_keys.index(DataKey.INPUT)]
(or some other accessor from the API itself)
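
i.e., something along these lines, I assume (a sketch only; in_args, data_keys and the helper name are hypothetical wiring around this suggestion, while DataKey comes from kornia.constants):

from kornia.constants import DataKey

def _cast_mask_like_input(in_args, data_keys, mask):
    # Cast the mask to the same floating dtype as the input tensor travelling
    # through the same call, instead of hardcoding float32 or float64.
    inp = in_args[data_keys.index(DataKey.INPUT)]   # the image tensor
    return mask.to(inp.dtype) if inp.is_floating_point() else mask.float()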
