Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8307 -- Saving a model with an ImageField with width_field or height_field no longer results in an extra read operation #17775

Closed
wants to merge 2 commits into from

Conversation

john-parton
Copy link
Contributor

@john-parton john-parton commented Jan 24, 2024

This makes a slight change to how the FieldFileDescriptor works to fix #8307.

The previous pre-save/save/post-save flow worked something like this

  • When you assign a file/image, the descriptor's set method is called
  • Just before you save the model, the storage backend figures out what a reasonable filename would be and returns that back to the descriptor
  • The descriptor sets the attribute to be that string, erasing any reference to the original object that was set

And then in the case of the ImageField with width_field/height_field

  • The descriptor then calls a method to calculate the width/height, but because the value of the attribute is a string and no reference to the old object is kept around, it must work its way through the storage API and re-fetch the object, resulting in disk access or network overhead

What this change does is make it so that when the save() method is called on the fieldfile, that it passes an fieldfile with the data necessary to perform whatever additional logic needs to be handled by the descriptor without doing additional IO. It will usually end up calling Pillow again and doing some extra work there, I believe. It should be possible to work around that as well, but that was outside of the scope of this, I think.

I had considered passing a special object type to the descriptor and not re-constructing the fieldfile. I don't know if this is considered part of the public API? I'll need some guidance on what to document.

I did write some tests, and I verified that tests are passing. There were a few on my machine that failed, but I think they're already failing.

Edit

It looks like I have some linter issues, and my test is not really written the way that other tests are.

I noticed there's some 4x8.png and 8x4.png files. I can move the tests to that same folder and reuse those images. That's probably better than base64 encoding a jpeg.

@john-parton
Copy link
Contributor Author

I've updated the PR. I fixed the linter errors and put the test in the correct module and removed my fixtured, using the existing image fixtures.

@john-parton
Copy link
Contributor Author

I'll resolve the failing test. I believe I understand what's going on.

@john-parton
Copy link
Contributor Author

I made some additional changes which means my change touches fewer internals. It's very narrow in scope and shouldn't effect backwards compatibility, except in one specific case:

Prior to this change, Django had undocumented behavior that the value of width_field and height_field would be auto-populated from the file after it's saved to the storage back end. Most general purpose storage back-ends are unlikely to modify a file after it's saved, however, I know that many image can. (For instance Cloudflare images).

I added documentation showing the current behavior is to populate the values of width_field and height_field from the file immediately before it's saved to the storage back-end. In the case that the back-end modifies the file, you might get a width or height which is different from the actual underlying file.

Considering that width_field and height_field are essentially performance optimizations, I think that this is the sane default behavior.

@john-parton
Copy link
Contributor Author

@felixxm Can I get a review of this? I think it does exactly what it's supposed to do: no more, no less.

…ld or height_field

is used on ImageField.

When the ImageFileDescriptor is set during the normal save workflow, a reference
the file contents is passed to the descriptor using a wrapper class,  which the descriptor
can then use to calculate values without doing an additional read from the storage backend.
@john-parton john-parton changed the title Fix #8307 Fix #8307 -- Saving a model with an ImageField with width_field or height_field no longer results in an extra read operation Jan 29, 2024
@shangxiao
Copy link
Contributor

@felixxm Can I get a review of this? I think it does exactly what it's supposed to do: no more, no less.

No need to tag people explicitly for review (in fact I'd personally discourage it as it adds to people's workload unnecessarily). There are people watching PRs & tickets that will review it when they get a chance 👍

@john-parton
Copy link
Contributor Author

No problem. I was chatting with him on Discord, but I understand the etiquette

@nessita
Copy link
Contributor

nessita commented Jan 30, 2024

No problem. I was chatting with him on Discord, but I understand the etiquette

As a reference, the PRs waiting/needing review are listed in the Django Development Dashboard (look for Patches needing review). As of right now, you will not find your PR listed there because the "needs documentation" flag is set: details about flag management and their semantics can be found in these docs.

Also a way of helping reviewers is to ensure all CI checks are green (currently latest revision shows a lint issue). Thanks!

@john-parton
Copy link
Contributor Author

Thanks, this is really great info!

I'll get the linter issue fixed. I did add some docs with my last push.

@john-parton
Copy link
Contributor Author

I've pushed the linter fix. Let me know if there's anything else I need to do.

@john-parton
Copy link
Contributor Author

Looks like everything is still green on the linter, tests.

Is there anything else I need to do?

@nessita
Copy link
Contributor

nessita commented Mar 4, 2024

Looks like everything is still green on the linter, tests.

Is there anything else I need to do?

Hey @john-parton, I don't think there is anything else that you need to do. This work is correctly listed in the "branches needing review" section of the Django Developer Dahsboard, so reviewers will get to it soon, as soon as we can!

Thank you for your patience.

@john-parton
Copy link
Contributor Author

john-parton commented Apr 10, 2024

I understand that "soon" is a relative term, but given that it's been an additional month, are there any plans to look at this?

I'm happy to revise/iterate as necessary.

Thanks.

Edit

There was some conversation on discord, so here's a little 'to-do' for myself:

  • Minimize diff more "I noticed for example that you renamed the file variable to field_file but didn't change the behavior otherwise. While the new name might be more appropriate, there's also a lot of value (from a maintainer's perspective) in things staying as they are."
  • Remove historical background info from comments " your comments are thorough, but they should describe how/why the code works currently, not how it compares to what came before"
  • Add releasenote (in addition to versionchanged) for 5.1
  • Add documentation on how to explicitly opt-in to old "extra read" behavior for people who use compressors and care about such things

We also need to figure out if this really is a bug, or whether it's a behavior change.

@john-parton
Copy link
Contributor Author

So, there were some things that I really didn't like about this approach.

The biggest issue was that a value had to be wrapped in a special container and "passed through" multiple layers of abstraction. That sort of told me there was something really "smelly" going on.

I have another pull request here #18070 which also solved 8307, but it does it in a way that doesn't require that extra wrapper class. It moves the logic for setting width/height to the descriptor, and then changes the logic directly on the descriptor. Then it's only necessary to side-step the descriptor in the specific case that the storage backend or "get_filename" changes the name of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants