Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Record.__getitem__ / remove _bound_data #1158

Open
nixjdm opened this issue Aug 2, 2023 · 2 comments
Open

Simplify Record.__getitem__ / remove _bound_data #1158

nixjdm opened this issue Aug 2, 2023 · 2 comments

Comments

@nixjdm
Copy link
Member

nixjdm commented Aug 2, 2023

The Record class provides a handy use of __getitem__, but it's implementation introduces a complexity that does not need to be present. The method looks like this:

lektor/lektor/db.py

Lines 473 to 481 in bfce91a

def __getitem__(self, name):
rv = self._bound_data.get(name, Ellipsis)
if rv is not Ellipsis:
return rv
rv = self._data[name]
if hasattr(rv, "__get__"):
rv = rv.__get__(self)
self._bound_data[name] = rv
return rv

Apart from being initialized, this _bound_data is only used here. FlowBlock has a similar implmentation.

As implemented, _bound_data provides a bit of a short circuit, and a record of what was accessed previously. The short circuit does not seem necessary to me, and I am unaware of any real use of distinguishing between _data and _bound_data. I even struggle to think of a contrived one, beyond a possible efficiency gain by having the short circuit. So I'd like to clean this up a bit and remove the concept of _bound_data in Record and FlowBlock.

Why not leave well enough alone?

I actually have a plugin that would attempt to alter a Record of FlowBlock's underlying data, but this is made much more convoluted when I try to account for _bound_data. I was inclined for a while to prefer handling that complexity in the plugin, but the more I linger on the topic, the more I'm thinking unnecessary complexity should probably be removed upstream if possible, especially in db.py. I intend to PR with a simplification, but I thought I'd post first to check that someone was opposed to the change. So far as I can tell, this is unused in the project and in any plugin I've checked, and it's a private attribute anyway.

@dairiki
Copy link
Contributor

dairiki commented Aug 2, 2023

The reason for _bound_data is to cache the results of calling the __get__ methods of "descriptor-style" field values.

Some data values need access to the record they are attached to. The data in _data is initialized before the record is constructed. To deal with this, there is support for "descriptor-style" data: if a value in _data has a __get__ method (a la a Python method descriptor), it is called to bind the value to the record.

I think the thinking is that the call to __get__ may potentially be an expensive computation, so the result is cached in _bound_data.

The only descriptor-valued types implemented in Lektor itself are markdown and flow. In both cases, their __bind__ method is not particularly heavy — so it's not clear the caching really helps much in those cases.
But plugins may introduce descriptor types, too. E.g. (from a quick grep) lektor-rst, lektor-git-timestamp, lektor-expression-type, lektor-data-pages, lektor-polymorphic-type, and lektor-jupyter all do so. (Admittedly, half of those are plugins that I wrote.)
GitTimestampDescriptor.__get__ actually does some computation in its body, so caching is probably worthwhile in that case. (Of course, the caching logic could be moved to GitTimestampDescriptor...)

I actually have a plugin that would attempt to alter a Record of FlowBlock's underlying data, but this is made much more convoluted when I try to account for _bound_data.

An alternative way around that would be to implement a Record.__setitem__ (and FlowBlow.__setitem__) that would store the new value in _data and clear any cached value in _bound_data.

@nixjdm
Copy link
Member Author

nixjdm commented Aug 3, 2023

But plugins may introduce descriptor types, too. E.g. (from a quick grep) lektor-rst, lektor-git-timestamp, lektor-expression-type, lektor-data-pages, lektor-polymorphic-type, and lektor-jupyter all do so. (Admittedly, half of those are plugins that I wrote.)

Ah yes, of course. I hadn't considered this. So I want to ask then, "is the speedup relevant in any of these?" but seeing as new types are regularly introduced, the cautious thing to do is to assume the caching matters.

I had considered a __setitem__, but before considering plugin-introduced-types, it didn't seem worth it. It seemed like this was an opportunity to reduce complexity. I'll dig back into this approach. I think it did solve my issue when I tried that a while back. Thank you for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants