Simplify Record.getitem / remove _bound_data #1158

nixjdm · 2023-08-02T22:16:02Z

The Record class provides a handy use of __getitem__, but it's implementation introduces a complexity that does not need to be present. The method looks like this:

lektor/lektor/db.py

Lines 473 to 481 in bfce91a

 def __getitem__(self, name): 

 rv = self._bound_data.get(name, Ellipsis) 

 if rv is not Ellipsis: 

 return rv 

 rv = self._data[name] 

 if hasattr(rv, "__get__"): 

 rv = rv.__get__(self) 

 self._bound_data[name] = rv 

 return rv

Apart from being initialized, this _bound_data is only used here. FlowBlock has a similar implmentation.

As implemented, _bound_data provides a bit of a short circuit, and a record of what was accessed previously. The short circuit does not seem necessary to me, and I am unaware of any real use of distinguishing between _data and _bound_data. I even struggle to think of a contrived one, beyond a possible efficiency gain by having the short circuit. So I'd like to clean this up a bit and remove the concept of _bound_data in Record and FlowBlock.

Why not leave well enough alone?

I actually have a plugin that would attempt to alter a Record of FlowBlock's underlying data, but this is made much more convoluted when I try to account for _bound_data. I was inclined for a while to prefer handling that complexity in the plugin, but the more I linger on the topic, the more I'm thinking unnecessary complexity should probably be removed upstream if possible, especially in db.py. I intend to PR with a simplification, but I thought I'd post first to check that someone was opposed to the change. So far as I can tell, this is unused in the project and in any plugin I've checked, and it's a private attribute anyway.

The text was updated successfully, but these errors were encountered:

dairiki · 2023-08-02T23:36:23Z

The reason for _bound_data is to cache the results of calling the __get__ methods of "descriptor-style" field values.

Some data values need access to the record they are attached to. The data in _data is initialized before the record is constructed. To deal with this, there is support for "descriptor-style" data: if a value in _data has a __get__ method (a la a Python method descriptor), it is called to bind the value to the record.

I think the thinking is that the call to __get__ may potentially be an expensive computation, so the result is cached in _bound_data.

The only descriptor-valued types implemented in Lektor itself are markdown and flow. In both cases, their __bind__ method is not particularly heavy — so it's not clear the caching really helps much in those cases.
But plugins may introduce descriptor types, too. E.g. (from a quick grep) lektor-rst, lektor-git-timestamp, lektor-expression-type, lektor-data-pages, lektor-polymorphic-type, and lektor-jupyter all do so. (Admittedly, half of those are plugins that I wrote.)
GitTimestampDescriptor.__get__ actually does some computation in its body, so caching is probably worthwhile in that case. (Of course, the caching logic could be moved to GitTimestampDescriptor...)

I actually have a plugin that would attempt to alter a Record of FlowBlock's underlying data, but this is made much more convoluted when I try to account for _bound_data.

An alternative way around that would be to implement a Record.__setitem__ (and FlowBlow.__setitem__) that would store the new value in _data and clear any cached value in _bound_data.

nixjdm · 2023-08-03T00:04:59Z

But plugins may introduce descriptor types, too. E.g. (from a quick grep) lektor-rst, lektor-git-timestamp, lektor-expression-type, lektor-data-pages, lektor-polymorphic-type, and lektor-jupyter all do so. (Admittedly, half of those are plugins that I wrote.)

Ah yes, of course. I hadn't considered this. So I want to ask then, "is the speedup relevant in any of these?" but seeing as new types are regularly introduced, the cautious thing to do is to assume the caching matters.

I had considered a __setitem__, but before considering plugin-introduced-types, it didn't seem worth it. It seemed like this was an opportunity to reduce complexity. I'll dig back into this approach. I think it did solve my issue when I tried that a while back. Thank you for the feedback!

nixjdm added the plugin-related label Aug 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify Record.getitem / remove _bound_data #1158

Simplify Record.getitem / remove _bound_data #1158

nixjdm commented Aug 2, 2023

dairiki commented Aug 2, 2023

nixjdm commented Aug 3, 2023 •

edited

Simplify Record.__getitem__ / remove _bound_data #1158

Simplify Record.__getitem__ / remove _bound_data #1158

Comments

nixjdm commented Aug 2, 2023

dairiki commented Aug 2, 2023

nixjdm commented Aug 3, 2023 • edited

Simplify Record.getitem / remove _bound_data #1158

Simplify Record.getitem / remove _bound_data #1158

nixjdm commented Aug 3, 2023 •

edited