Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[misc-] add sync to ensureLoaded in guide, macros, sysedit, vdx #2352

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

midichef
Copy link
Contributor

These patches add sync() to ensureLoaded(). I don't think any of these patched lines were currently causing significant problems.

The lack of sync() in CommandHelpGetter.__init__() will likely cause errors when guides are used more. It is similar to #2338. So I've put it in its own commit. The other places that were missing sync() are unlikely to ever cause noticeable problems, but I've patched them for completeness sake.

I've looked at all other calls to ensureLoaded() in the visidata codebase, and they do not need any extra sync().

@saulpw
Copy link
Owner

saulpw commented Mar 18, 2024

Hey @midichef, thanks for doing this audit. I agree, we can sync in syseditCells_async because it's an asyncthread function. And it's fine in runvdx because it's only called for tests (but I guess they weren't ever failing?) But we shouldn't call sync in loadMacros because it's called before VisiData starts running--so users with a lot of macros will have a slow startup time, even though loading macros could be done in the background, and would only cause an issue if they tried to use a macro before it finished loading. But then we should address that special case instead of causing slow startup in every case.

What's the rubric you used to evaluate when sync should be used with ensureLoaded? That seems like a useful tidbit to including in some dev docs somewhere.

@midichef
Copy link
Contributor Author

Okay, that makes sense. Startup time is important and we would not want to delay that. And, on closer inspection, the sync for macros was not needed. runMacro() already had a sync, inside moveToReplayContext(). So I've taken that sync out.

@midichef
Copy link
Contributor Author

midichef commented Mar 21, 2024

Hmm, I spoke a bit too soon, there still is a need for a sync. The sync in runMacro() only happens if the macro file has loaded at least one line. In that situation, replay will cause the file to finish loading when it tries to replay the first command: runMacro() -> replay_sync() -> replayOne(cmdlog.cursorRow) -> moveToReplayContext() -> vd.sync(). But if no lines have been loaded yet, there won't be any sync.

I've demonstrated this by putting a delay in the very first line of CommandLogJsonl.iterload(). While the delay is occurring, the macro completes "successfully" by doing nothing, as if it's an empty file.

Should replay_sync() do a sync as its first line, to let the cmdlog finish loading?

This handles the case where a macro is run
before any of its lines have been loaded.
@midichef
Copy link
Contributor Author

midichef commented Mar 21, 2024

What's the rubric you used to evaluate when sync should be used with ensureLoaded? That seems like a useful tidbit to including in some dev docs somewhere.

Okay, for future reference, the rubric I used is:
When data is loaded by ensureLoaded(), does the current thread, or any other thread, use the data being loaded?
If the data stays partially-loaded indefinitely, what happens? Will there be an exception? Will there be undesired behavior?

For example, in syseditCells_async, my reasoning is that perhaps an IndexError could occur at tempvs.rows[0] if the first row is not yet loaded:

while isinstance(tempvs, IndexSheet):
tempvs.ensureLoaded()
tempvs = tempvs.rows[0]

But in experimental/noahs_tapestry/tapestry.py, ensureLoaded is just loading a sheet for interactive use. No other thread will be using the data. So it's fine and does not need a sync.

And there are actually only three other times where ensureLoaded is called and does not have a subsequent sync: in IndexSheet.addRow(), vd.drawSheet(), and vd.push(). In all those cases, there's no thread that expects the sheet to be fully loaded. (vd.replay_sync() looks at first like it also lacks a sync, but in fact it has one, hidden inside replayOne() -> moveToReplayContext())

@saulpw
Copy link
Owner

saulpw commented Mar 21, 2024

Hmm, I spoke a bit too soon, there still is a need for a sync. The sync in runMacro() only happens if the macro file has loaded at least one line. In that situation, replay will cause the file to finish loading when it tries to replay the first command: runMacro() -> replay_sync() -> replayOne(cmdlog.cursorRow) -> moveToReplayContext() -> vd.sync(). But if no lines have been loaded yet, there won't be any sync.

I've demonstrated this by putting a delay in the very first line of CommandLogJsonl.iterload(). While the delay is occurring, the macro completes "successfully" by doing nothing, as if it's an empty file.

Should replay_sync() do a sync as its first line, to let the cmdlog finish loading?

Hm, that might work, but it might introduce a delay before running each macro. I think it might be better to fail the macro if it's not completely loaded. We could formalize some kind of .completely_loaded attribute (which would be set to True if loading finishes, that is, stops without being canceled); would that be useful in other contexts as well?

@midichef midichef mentioned this pull request Mar 27, 2024
Copy link
Owner

@saulpw saulpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a definitive improvment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants