Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[join-] load all sheets before joins #2351

Merged
merged 3 commits into from May 24, 2024
Merged

Conversation

midichef
Copy link
Contributor

When joins are done on sheets that are loading, rows can be lost in the new sheet.

repro: seq 10000000 > 10m.tsv; vd 10m.tsv sample_data/test.fixed
While the larger sheet is loading, go to the Sheets sheet with S, quickly select both sheets, & to join, then append. The final sheet should have 10,000,002 rows, but if the join is started quickly enough, it won't.

The requirements for rows to be lost here are:

  1. a sheet is still loading when the join is done
  2. the slow-loading sheet is on screen
  3. the off-screen sheets finish loading before the on-screen sheet does

The problem is how we use BaseSheet.ensureLoaded(). We currently use its return value to decide whether a sheet has finished loading. But we can only do that if it hasn't been called for that sheet before. If it has been called before, it will return None even when the sheet is still loading.

For the sheet on screen, ensureLoaded() is called before, by drawSheet() in mainloop.py. So we cannot use its return value in join.py to guide the decision to call sync().

This PR changes the sync() to be done unconditionally.
It also fixes errors calculating progress while the number of final rows is not yet known (dd1e3fe).
And adds a loading Progress indicator during all the syncs.

When the currently displayed sheet is still loading, ensureLoaded
returns None. So we should not use its return value to decide whether
to sync.
@midichef
Copy link
Contributor Author

Is there any way to test whether a given sheet is fully loaded? Right now I only know of three tests that relate to loading state:

  1. sheet.rows is UNLOADED, when loading has not started yet
  2. the return value of sheet.ensureLoaded()
  3. inside with visidata.ScopedSetattr(self, 'loading', True)

Is it part of visidata's design to have such a state as "fully loaded" that a sheet can be checked for? Or is it a design choice that sheets may not have such a state?

@saulpw
Copy link
Owner

saulpw commented Mar 18, 2024

It's an implicit design choice, I guess, because some sheets may never finish loading. We could make it so that if loading is False and rows is not UNLOADED then that must mean it's finished loading. But rather than providing a way to wait until loading is finished, I think it'd be better tofail and let people manually wait or cancel the loading.

Copy link
Owner

@saulpw saulpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, @midichef, this is safer. Thanks for addressing it.

@anjakefala anjakefala merged commit 38e3137 into saulpw:develop May 24, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants