Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change RepoPaths to be acquired via RepoPathCache #3284

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jwhitley
Copy link
Contributor

@jwhitley jwhitley commented Jan 30, 2024

PR Description

In order to optimize the number of git calls made this change implements a RepoPathCache as the API by which RepoPaths instances are acquired. This primarily affects app startup and worktree enumeration.

This introduces a new dependency on go-memoize, which is a lightweight wrapper around go-cache and singleflight, in order to ensure that the cache is concurrency safe. (As compared to a simple map, e.g.) See the go-memoize README for details.

Fixes #3227.

Please check if the PR fulfills these requirements

  • Cheatsheets are up-to-date (run go generate ./...)
  • Code has been formatted (see here)
  • Tests have been added/updated (see here for the integration test guide)
  • Text is internationalised (see here)
  • Docs (specifically docs/Config.md) have been updated if necessary
  • You've read through your own file changes for silly mistakes etc

@jwhitley
Copy link
Contributor Author

jwhitley commented Jan 30, 2024

This PR is WIP since there's an annoying panic in worktree_loader_test.go, almost certainly related to how I'm setting up GitCommon's RepoPathCache instance for testing. (EDIT: now fixed, ready for review.) Other unit tests pass, as do all integration tests. I didn't implement a separate unit or integration test related to repo_path_cache.go. Virtually every existing integration test will hit the cache API because it's so central. Likewise, a unit test didn't seem relevant, but I'm open to discussion on that point. (EDIT: Of note, worktree_loader_test.go already exercises this code. See updates to that test file in this PR.)

@jwhitley jwhitley force-pushed the jwhitley/startup-optimization branch from e914750 to b70a140 Compare January 30, 2024 01:15
@jwhitley jwhitley marked this pull request as draft January 30, 2024 02:42
@jwhitley jwhitley changed the title WIP: Change RepoPaths to be acquired via RepoPathCache Change RepoPaths to be acquired via RepoPathCache Jan 30, 2024
@jwhitley jwhitley force-pushed the jwhitley/startup-optimization branch from b70a140 to 414f301 Compare January 30, 2024 19:09
@jwhitley jwhitley marked this pull request as ready for review January 30, 2024 19:11
@jwhitley
Copy link
Contributor Author

Bonus points: the second commit adds a new VSCode launch config that one-click debugs the unit test in the current tab.

In order to optimize the number of git calls made, particularly at
startup time, this change implements a RepoPathCache as the API by which
RepoPaths instances are acquired.
@jwhitley jwhitley force-pushed the jwhitley/startup-optimization branch from 414f301 to 16f3941 Compare January 30, 2024 19:23
@stefanhaller
Copy link
Collaborator

Bonus points: the second commit adds a new VSCode launch config that one-click debugs the unit test in the current tab.

I recommend to use VS Code's builtin test browser for that, it's pretty good. You don't need a launch config for this. It should discover go tests automatically (at least I don't remember having to configure anything).

Running the test under the cursor can be done with Command-; C, debugging the test under the cursor with Command-; Command-C.

@stefanhaller
Copy link
Collaborator

Hm, don't you have the go extension installed, by any chance? It shows code lenses for running and debugging tests, I find it hard to believe that you missed those.
image

@jwhitley
Copy link
Contributor Author

Hm, don't you have the go extension installed, by any chance? It shows code lenses for running and debugging tests, I find it hard to believe that you missed those.

lol, I absolutely missed those. 🙄 In my defense, my web searching for debugging go unit tests in VSCode didn't point me at either the codelenses OR the hindsight-super-obvious "look in the command palette!", always towards launch config stuff. Also, debugging integration tests previously via the launch config pointed me in exactly the wrong direction... ah well.

Shall I just drop that commit, since it's redundant?

@stefanhaller
Copy link
Collaborator

Shall I just drop that commit, since it's redundant?

Yes, I would say so. The debug configurations menu is already too crowded (I'm also not sure I'm happy with the recent change of unhiding some configs that you normally don't use on their own).

@jwhitley jwhitley force-pushed the jwhitley/startup-optimization branch from 16f3941 to 18f6fa9 Compare January 30, 2024 21:21
@jwhitley
Copy link
Contributor Author

Thanks Stefan. Redundant debug config dropped.

@jwhitley
Copy link
Contributor Author

jwhitley commented Feb 6, 2024

@jesseduffield Any feedback on this PR?

Copy link
Owner

@jesseduffield jesseduffield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwhitley sorry I've been busy lately so haven't had much time for reviewing. Looking at this code, it seems we're doing two things:

  1. obtaining RepoPaths early in the startup process that we only need to call git rev-parse once during startup
  2. caching all calls to GetRepoPaths globally based on the directory

These are two separate things and although I think the first part is great because it reduces startup time, I'm less sure about the second part. Caching the calls will speed things up, but at the cost of code complexity and the cache potentially becoming stale. How it could become stale, I'm not actually sure. Maybe the user tweaks some stuff with a worktree outside of lazygit and then expects the RepoPaths for that worktree to be reflected when they come back to lazygit and refresh? What do you think?

Also keen to get @stefanhaller 's thoughts

@@ -163,7 +163,7 @@ func NewGitCommandAux(
Bisect: bisectCommands,
WorkingTree: workingTreeCommands,
Worktree: worktreeCommands,
Version: version,
Version: repoPathCache.GetGitVersion(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exposing GetGitVersion on repoPathCache is a little too opportunistic for my liking: the fact that repo paths knows about the git version doesn't mean we should rely on it for obtaining the git version. I would instead pass that through to NewGitCommand explicitly as we did before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. To be honest, I found a bit of "architectural pull" towards something which is really a "Git service", but explicitly tried to avoid in this PR(**). Looks like some of that slipped through here.

**: Use cases that motivate that are things like this comment from Stefan, specifically needing to cd into the repo consistently as part of a git operation. One could envision a first class GitRepo entity (basically what we know as RepoPaths), but with an upgraded API to consolidate repo handling logic. The code logic shifts to asking "perform this git operation in this Repo". The cache is essentially an implementation detail of the repo accessor API at that point. I'm specifically not saying we should go there at this time, just that it's something to chew on if/when more use cases accumulate that support such a direction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment was specific to those few submodule operations though, and those can be dealt with locally in pkg/commands/git_commands/submodule.go (I'll have some code to look at shortly). I don't see where else we have the need to perform a git operation in a repo other than the currently open one.

@stefanhaller
Copy link
Collaborator

Also keen to get @stefanhaller 's thoughts

I don't have much of an opinion here, as I haven't been following the prior work in this area very closely, and I'm also very unfamiliar with worktrees as I don't use them.

@stefanhaller
Copy link
Collaborator

@jwhitley sorry I've been busy lately so haven't had much time for reviewing. Looking at this code, it seems we're doing two things:

  1. obtaining RepoPaths early in the startup process that we only need to call git rev-parse once during startup
  2. caching all calls to GetRepoPaths globally based on the directory

These are two separate things and although I think the first part is great because it reduces startup time, I'm less sure about the second part.

Discussed in person: Jesse and I agree that 1. is useful, but 2. is too risky and maybe not useful enough. @jwhitley, would you be ok with dropping the cache again?

@jwhitley
Copy link
Contributor Author

I'd be fine with that. I'd like to propose another option for consideration, first. go-memoize allows automatic time-based cache expiry. I didn't use that as my starting point, but a more conservative approach might be:

  • Use automatic cache expiry with a time (if we could agree that such exists) that's "long enough to be useful, short enough to be safe".
  • Optionally, allow that time to be configured by the user.

I've also thought about using explicit cache invalidation, but haven't been clear where or when that would be useful. E.g. as a user exposed "reload" command is the best I've come up with, given that the cases that invalidate this data are pretty much wholly external to lazygit.

Thoughts?

@stefanhaller
Copy link
Collaborator

I really think none of this is worth it. It's great that we reduce the startup time by combining several git rev-parse calls to one; but that's all we need to do. The term "cache invalidation" alone makes my alarm bells go off; the only benefit of this cache would be that we reduce the time to switch back and forth between two repos by 10ms, which just isn't worth it.

@jwhitley
Copy link
Contributor Author

Fair points. I'll change this to just eliminate the extra call per previous discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Follow-on consolidation of git rev-parse calls
3 participants