Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load internal json v3 #16638

Merged
merged 8 commits into from Feb 29, 2024

Conversation

apainintheneck
Copy link
Contributor

@apainintheneck apainintheneck commented Feb 11, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This is an early PR to start loading the internal JSON v3 representation added in #16541. The hope is that this will not only load faster than the previous one but also will reduce the amount of data that has to flow across the network and reduce the amount of code needed to turn the current JSON representation into the expected shape internally.

My first thought is that the amount of code needed to get this to work is way less than I originally thought. Of course, no tests so far so this will grow a bit.

@apainintheneck apainintheneck added the install from api Relates to API installs label Feb 11, 2024
@apainintheneck
Copy link
Contributor Author

Early benchmarks on my super old iMac show that this is noticeably faster. I'll try it on a newer machine at some point to get more relevant information. Note that HOMEBREW_INTERNAL_JSON_V3 just gets ignored on the master branch.

$ hyperfine --parameter-list branch master,load-internal-json-v3 --warmup 3 --setup 'git switch {branch}' 'HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl'
Benchmark 1: HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = master)
  Time (mean ± σ):      3.164 s ±  0.020 s    [User: 2.086 s, System: 0.915 s]
  Range (min … max):    3.133 s …  3.196 s    10 runs

Benchmark 2: HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = load-internal-json-v3)
  Time (mean ± σ):      2.731 s ±  0.020 s    [User: 1.694 s, System: 0.877 s]
  Range (min … max):    2.710 s …  2.783 s    10 runs

Summary
  HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = load-internal-json-v3) ran
    1.16 ± 0.01 times faster than HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = master)

@MikeMcQuaid
Copy link
Member

Early benchmarks on my super old iMac show that this is noticeably faster. I'll try it on a newer machine at some point to get more relevant information. Note that HOMEBREW_INTERNAL_JSON_V3 just gets ignored on the master branch.

Wow, that's huge! Great work @apainintheneck!

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!

@apainintheneck apainintheneck force-pushed the load-internal-json-v3 branch 2 times, most recently from df2bc06 to 2a4c716 Compare February 19, 2024 03:56
@apainintheneck
Copy link
Contributor Author

apainintheneck commented Feb 19, 2024

Okay, I finally got around to adding some tests. There are tests for both generating the JSON format and loading formulae from the JSON file. I ended up creating a mini-tap fixture to facilitate testing as well. I assume grabbing a few random formulae for that is not a big deal. Let me know if there's anything else that would be good to test for.

Edit: I guess I'll be debugging the cross-platform test failures for a bit...

@apainintheneck apainintheneck force-pushed the load-internal-json-v3 branch 2 times, most recently from a155b13 to c0ed1dc Compare February 19, 2024 05:41
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @apainintheneck 👏🏻

Comment on lines +96 to +112
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to increase up scope here potentially (and can be in another PR) but: this API is fairly awful due to backwards compatibility concerns we don't have for v3 so would be nice to address. CC @Bo98 for thoughts on what that should look like.

Copy link
Member

@Bo98 Bo98 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we were going to cycle back and clean this up. I can open a PR with suggestions (probably easier after merging this PR though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool: given that: fine to merge with this as-is in this PR 👍🏻. A TODO or something might be nice though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a TODO be added here please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a note in the original issue thread: #16410 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to leave a comment when any potential changes are exploratory. We haven't decided on the form that we want yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck It's a hard TODO because we're not going to ship the internal JSON v3 without changing this. I'm fine on not blocking the PR on this stuff but I would really like something beyond a comment in another issue to track this in the actual code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
# TODO: improve this API before we ship internal API v3 to users
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck It's a hard TODO because we're not going to ship the internal JSON v3 without changing this. I'm fine on not blocking the PR on this stuff but I would really like something beyond a comment in another issue to track this in the actual code.

I'm fine with the comment but this is news to me. I didn't know that this was that important either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for my unclear communication! I'm trying to get the balance right between being overly pedantic/demotivating and blocking individual PRs vs. making clear what's required before we ship to users. Great work getting this shipped, thanks @apainintheneck ❤️

"tap_git_head" => tap_git_head,
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
"versions" => { "bottle"=>true, "head"=>nil, "stable"=>"0.58.1" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"versions" => { "bottle"=>true, "head"=>nil, "stable"=>"0.58.1" },
"versions" => { "bottle"=>true, "stable"=>"0.58.1" },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can probably compact or avoid adding that field in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, this only exists in the public facing JSON and should already be omitted from the internal one. I'm just using the public JSON here for testing because it exercises more parts of the formula code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, I think it'd be preferable to use the internal JSON and to get full coverage of the formula code through the input JSON.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, I think it'd be preferable to use the internal JSON and to get full coverage of the formula code through the input JSON.

Sorry, I don't know what you mean by this. How can we get full coverage of the formula code through the input JSON? What input JSON are we referring to here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying it seems very weird to be to use the non-internal public JSON for testing the internal JSON format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that weird. It essentially mimics brew readall.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck Ok but aren't these JSON formats going to diverge significantly?

I think I might just need an ELI5 treatment here, sorry.

Copy link
Member

@Bo98 Bo98 Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically a fancy to_hash test. We could do it as a series of checks instead, though would be multiple expects in a single check so might be long:

expect(formula.name).to eq("something")
expect(formula.version.to eq("1.0")
expect(formula.revision).to be_zero
# ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#to_hash is a bit better than doing a bunch of individual checks since it will end up calling more methods internally to fill out the fields we're not checking.

The goal here is not to validate the #to_hash method but instead to be reasonably confident that the formula loads without failing. #to_hash is just an easy way to take a peek inside the formula instance and run some arbitrary methods along the way. I'm open to alternatives that would be clearer, briefer or provide more assurance of correctness. The current test is not comprehensive at all.

#to_api_hash is already being exercised by a different test where we generate the JSON API for the core formula tap.

@apainintheneck apainintheneck force-pushed the load-internal-json-v3 branch 2 times, most recently from 064cdf9 to 078c46a Compare February 19, 2024 21:37
@apainintheneck apainintheneck force-pushed the load-internal-json-v3 branch 3 times, most recently from 9134e7e to 6834f2a Compare February 25, 2024 09:57
@apainintheneck apainintheneck changed the title [WIP] Load internal json v3 Load internal json v3 Feb 25, 2024
@apainintheneck
Copy link
Contributor Author

This should be ready for review again. The test failures were caused by caching problems between test runs. I just needed to clear the caches a bit more between them and add one more variable to the Tap.clear_cache method. #16710 solved some of those problems though which was helpful. #16671 could help solve these sorts of problems more in the future.

I'm leaving this as draft until after the Monday release since I'd like to be cautious and a have a few days before it's available to everyone to avoid any unexpected bugs.

Comment on lines +96 to +112
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a TODO be added here please.

"tap_git_head" => tap_git_head,
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
"versions" => { "bottle"=>true, "head"=>nil, "stable"=>"0.58.1" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying it seems very weird to be to use the non-internal public JSON for testing the internal JSON format?

@apainintheneck apainintheneck marked this pull request as ready for review February 27, 2024 04:30
@apainintheneck
Copy link
Contributor Author

Here are the same benchmarks on my M1 MacBook that I use at work. Still significant performance boost though not as big as the one on my older computer. Percentage-wise it's more significant though.

$ hyperfine --parameter-list branch master,load-internal-json-v3 --warmup 3 --setup 'git switch {branch}' 'HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl'
Benchmark 1: HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = master)
  Time (mean ± σ):      1.079 s ±  0.010 s    [User: 0.738 s, System: 0.164 s]
  Range (min … max):    1.070 s …  1.096 s    10 runs

Benchmark 2: HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = load-internal-json-v3)
  Time (mean ± σ):     876.8 ms ±  10.4 ms    [User: 554.8 ms, System: 146.4 ms]
  Range (min … max):   867.7 ms … 895.6 ms    10 runs

Summary
  HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = load-internal-json-v3) ran
    1.23 ± 0.02 times faster than HOMEBREW_INTERNAL_JSON_V3=1 brew info jrnl (branch = master)

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @apainintheneck. A couple of small requests but I'm fine if this is merged as-is and they are addressed in follow-up PRs.

Comment on lines +96 to +112
"uses_from_macos" => [{ "llvm"=>[:build, :test] }, "zlib"],
"uses_from_macos_bounds" => [{}, {}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck I'd really love an inline Ruby comment here if possible.

],
"version": "0.58.1",
"bottle": {
"files": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe strip this key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a result of trying to reuse the logic we currently have for building the bottle hash. The bottle hash will occasionally have another key if the rebuild number is non-zero as well.

hash["rebuild"] = bottle_spec.rebuild if !compact_for_api || !bottle_spec.rebuild.zero?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, that's fine then 👍🏻

This abstracts away this helper to make it easier to test and
reason about.
These tests cover both generating and loading formulae from the JSON
bundle. The tests are not comprehensive but they do provide a nice
sanity check that things are working as expected.
- clear the formula API cache
- make the API cache directory
- fix stubbed return values (thanks Sorbet!)
@apainintheneck
Copy link
Contributor Author

I'll merge this in tomorrow.

@apainintheneck apainintheneck merged commit 154a217 into Homebrew:master Feb 29, 2024
26 checks passed
@apainintheneck apainintheneck mentioned this pull request Mar 15, 2024
7 tasks
@github-actions github-actions bot added the outdated PR was locked due to age label Apr 3, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
install from api Relates to API installs outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants