New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Slim down formula API JSON #16433
WIP: Slim down formula API JSON #16433
Conversation
@@ -74,6 +78,7 @@ def generate_formula_api | |||
raise | |||
end | |||
|
|||
File.write("formula.json", JSON.generate(api_json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where this file should go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Location seems fine. Filename could be something like formula_internal.json
or formula_slim.json
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe they should both go in the same directory? internal/formula.json
and internal/cask.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @apainintheneck. A few ideas here; I don't expect them all to block the PR but I think this work is exposing the legacy mess that is our existing JSON format 😅
@@ -101,6 +102,71 @@ def write_names_and_aliases(regenerate: false) | |||
end | |||
end | |||
end | |||
|
|||
REQUIRED_FIELDS = %w[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this blows up the scope a bit but: I'd instead be tempted to actually have a new Formula#to_hash
(and Cask equivalent) or pass a versioned argument there and generate an entirely new hash format with e.g. less nesting/repetition that better meets our needs.
May be a good time to actually have some sort of stricter versioning or e.g. specifying a minimum Homebrew version in the JSON itself here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be ideal but I'm a bit worried about backwards compatibility. I'm not sure how many people depend on the output of brew info --json=v2
though. I guess that could just be json v3 and we could just keep the old format at json v2.
Also, changing the format entirely would add a lot of complexity to the load from API methods which rely on the current format. This is especially important for formula that get removed since we'll have to load them from the original JSON definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could have multiple loaders and just have a key like json_version
that specifies which one we should try and load it with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a sample of the current slimmed-down JSON representation. Let me know if there's anything that looks like it can be removed and/or simplified.
https://gist.github.com/apainintheneck/882bb75d29394f7bdd5f8b5efe482038
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"versions" -> "head"
and "versions" -> "bottle"
can probably be trimmed. "full_name"
and "tap"
too probably
"versions" -> "head"
is quite an interesting one. In theory we might support:
head do
url "..."
version "whatever"
end
In practice: does it actually work? Not that it's a concern for Homebrew/core, but maybe something we could make error on a Ruby DSL level if it behaves weirdly. If it just makes it behave like the old devel
spec then it's probably ok, even if weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in my mind a theoretical tap agnostic format would have been:
{
"tap": {
...,
"migrations": [...]
},
"formulae": [...],
"casks": [...]
}
Which is basically the same thing you suggest.
The tap_migrations JSON being a separate download is a hack indicative of this problem. All of this is backwards incompatible changes however. We're already going to break backwards compatibility here in terms of old brew versions (given 984dcf8), but this would be a compatibility break in terms of old cached or mirrored JSON data (though we can do what we did when moving to JWS if we want to go this direction).
Though a compatibility layer is also possible without too significant effort.
If we go this direction at some point there would also be room to revive the dependency redo code shelved for a theoretical v2 (which might(?) also help our goal of reducing overall object footprint as it deduplicates some things, but it also adds more subkeys so not necessarily): Bo98@25aec4b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your comment about 984dcf8. Those changes should be backwards compatible.
I like your idea about the tap agnostic format. I think it should be possible to implement that in a backwards compatible way though. Basically, we could add additional logic to build the Homebrew::API::Formula.all_formulae
hash so that it stays the same as before minus the missing hash keys despite the changes to the formula.jws.json
file. Whether that's worth the effort is another question entirely. This would mean that the JSON files saved during the installation should hold the same amount of information as before.
What was the goal of the dependency redo code you mention here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we'd have two different branches of parsing logic for the old and new formula.jws.json
files but the same resulting Homebrew::API::Formula.all_formulae
hash if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your comment about 984dcf8. Those changes should be backwards compatible.
I mean if we switch to generate JSONs that strip those previously not-nilable values then older brew versions can't read newer JSON (typically wasn't a concern for git taps, but API downloads can be detached from update.sh).
What was the goal of the dependency redo code you mention here?
Idea at the time was to make a formula that does:
depends_on "abc" => [:build, :test]
depends_on "def" => [:build, :test]
depends_on "ghi" => [:build, :test]
head do
depends_on "jkl" => [:build, :test]
end
which currently produces
"build_dependencies": [
"abc",
"def",
"ghi"
],
"test_dependencies": [
"abc",
"def",
"ghi"
],
"head_dependencies": [
"build_dependencies": [
"abc",
"def",
"ghi",
"jkl"
],
"test_dependencies": [
"abc",
"def",
"ghi",
"jkl"
],
]
be:
"deps": [
{
"name": "abc",
"tags": ["build", "test"]
},
{
"name": "def",
"tags": ["build", "test"]
},
{
"name": "ghi",
"tags": ["build", "test"]
},
{
"name": "jkl",
"tags": ["build", "test"],
"specs": ["head"]
}
]
but thinking more about more how the Ruby JSON parser works while it might make it physically smaller, it won't necessarily make it more performant so still very much a "to investigate" for the future.
So we'd have two different branches of parsing logic for the old and new
formula.jws.json
files but the same resultingHomebrew::API::Formula.all_formulae
hash if that makes sense.
Yeah that was what I was thinking yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that could just be json v3 and we could just keep the old format at json v2.
Yes, sorry, this was my thinking 👍🏻. Might even be something like vHomebrewCore
or something as well as a separate v3 which could have more information in a nicer structure.
That's a good example of something we can likely remove because it doesn't fit the core repo use case.
Yeh, this is my thinking: we remove anything that we won't use/need in core.
For example, the following fields all seem to default to 0 so we could just omit them in that case and add 0 manually when loading things.
...
Also at a glance the versions hash always seems to include"bottle": true
for core formulas anyway. Not sure if we need to include that.
These seem like good things to omit.
but this would be a compatibility break in terms of old cached or mirrored JSON data (though we can do what we did when moving to JWS if we want to go this direction).
I think we should aim to just force a new API download when we release a new enough Homebrew version. I can see an argument for a couple of point releases where we have branching logic here but it's a small enough download that needs refetched/updated often enough I don't think it's worth keeping around for long at all beyond testing/gradual rollout.
Whether that's worth the effort is another question entirely.
It is not 😁
@@ -74,6 +78,7 @@ def generate_formula_api | |||
raise | |||
end | |||
|
|||
File.write("formula.json", JSON.generate(api_json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Location seems fine. Filename could be something like formula_internal.json
or formula_slim.json
or something?
# frozen_string_literal: true | ||
|
||
module Utils | ||
module Collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is expectation that this will be used for Casks too?
Obviously no rush yet but: good to have tests for this.
private_constant :REQUIRED_FIELDS | ||
|
||
sig { params(formula_hash: Hash).returns(Hash) } | ||
def slim_hash(formula_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again: don't need this yet but probably want to have tests ensuring that formulae are loaded as expected from this JSON.
Neither the recursive compacting of empty values or removing a few more keys really seems to move the needle much at all and adds a lot more complexity. Just doing the following gets us most of the intended gains. def api_hash
hash = to_hash_with_variations
if stable_bottle = hash.dig("bottle", "stable")
stable_bottle["files"] = stable_bottle["files"].transform_values do |file|
# The "url" is not used here and takes up a lot of space.
file.slice("cellar", "sha256")
end
end
hash.reject { |_, value| value.blank? }.to_h
end This gets us down from 24MB to 13MB and we could get down to 12MB which is essentially where we're at here in this PR but I hardly think it's worth the effort. The big difference maker is removing the bottle URL from the hash. If this isn't something we're overly attached to, we could just remove it entirely. It doesn't seem to get used internally anywhere. I think the simplest thing to do would be call this the |
24MB to 13MB is a huge difference, nice work!
I'm not opposed to removing it 👍🏻
Works for me 👍🏻 |
in the generated smaller json I see many
|
played a little - tried hjson and yaml
|
@muescha That's another interesting idea. As far as I can tell, the only reason to keep information about multiple bottles is to keep backwards compatibility with the current I took a stab at it and it looks like the size drops around another 5MB from the current reduced size. Down from 13MB to 7.4MB. |
Closing this because the PR has been superseded by this other one: #16460 |
And |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?Related to #16410
The idea here is to create a file with the slimmed down formula JSON that omits certain fields and removes nil values, empty hashes and empty arrays. It looks like the resulting file is around half the size.