Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Slim down formula API JSON #16433

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
66 changes: 66 additions & 0 deletions Library/Homebrew/api/formula.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

require "extend/cachable"
require "api/download"
require "utils/collection"

module Homebrew
module API
Expand Down Expand Up @@ -101,6 +102,71 @@ def write_names_and_aliases(regenerate: false)
end
end
end

REQUIRED_FIELDS = %w[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this blows up the scope a bit but: I'd instead be tempted to actually have a new Formula#to_hash (and Cask equivalent) or pass a versioned argument there and generate an entirely new hash format with e.g. less nesting/repetition that better meets our needs.

May be a good time to actually have some sort of stricter versioning or e.g. specifying a minimum Homebrew version in the JSON itself here, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be ideal but I'm a bit worried about backwards compatibility. I'm not sure how many people depend on the output of brew info --json=v2 though. I guess that could just be json v3 and we could just keep the old format at json v2.

Also, changing the format entirely would add a lot of complexity to the load from API methods which rely on the current format. This is especially important for formula that get removed since we'll have to load them from the original JSON definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could have multiple loaders and just have a key like json_version that specifies which one we should try and load it with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a sample of the current slimmed-down JSON representation. Let me know if there's anything that looks like it can be removed and/or simplified.

https://gist.github.com/apainintheneck/882bb75d29394f7bdd5f8b5efe482038

Copy link
Member

@Bo98 Bo98 Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"versions" -> "head" and "versions" -> "bottle" can probably be trimmed. "full_name" and "tap" too probably

"versions" -> "head" is quite an interesting one. In theory we might support:

head do
  url "..."
  version "whatever"
end

In practice: does it actually work? Not that it's a concern for Homebrew/core, but maybe something we could make error on a Ruby DSL level if it behaves weirdly. If it just makes it behave like the old devel spec then it's probably ok, even if weird.

Copy link
Member

@Bo98 Bo98 Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in my mind a theoretical tap agnostic format would have been:

{
  "tap": {
    ...,
    "migrations": [...]
  },
  "formulae": [...],
  "casks": [...]
}

Which is basically the same thing you suggest.

The tap_migrations JSON being a separate download is a hack indicative of this problem. All of this is backwards incompatible changes however. We're already going to break backwards compatibility here in terms of old brew versions (given 984dcf8), but this would be a compatibility break in terms of old cached or mirrored JSON data (though we can do what we did when moving to JWS if we want to go this direction).

Though a compatibility layer is also possible without too significant effort.

If we go this direction at some point there would also be room to revive the dependency redo code shelved for a theoretical v2 (which might(?) also help our goal of reducing overall object footprint as it deduplicates some things, but it also adds more subkeys so not necessarily): Bo98@25aec4b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment about 984dcf8. Those changes should be backwards compatible.

I like your idea about the tap agnostic format. I think it should be possible to implement that in a backwards compatible way though. Basically, we could add additional logic to build the Homebrew::API::Formula.all_formulae hash so that it stays the same as before minus the missing hash keys despite the changes to the formula.jws.json file. Whether that's worth the effort is another question entirely. This would mean that the JSON files saved during the installation should hold the same amount of information as before.

What was the goal of the dependency redo code you mention here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'd have two different branches of parsing logic for the old and new formula.jws.json files but the same resulting Homebrew::API::Formula.all_formulae hash if that makes sense.

Copy link
Member

@Bo98 Bo98 Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment about 984dcf8. Those changes should be backwards compatible.

I mean if we switch to generate JSONs that strip those previously not-nilable values then older brew versions can't read newer JSON (typically wasn't a concern for git taps, but API downloads can be detached from update.sh).

What was the goal of the dependency redo code you mention here?

Idea at the time was to make a formula that does:

depends_on "abc" => [:build, :test]
depends_on "def" => [:build, :test]
depends_on "ghi" => [:build, :test]

head do
  depends_on "jkl" => [:build, :test]
end

which currently produces

"build_dependencies": [
  "abc",
   "def",
   "ghi"
],
"test_dependencies": [
  "abc",
   "def",
   "ghi"
],
"head_dependencies": [
  "build_dependencies": [
    "abc",
     "def",
     "ghi",
     "jkl"
  ],
  "test_dependencies": [
    "abc",
     "def",
     "ghi",
     "jkl"
  ],
]

be:

"deps": [
  {
    "name": "abc",
    "tags": ["build", "test"]
  },
  {
    "name": "def",
    "tags": ["build", "test"]
  },
  {
    "name": "ghi",
    "tags": ["build", "test"]
  },
  {
    "name": "jkl",
    "tags": ["build", "test"],
    "specs": ["head"]
  }
]

but thinking more about more how the Ruby JSON parser works while it might make it physically smaller, it won't necessarily make it more performant so still very much a "to investigate" for the future.

So we'd have two different branches of parsing logic for the old and new formula.jws.json files but the same resulting Homebrew::API::Formula.all_formulae hash if that makes sense.

Yeah that was what I was thinking yeah

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that could just be json v3 and we could just keep the old format at json v2.

Yes, sorry, this was my thinking 👍🏻. Might even be something like vHomebrewCore or something as well as a separate v3 which could have more information in a nicer structure.

That's a good example of something we can likely remove because it doesn't fit the core repo use case.

Yeh, this is my thinking: we remove anything that we won't use/need in core.

For example, the following fields all seem to default to 0 so we could just omit them in that case and add 0 manually when loading things.
...
Also at a glance the versions hash always seems to include "bottle": true for core formulas anyway. Not sure if we need to include that.

These seem like good things to omit.

but this would be a compatibility break in terms of old cached or mirrored JSON data (though we can do what we did when moving to JWS if we want to go this direction).

I think we should aim to just force a new API download when we release a new enough Homebrew version. I can see an argument for a couple of point releases where we have branching logic here but it's a small enough download that needs refetched/updated often enough I don't think it's worth keeping around for long at all beyond testing/gradual rollout.

Whether that's worth the effort is another question entirely.

It is not 😁

aliases
bottle
build_dependencies
caveats
conflicts_with
conflicts_with_reasons
dependencies
deprecation_date
deprecation_reason
desc
disable_date
disable_reason
full_name
homepage
keg_only_reason
license
link_overwrite
name
oldnames
optional_dependencies
options
post_install_defined
pour_bottle_only_if
recommended_dependencies
requirements
revision
ruby_source_checksum
ruby_source_path
service
tap
tap_git_head
test_dependencies
urls
uses_from_macos
uses_from_macos_bounds
variations
version_scheme
versioned_formulae
versions
].freeze
private_constant :REQUIRED_FIELDS

sig { params(formula_hash: Hash).returns(Hash) }
def slim_hash(formula_hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: don't need this yet but probably want to have tests ensuring that formulae are loaded as expected from this JSON.

hash = formula_hash.slice(*REQUIRED_FIELDS)

if stable_bottle = hash.dig("bottle", "stable")
stable_bottle["files"] = stable_bottle["files"].transform_values do |file|
# The "url" is not used here and takes up a lot of space.
file.slice("cellar", "sha256")
end
end

# This array can include empty hashes but they should not be omitted since
# array index position matters here. It corresponds to "uses_from_macos".
uses_from_macos_bounds = hash["uses_from_macos_bounds"]

hash = Utils::Collection.recursive_compact_hash(hash)
raise ArgumentError, "Invalid formula hash" if hash.nil?

hash["uses_from_macos_bounds"] = uses_from_macos_bounds if uses_from_macos_bounds
hash
end
end
end
end
Expand Down
7 changes: 6 additions & 1 deletion Library/Homebrew/dev-cmd/generate-formula-api.rb
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,14 @@ def generate_formula_api
Formulary.enable_factory_cache!
Formula.generating_hash!

api_json = []

tap.formula_names.each do |name|
formula = Formulary.factory(name)
name = formula.name
json = JSON.pretty_generate(formula.to_hash_with_variations)
formula_hash = formula.to_hash_with_variations
json = JSON.pretty_generate(formula_hash)
api_json << API::Formula.slim_hash(formula_hash)
html_template_name = html_template(name)

unless args.dry_run?
Expand All @@ -74,6 +78,7 @@ def generate_formula_api
raise
end

File.write("formula.json", JSON.generate(api_json))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where this file should go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Location seems fine. Filename could be something like formula_internal.json or formula_slim.json or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe they should both go in the same directory? internal/formula.json and internal/cask.json

canonical_json = JSON.pretty_generate(tap.formula_renames.merge(tap.alias_table))
File.write("_data/formula_canonical.json", "#{canonical_json}\n") unless args.dry_run?
end
Expand Down
31 changes: 31 additions & 0 deletions Library/Homebrew/utils/collection.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# typed: true
# frozen_string_literal: true

module Utils
module Collection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is expectation that this will be used for Casks too?

Obviously no rush yet but: good to have tests for this.

def self.recursive_compact(value)
case value
when Array
recursive_compact_array(value)
when Hash
recursive_compact_hash(value)
else
value
end
end

sig { params(array: Array).returns(T.nilable(Array)) }
def self.recursive_compact_array(array)
array.map do |value|
recursive_compact(value)
end.compact.presence
end

sig { params(hash: Hash).returns(T.nilable(Hash)) }
def self.recursive_compact_hash(hash)
hash.transform_values do |value|
recursive_compact(value)
end.compact.presence
end
end
end