New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load internal cask json v3 #16896
base: master
Are you sure you want to change the base?
Load internal cask json v3 #16896
Conversation
This is currently hidden behind the `HOMEBREW_INTERNAL_JSON_V3` environment variable. The new format should be equivalent functionally to the old one but it's just smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Again: avoiding the if/else might be nice if possible but fine to merge as-is. Thanks!
Library/Homebrew/test/support/fixtures/internal_tap_json/homebrew-cask/Casks/j/julia.rb
Outdated
Show resolved
Hide resolved
The ones in |
72f84ba
to
e1646a7
Compare
@@ -361,7 +362,6 @@ def to_h | |||
# @private | |||
def to_internal_api_hash | |||
api_hash = { | |||
"token" => token, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be included in the hash because it's already the key of each object. Not sure how I missed this before.
after do | ||
Homebrew::API::Formula.clear_cache | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already clear all caches after each test.
# TODO: Clean this up when we deprecate the current JSON API and move to the internal JSON v3. | ||
ruby_source_sha256 = json_cask.dig(:ruby_source_checksum, :sha256) | ||
ruby_source_sha256 ||= json_cask[:ruby_source_sha256] | ||
@ruby_source_checksum = { "sha256" => ruby_source_sha256 } | ||
@ruby_source_checksum = json_cask[:ruby_source_checksum] || # public JSON v2 | ||
json_cask[:ruby_source_sha256]&.then { { "sha256" => _1 } } # internal JSON v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up removing this TODO because we won't be able to remove this going forward. We need to keep backwards compatibility with all of the JSON cask files that get stored when casks get installed from the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep backwards compatibility with all of the JSON cask files that get stored when casks get installed from the API.
@apainintheneck Oof. That seems undesirable (but clearly not your fault). Could we add some sort of versioning to these files/filenames so this isn't going to have to be the case indefinitely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll have to keep backwards compatibility until all of the currently supported macOS versions reach end of life. It will be a while. I'm not sure how we handle very long term deprecations like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apainintheneck That's not tenable given how recently this was introduced and that we have e.g. already two paths for this (writing formula files, writing JSON files). Instead, we should handle when this JSON file is in an invalid format and act the same as if it were missing entirely (which is something that already works).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that is another option. We don't even need to explicitly handle it being in an invalid format since as you already said we handle that already. I guess in that case it shouldn't be that big of a deal.
brew/Library/Homebrew/cask/installer.rb
Lines 576 to 593 in 7b2bfee
# load the same cask file that was used for installation, if possible | |
def load_installed_caskfile! | |
Migrator.migrate_if_needed(@cask) | |
installed_caskfile = @cask.installed_caskfile | |
if installed_caskfile&.exist? | |
begin | |
@cask = CaskLoader.load(installed_caskfile) | |
return | |
rescue CaskInvalidError | |
# could be caused by trying to load outdated caskfile | |
end | |
end | |
load_cask_from_source_api! if @cask.loaded_from_api? && @cask.caskfile_only? | |
# otherwise we default to the current cask | |
end |
We already fall back to the current cask file whenever the old cask file can't be loaded. I still think backwards compatibility is cheap here though and I don't understand why we would want to break it. Breaking backwards compatibility here could potentially affect all casks installed in the last year from the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we definitely shouldn't do this.
What is "this", sorry?
It means that you'd only be able to load a cask for any of the commands that require that if it's currently available in a tap.
This seems reasonable but I don't care enough to argue about it any more if you feel strongly about this.
The alternative is: write a JSON file into the Caskroom in the same format whether installing from homebrew-cask, other official taps or a 3rd-party tap. Again: the status quo of "install caskfile sometimes and JSON file other times" is not a good one.
Is the compatibility code really that bad?
If it was all neatly sorted into methods or a class/classes named accordingly: no big deal. As-is: it's going to be hard to extract later.
The easiest way to avoid that branching would be to have more conservative changes to the JSON layout
Again: the more conservative the approaches to changing the JSON, the less it seems worthy of having an internal JSON and/or a v3 JSON at all.
To be clear: I think the v3 JSON is very desirable and the internal one is: we just need to do it right. This is an API format that may still be with us in 5-10 years.
I do like the idea of transforming the old JSON files into the expected format but at the same time that just seems like moving the compatibility code somewhere else in the codebase.
If it's in a single location that's easier to delete/remove/deprecate later: that's better than having it inline.
See the OS layer as an example: it makes it much easier than just having if OS.mac?
scattered all over the place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we definitely shouldn't do this.
What is "this", sorry?
My bad. I was talking about the idea of having a reduced info cask JSON that could only be used to uninstall casks.
It means that you'd only be able to load a cask for any of the commands that require that if it's currently available in a tap.
This seems reasonable but I don't care enough to argue about it any more if you feel strongly about this.
The alternative is: write a JSON file into the Caskroom in the same format whether installing from homebrew-cask, other official taps or a 3rd-party tap. Again: the status quo of "install caskfile sometimes and JSON file other times" is not a good one.
The cask JSON doesn't support all of the features found in the cask Ruby file right now so I don't think we'll be able to do that unless we change how the Ruby file works.
Is the compatibility code really that bad?
If it was all neatly sorted into methods or a class/classes named accordingly: no big deal. As-is: it's going to be hard to extract later.
Fair point. 👍
The easiest way to avoid that branching would be to have more conservative changes to the JSON layout
Again: the more conservative the approaches to changing the JSON, the less it seems worthy of having an internal JSON and/or a v3 JSON at all.
To be clear: I think the v3 JSON is very desirable and the internal one is: we just need to do it right. This is an API format that may still be with us in 5-10 years.
It'll still be smaller and more efficient just not as pretty.
I do like the idea of transforming the old JSON files into the expected format but at the same time that just seems like moving the compatibility code somewhere else in the codebase.
If it's in a single location that's easier to delete/remove/deprecate later: that's better than having it inline.
See the OS layer as an example: it makes it much easier than just having
if OS.mac?
scattered all over the place.
That makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I was talking about the idea of having a reduced info cask JSON that could only be used to uninstall casks.
Gotcha. You've convinced me, thanks for taking the time to do so ❤️
The cask JSON doesn't support all of the features found in the cask Ruby file right now so I don't think we'll be able to do that unless we change how the Ruby file works.
What regressions have we caused to users through not writing cask files to the Caskroom? If there are (m)any: have there been (m)any user reports about this (given that most users have been using this for over a year)?
(implication being: few/zero user complaints about this means that this is not actually a meaningful problem for users worth blocking moving everyone to this path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What regressions have we caused to users through not writing cask files to the Caskroom? If there are (m)any: have there been (m)any user reports about this (given that most users have been using this for over a year)?
(implication being: few/zero user complaints about this means that this is not actually a meaningful problem for users worth blocking moving everyone to this path)
This is related to the casks which are Cask::Cask#caskfile_only?
. They either have languages or *flight blocks. It hasn't caused any problems because we currently sidestep the problem completely and just download the original cask file (the Ruby one) whenever we install them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, forgot about that.
We should probably figure out how to add the necessary data to JSON v3.
I know, scope creep, sorry.
cask_options[:tap] = Tap.fetch(json_cask[:tap]) if json_cask[:tap].to_s.include?("/") | ||
cask_options[:tap] = if json_cask.key?(:tap) # public JSON v2 | ||
tap_value = json_cask.fetch(:tap) | ||
Tap.fetch(tap_value) if tap_value.to_s.include?("/") | ||
else # internal JSON v3 | ||
CoreCaskTap.instance | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just me trying to avoid using Homebrew::API.internal_json_v3?
since we'll want to remove that eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to add the tap string to the json_cask
value so that the cask gets saved with more information on install. It might make sense to include renames too and there might be something else I'm missing. This a cask-specific problem since we always store installed formulae with their source files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either of them are really necessary for uninstalls though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to figure out what the bare minimum is that we actually need stored here for e.g. uninstalls and just store that, probably as a new dedicated JSON file that bears no resemblance to public or private APIs and never store the cask file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to work on and have to maintain another format. We should be good as long as tap is included so maybe we should just add it in the be sure about things. I'm going to think a bit more about how we want to handle loading legacy cask JSON that didn't have the tap defined.
My first thought is that if we're loading from a JSON blob or file we should just use the defined tap if it exists.When loading from the API it should probably just default to the core cask tap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to work on and have to maintain another format.
I don't think it makes sense to store a bunch of data here that's not actually used and reuse/tweak an existing format when we already have two different "formats" (cask file vs. JSON on disk).
Note that this has been a pretty long-running issue in Cask due to writing too much here and breaking backwards compatibility in the past causing Casks to not be able to be uninstalled.
What I'm arguing for is essentially:
- reusing the internal JSON format for this is not a good idea and going to cause more issues in future (as it has in the past)
- the existing format contains far too much unnecessary data which has and will continue to cause issues due to the too-large surface area here that's too easy for us to break with deprecation/disables/removals
- having 2-3 paths for what files are stored in the Caskroom is bad when Formulae has only one
- less of a big deal but: having how Formulae and Casks behave so differently in this way is undesirable
Genuinely I apologise that I keep stumbling on things that end up blowing up the scope here, it's just that some of this stuff has been very painful over long periods of time in Homebrew and I'd rather we did it properly if we're going to introduce a new API for this.
Genuinely appreciate all your work here and elsewhere: thank you @apainintheneck ❤️
My first thought is that if we're loading from a JSON blob or file we should just use the defined tap if it exists.When loading from the API it should probably just default to the core cask tap.
This seems reasonable.
I can't see when, with the existing code, how JSON would be used for non-core cask taps though, right? It'll always use the Cask file I'd think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what would the one file type be in the case that we decided to only support one format for installed casks? We could download the original cask file whenever we install a cask and just store that. Then, we wouldn't have to support the installed JSON file at all. Or we could generate a JSON file for every cask install and only store the JSON representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what would the one file type be in the case that we decided to only support one format for installed casks?
A JSON file.
Or we could generate a JSON file for every cask install and only store the JSON representation.
A minimal JSON file that's a subset of internal JSON v3 containing only the information needed to e.g. uninstall a cask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new notes but happy to merge whenever you are and can be addressed in follow-up PRs.
It looks like there is still broad disagreement about which types of cask files should get stored during installation, how to prioritize backwards compatibility with legacy cask file formats and which shape any new formats should have going forward. For that reason, I will close this PR and revert the changes made in previous PRs to generate cask JSON v3. At some future point, or as this discussion progresses it might be worth revisiting this idea but now is not that time. |
Do we care at all about potentially having formula JSON v3 and cask JSON v2 at the same time? They don't technically rely on each other. |
@apainintheneck Please don't close this, I can assign this to myself and finish it off. If you'd rather someone else opened this PR and it includes your commit(s): I can do that and use that as building blocks for my own PR(s). |
Feel free to use this PR as a starting point. I don't currently see any path forward that doesn't require first merging in other changes to the codebase. By that time this PR will likely be more than a month out-of-date. There is also no agreement currently on what we want this to look like which is why it made sense for me to close this. I'd be interested to hear your point of view on what you think it will take to get this merged in. |
Thanks 👍🏻 |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?This is the follow-up to #16798 where we start loading the cask from the new JSON API. It ended up being quite similar in to what I did in #16638 a few weeks ago.