Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load internal cask json v3 #16896

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Conversation

apainintheneck
Copy link
Contributor

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This is the follow-up to #16798 where we start loading the cask from the new JSON API. It ended up being quite similar in to what I did in #16638 a few weeks ago.

This is currently hidden behind the `HOMEBREW_INTERNAL_JSON_V3`
environment variable. The new format should be equivalent functionally
to the old one but it's just smaller.
@apainintheneck apainintheneck added cask Homebrew Cask install from api Relates to API installs labels Mar 15, 2024
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Again: avoiding the if/else might be nice if possible but fine to merge as-is. Thanks!

@apainintheneck
Copy link
Contributor Author

Looks good! Again: avoiding the if/else might be nice if possible but fine to merge as-is. Thanks!

The ones in Homebrew::API::Cask and Tap are kind of unavoidable unless we want to change how we organize code a lot. This also mimics how the formula logic already works. The logic in the cask loader can maybe be improved though.

@@ -361,7 +362,6 @@ def to_h
# @private
def to_internal_api_hash
api_hash = {
"token" => token,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be included in the hash because it's already the key of each object. Not sure how I missed this before.

Comment on lines -49 to -51
after do
Homebrew::API::Formula.clear_cache
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already clear all caches after each test.

Comment on lines -300 to +301
# TODO: Clean this up when we deprecate the current JSON API and move to the internal JSON v3.
ruby_source_sha256 = json_cask.dig(:ruby_source_checksum, :sha256)
ruby_source_sha256 ||= json_cask[:ruby_source_sha256]
@ruby_source_checksum = { "sha256" => ruby_source_sha256 }
@ruby_source_checksum = json_cask[:ruby_source_checksum] || # public JSON v2
json_cask[:ruby_source_sha256]&.then { { "sha256" => _1 } } # internal JSON v3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing this TODO because we won't be able to remove this going forward. We need to keep backwards compatibility with all of the JSON cask files that get stored when casks get installed from the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep backwards compatibility with all of the JSON cask files that get stored when casks get installed from the API.

@apainintheneck Oof. That seems undesirable (but clearly not your fault). Could we add some sort of versioning to these files/filenames so this isn't going to have to be the case indefinitely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have to keep backwards compatibility until all of the currently supported macOS versions reach end of life. It will be a while. I'm not sure how we handle very long term deprecations like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apainintheneck That's not tenable given how recently this was introduced and that we have e.g. already two paths for this (writing formula files, writing JSON files). Instead, we should handle when this JSON file is in an invalid format and act the same as if it were missing entirely (which is something that already works).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that is another option. We don't even need to explicitly handle it being in an invalid format since as you already said we handle that already. I guess in that case it shouldn't be that big of a deal.

# load the same cask file that was used for installation, if possible
def load_installed_caskfile!
Migrator.migrate_if_needed(@cask)
installed_caskfile = @cask.installed_caskfile
if installed_caskfile&.exist?
begin
@cask = CaskLoader.load(installed_caskfile)
return
rescue CaskInvalidError
# could be caused by trying to load outdated caskfile
end
end
load_cask_from_source_api! if @cask.loaded_from_api? && @cask.caskfile_only?
# otherwise we default to the current cask
end

We already fall back to the current cask file whenever the old cask file can't be loaded. I still think backwards compatibility is cheap here though and I don't understand why we would want to break it. Breaking backwards compatibility here could potentially affect all casks installed in the last year from the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we definitely shouldn't do this.

What is "this", sorry?

It means that you'd only be able to load a cask for any of the commands that require that if it's currently available in a tap.

This seems reasonable but I don't care enough to argue about it any more if you feel strongly about this.

The alternative is: write a JSON file into the Caskroom in the same format whether installing from homebrew-cask, other official taps or a 3rd-party tap. Again: the status quo of "install caskfile sometimes and JSON file other times" is not a good one.

Is the compatibility code really that bad?

If it was all neatly sorted into methods or a class/classes named accordingly: no big deal. As-is: it's going to be hard to extract later.

The easiest way to avoid that branching would be to have more conservative changes to the JSON layout

Again: the more conservative the approaches to changing the JSON, the less it seems worthy of having an internal JSON and/or a v3 JSON at all.

To be clear: I think the v3 JSON is very desirable and the internal one is: we just need to do it right. This is an API format that may still be with us in 5-10 years.

I do like the idea of transforming the old JSON files into the expected format but at the same time that just seems like moving the compatibility code somewhere else in the codebase.

If it's in a single location that's easier to delete/remove/deprecate later: that's better than having it inline.

See the OS layer as an example: it makes it much easier than just having if OS.mac? scattered all over the place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we definitely shouldn't do this.

What is "this", sorry?

My bad. I was talking about the idea of having a reduced info cask JSON that could only be used to uninstall casks.

It means that you'd only be able to load a cask for any of the commands that require that if it's currently available in a tap.

This seems reasonable but I don't care enough to argue about it any more if you feel strongly about this.

The alternative is: write a JSON file into the Caskroom in the same format whether installing from homebrew-cask, other official taps or a 3rd-party tap. Again: the status quo of "install caskfile sometimes and JSON file other times" is not a good one.

The cask JSON doesn't support all of the features found in the cask Ruby file right now so I don't think we'll be able to do that unless we change how the Ruby file works.

Is the compatibility code really that bad?

If it was all neatly sorted into methods or a class/classes named accordingly: no big deal. As-is: it's going to be hard to extract later.

Fair point. 👍

The easiest way to avoid that branching would be to have more conservative changes to the JSON layout

Again: the more conservative the approaches to changing the JSON, the less it seems worthy of having an internal JSON and/or a v3 JSON at all.

To be clear: I think the v3 JSON is very desirable and the internal one is: we just need to do it right. This is an API format that may still be with us in 5-10 years.

It'll still be smaller and more efficient just not as pretty.

I do like the idea of transforming the old JSON files into the expected format but at the same time that just seems like moving the compatibility code somewhere else in the codebase.

If it's in a single location that's easier to delete/remove/deprecate later: that's better than having it inline.

See the OS layer as an example: it makes it much easier than just having if OS.mac? scattered all over the place.

That makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I was talking about the idea of having a reduced info cask JSON that could only be used to uninstall casks.

Gotcha. You've convinced me, thanks for taking the time to do so ❤️

The cask JSON doesn't support all of the features found in the cask Ruby file right now so I don't think we'll be able to do that unless we change how the Ruby file works.

What regressions have we caused to users through not writing cask files to the Caskroom? If there are (m)any: have there been (m)any user reports about this (given that most users have been using this for over a year)?

(implication being: few/zero user complaints about this means that this is not actually a meaningful problem for users worth blocking moving everyone to this path)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What regressions have we caused to users through not writing cask files to the Caskroom? If there are (m)any: have there been (m)any user reports about this (given that most users have been using this for over a year)?

(implication being: few/zero user complaints about this means that this is not actually a meaningful problem for users worth blocking moving everyone to this path)

This is related to the casks which are Cask::Cask#caskfile_only?. They either have languages or *flight blocks. It hasn't caused any problems because we currently sidestep the problem completely and just download the original cask file (the Ruby one) whenever we install them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, forgot about that.

We should probably figure out how to add the necessary data to JSON v3.

I know, scope creep, sorry.

Comment on lines -302 to +307
cask_options[:tap] = Tap.fetch(json_cask[:tap]) if json_cask[:tap].to_s.include?("/")
cask_options[:tap] = if json_cask.key?(:tap) # public JSON v2
tap_value = json_cask.fetch(:tap)
Tap.fetch(tap_value) if tap_value.to_s.include?("/")
else # internal JSON v3
CoreCaskTap.instance
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just me trying to avoid using Homebrew::API.internal_json_v3? since we'll want to remove that eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to add the tap string to the json_cask value so that the cask gets saved with more information on install. It might make sense to include renames too and there might be something else I'm missing. This a cask-specific problem since we always store installed formulae with their source files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either of them are really necessary for uninstalls though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to figure out what the bare minimum is that we actually need stored here for e.g. uninstalls and just store that, probably as a new dedicated JSON file that bears no resemblance to public or private APIs and never store the cask file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to work on and have to maintain another format. We should be good as long as tap is included so maybe we should just add it in the be sure about things. I'm going to think a bit more about how we want to handle loading legacy cask JSON that didn't have the tap defined.

My first thought is that if we're loading from a JSON blob or file we should just use the defined tap if it exists.When loading from the API it should probably just default to the core cask tap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to work on and have to maintain another format.

I don't think it makes sense to store a bunch of data here that's not actually used and reuse/tweak an existing format when we already have two different "formats" (cask file vs. JSON on disk).

Note that this has been a pretty long-running issue in Cask due to writing too much here and breaking backwards compatibility in the past causing Casks to not be able to be uninstalled.

What I'm arguing for is essentially:

  • reusing the internal JSON format for this is not a good idea and going to cause more issues in future (as it has in the past)
  • the existing format contains far too much unnecessary data which has and will continue to cause issues due to the too-large surface area here that's too easy for us to break with deprecation/disables/removals
  • having 2-3 paths for what files are stored in the Caskroom is bad when Formulae has only one
  • less of a big deal but: having how Formulae and Casks behave so differently in this way is undesirable

Genuinely I apologise that I keep stumbling on things that end up blowing up the scope here, it's just that some of this stuff has been very painful over long periods of time in Homebrew and I'd rather we did it properly if we're going to introduce a new API for this.

Genuinely appreciate all your work here and elsewhere: thank you @apainintheneck ❤️

My first thought is that if we're loading from a JSON blob or file we should just use the defined tap if it exists.When loading from the API it should probably just default to the core cask tap.

This seems reasonable.

I can't see when, with the existing code, how JSON would be used for non-core cask taps though, right? It'll always use the Cask file I'd think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what would the one file type be in the case that we decided to only support one format for installed casks? We could download the original cask file whenever we install a cask and just store that. Then, we wouldn't have to support the installed JSON file at all. Or we could generate a JSON file for every cask install and only store the JSON representation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what would the one file type be in the case that we decided to only support one format for installed casks?

A JSON file.

Or we could generate a JSON file for every cask install and only store the JSON representation.

A minimal JSON file that's a subset of internal JSON v3 containing only the information needed to e.g. uninstall a cask.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new notes but happy to merge whenever you are and can be addressed in follow-up PRs.

@apainintheneck
Copy link
Contributor Author

apainintheneck commented Apr 10, 2024

It looks like there is still broad disagreement about which types of cask files should get stored during installation, how to prioritize backwards compatibility with legacy cask file formats and which shape any new formats should have going forward. For that reason, I will close this PR and revert the changes made in previous PRs to generate cask JSON v3. At some future point, or as this discussion progresses it might be worth revisiting this idea but now is not that time.

@apainintheneck
Copy link
Contributor Author

Do we care at all about potentially having formula JSON v3 and cask JSON v2 at the same time? They don't technically rely on each other.

@MikeMcQuaid MikeMcQuaid reopened this Apr 10, 2024
@MikeMcQuaid
Copy link
Member

@apainintheneck Please don't close this, I can assign this to myself and finish it off. If you'd rather someone else opened this PR and it includes your commit(s): I can do that and use that as building blocks for my own PR(s).

@apainintheneck
Copy link
Contributor Author

Feel free to use this PR as a starting point. I don't currently see any path forward that doesn't require first merging in other changes to the codebase. By that time this PR will likely be more than a month out-of-date. There is also no agreement currently on what we want this to look like which is why it made sense for me to close this. I'd be interested to hear your point of view on what you think it will take to get this merged in.

@MikeMcQuaid MikeMcQuaid added the in progress Maintainers are working on this label Apr 11, 2024
@MikeMcQuaid
Copy link
Member

Feel free to use this PR as a starting point.

Thanks 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cask Homebrew Cask in progress Maintainers are working on this install from api Relates to API installs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants