Rethink precaching's activate behavior #1793

jeffposnick · 2018-12-14T18:50:53Z

Library Affected:
workbox-precaching

Background

If service worker cache update fails halfway through, app is bricked #1316 is the original issue that led to Workbox v3's current behavior: during install, Responses corresponding to the set of new assets listed in the precache manifest are written to a "temporary" cache, and during activate, they're copied over to the "main" cache. workbox-precaching will never serve Resposnes directly from the temporary cache. Unless the activate handler completes successfully, assets will end up "stuck" in the temporary cache indefinitely.
Clarification on what happens during a terminated activation w3c/ServiceWorker#1372 is a discussion about how various browser's deal with errors that prevent activate handlers from completing. As per the service worker spec, an activate handler is effectively not allowed to fail, but bugs and edge cases can lead to them failing in practice. The general takeaway is that Edge and Chrome do not make any effort to retry the activate handler when an error leads to failure (with some provisions in place to delay activation when it's likely to fail, e.g. when the browser is shutting down).
Preserve the temporary cache #1791 is a PR to deal with a particular issue observed (presumably due to a browser bug) in Firefox, where the activate handler doesn't end up completing, leading to Firefox re-triggering the install event. Firefox's current behavior goes against the service worker specification, but at the same time, we've now seen it in the wild, so a fix seemed prudent.

Proposed changes

workbox-precaching's current reliance on the activate event running to completion can lead to cache inconsistencies in any browser that follows the service worker spec if the activate event terminates prior to completion. This is apparently easy-ish(?) to trigger in Edge if you shut down the browser while a service worker activation is happening.

As far as I can tell, the best way to deal with this is to ensure that the only things that happen during the activate event is purging unneeded cache entries—the sort of thing that we want to do, but which won't lead to cache inconsistencies if it didn't complete.

I can think of a couple of ways to move to that model:

sw-precache's approach

The earlier sw-precache library handled this scenario by using a single cache whose entries including a synthetic _sw-precache=<revision> query parameter appended to the end of the URL. (URLs that were known to already include revision information did not have this query parameter appended.) These cache entries could safely be written to the single cache during the install event without any risk of overwriting entries still in use by the previously activated service worker, since updated entries would have updated URLs.

The activate handler, in that approach, just needs to do cleanup.

One tangential benefit of this approach is that it's not necessary to use IndexedDB to maintain out-of-band metadata keeping track of the active mapping of URLs to revision info, which we have to do in Workbox. Adopting this approach would mean eliminating the dependencies on the IndexedDB runtime code from the workbox-precaching bundle. The mapping of "real" URL to the correct URL to use as a cache key can be maintained via a variable in the service worker's scope.

The primary downside of this approach is that developers who expect their precached resources to use specific URLs as their cache keys will end up confused; the cached entry for /index.html, for instance, might have a cache key of /index.html?_workbox=abcd1234. Attempts to read the cache via, e.g., caches.match('/index.html') will fail.

To work around this, developers would have to do one of two things, each with drawbacks:

Rely on a function we'd expose in workbox-precaching that would map a "real" URL into a cache key (c.f. this in sw-precache). It would be hard to get this info from inside the window context, since it's relies on data that's only available inside the service worker's scope.
Use caches.match('/index.html', {ignoreSearch: true}), with the risk that the incorrect entry is returned if multiple URLs match the same path and only vary based on query parameters.

Double-cache everything

A less radical departure from workbox-precaching's current behavior would be to create a new logical cache each time the install handler is run, presumably using some information from inside the precache manifest to include a unique versioned string in the name. The new cache would contain a complete copy of every resource listed in the current precache manifest, even if that resource has already been previously precached.

In this scenario, the activate handler would just call caches.delete() on any out-of-date logical caches.

This approach would not lead to extra data being transferred over the network, since we can use cache.match() + cache.put() to make copies of existing entries locally.

The major downside of this approach is that we'd risk triggering quota errors by maintaining two full copies of all precached resources during the interval between install and cleanup in activate. It's not clear how we could automatically recover from quota errors encountered during the install handler, and it's easy to envision failure scenarios in which install is re-attempted repeatedly, failing each time due to quota errors, leaving users stuck with a previous set of assets indefinitely.

(This is actually an existing failure scenario that applies today, and our current approach to avoiding it is just to try not to duplicate anything when we can help it; see #1312 for better ideas.)

A tangential downside is that we would have to continue using IndexedDB inside of workbox-precaching, leading to a larger runtime bundle and greater code complexity than we'd otherwise have.

Other approaches?

I'm not aware of any, but I'm interested in hearing other ideas.

The text was updated successfully, but these errors were encountered:

philipwalton · 2018-12-14T20:24:15Z

I'd be OK with adding metadata to the URL as long as users have a way to get access to the modified URL in the event they want to write their own caching logic with resources in the precache cache.

I also think we should support a mode that allows users to opt out of this behavior in the event they're already fingerprinting their assets. In fact, since finger printing assets is a best practice, I'd argue we should optimize the precaching module for users who are doing that, while still supporting users who aren't (with your suggestion of the sw-precache approach).

In fact, part of my hesitation around using the precaching module in the past is feeling like it does a bunch of stuff I don't need it to do (e.g. the temp cache), so if we could remove the temp cache and the URL modification for users who are following our caching best practices, that would be my preference.

jeffposnick · 2018-12-14T20:40:07Z

(URLs that were known to already include revision information did not have this query parameter appended.)

This was done in sw-precache by checking dontCacheBustUrlsMatching at runtime, but it's even more straightforward with Workbox's precache manifest format, since there's either a revision field or there isn't one. When there isn't a revision field, we'd use the URL as-is as the cache key.

Of course, precaching HTML, if possible, is a good practice, and URLs for HTML documents won't contain in-band versioning hashes. So most developers would end up with a synthetic query parameter appended to the real URL as the cache key for HTML resources.

philipwalton · 2018-12-14T20:46:42Z

When there isn't a revision field, we'd use the URL as-is as the cache key.

👍

Of course, precaching HTML, if possible, is a good practice, and URLs for HTML documents won't contain in-band versioning hashes.

Ahh, right (not doing this myself I sometimes forget). So in that case my not-yet-formed ideas of how we could separate the logic entirely (so those using fingerprinting don't need to include it) is probably not worth exploring.

jeffposnick · 2019-01-02T21:14:30Z

(This should also help with #1218, as removing IndexedDB code from workbox-precaching appears to shave around 1.7kb from the size of workbox-precaching.prod.js)

rodrigoalmeidaee · 2019-02-07T19:06:08Z

Hi @jeffposnick,

We ran into #1316 ourselves because we ship our app mainly to schools, in which network connectivity isn't always ideal, routers get overloaded, etc. So, to our surprise, although none of the internal tests revealed the issue, every time we pushed a new update we would get a couple reports of users sending screenshots of a bricked app (the CSS was all wrong, result of using an old javascript app with a new css file, or vice versa)

From my understanding, here is the chain of events that follow. For simplification, I'll assume that our app uses only two files: main.js and main.css.

If a first installation attempt fails half way through (let's say main.js is successfully downloaded and main.css isn't), the IndexedDB entries (precacheDetailsModel) will be updated to reflect that (so main.js's revision will be updated in the db) and main.js will be added to the temp AppCache. Main.css won't trigger any changes in either AppCache or the indexedDB entries.

When the app is reloaded, everything still works, as all requests are served from the AppCache which was kept intact so far. However, this will automatically trigger a second installation attempt in which:

The temp cache is erased. So the previously downloaded main.js is lost
Only main.css will be downloaded, as the IndexedDB will say that (supposedly) main.js is already at the target revision
When the activation completes, the temp AppCache will contain the new main.css but not the new main.js. This results in the app serving the new CSS with the old javascript.

Given the number of reports we would get of users sending screenshots of bricked CSS apps and the time sensitivity of the issue, we simply forked workbox and applied this fix, which is to basically only update indexedDB if all network requests succeed:

geekie@caffbf8

(I think it is one of the possibilities you mentioned above or in #1316). It seems like a proper fix assuming nothing else could go wrong after indexedDB is updated. I suspect this may not be the case: If that's true then this fix makes workbox work in more scenarios, but doesn't provide a 100% guarantee that it will always properly install updates.

Given that we want ultimately use a non-forked version of workbox, we'd be happy to assist with developing PRs that help address this issue in a way that makes sense from your point of view. One possibility I thought of was to write entries to the IndexedDB (precacheDetailsModel), during the install phase, with an additional boolean flag that says they were not yet "activated" (a successful activation would then reset this flag). Any further installation attempt would then ignore any entries that have this flag set. The risk here is that if the activation phase doesn't correctly reset these flags, further updates will download a lot more than needed.

Let me know if there is any way we can help!

Thanks!
Rodrigo.

jeffposnick · 2019-02-19T21:27:46Z

We've moved away from the IndexedDB model and -temp cache in Workbox v4, and gone with what's described as "sw-precache's approach".

This was referenced Dec 19, 2018

Clarification on what happens during a terminated activation w3c/ServiceWorker#1372

Open

Preserve the temporary cache #1791

Closed

Precached content filesizes are duplicated on page reload #1799

Closed

jeffposnick mentioned this issue Jan 8, 2019

workbox-precaching rewrite to avoid dependency on activate handler #1820

Merged

jeffposnick closed this as completed Feb 19, 2019

jeffposnick mentioned this issue Apr 10, 2019

For matching navigation route consider passing {ignoreSearch} #2024

Closed

jeffposnick mentioned this issue May 16, 2019

Unexpected token < error after redeploy (InjectManifest Mode) #2053

Closed

jeffposnick mentioned this issue Mar 5, 2021

Update service worker to perform caching activities in install event flutter/flutter#66067

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rethink precaching's activate behavior #1793

Rethink precaching's activate behavior #1793

jeffposnick commented Dec 14, 2018

philipwalton commented Dec 14, 2018

jeffposnick commented Dec 14, 2018

philipwalton commented Dec 14, 2018

jeffposnick commented Jan 2, 2019

rodrigoalmeidaee commented Feb 7, 2019

jeffposnick commented Feb 19, 2019

Rethink precaching's activate behavior #1793

Rethink precaching's activate behavior #1793

Comments

jeffposnick commented Dec 14, 2018

Background

Proposed changes

sw-precache's approach

Double-cache everything

Other approaches?

philipwalton commented Dec 14, 2018

jeffposnick commented Dec 14, 2018

philipwalton commented Dec 14, 2018

jeffposnick commented Jan 2, 2019

rodrigoalmeidaee commented Feb 7, 2019

jeffposnick commented Feb 19, 2019