Support one-to-many transforms #3

Open
novemberborn opened this issue May 9, 2016 · 11 comments

@novemberborn
Contributor

Currently transforms are one-to-one. It'd be useful if the transform implementation could return multiple values. For instance with AVA we'll want to cache static analysis of the test files.

There should also be an interface for accessing cached values, given the cache directory and the hash. Again in AVA the transform implementation writes source maps to disk, coordinating their location with the test worker which needs to read them. This could be done through caching-transform itself.

(Source maps should still be written to disk in order to play nice with source-map-support, but the precompiler could do that without having to coordinate with the worker.)

My proposal is to allow the transform() implementation to return an object. Keys can be used to access particular results; values must be strings or buffers. When called for an input that has already been cached, a similar object can be returned.

Currently caching-transform is a factory for a transform function. We could add a get(hash, key?) method. Without the key, the default result (string / buffer / object) is returned. With the key, a specific value is returned. If there is no value for the hash or key, null is returned.
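
To make the proposed shape concrete, here is a minimal in-memory sketch. It is not the current caching-transform API: `createCachingTransform`, `hashOf`, and the use of SHA-256 are assumptions made for illustration, and a real implementation would persist entries to the cache directory rather than to a `Map`.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical factory, for illustration only.
function createCachingTransform(transform) {
  // hash -> string | Buffer | object of keyed values
  const cache = new Map();

  // Assumption: entries are keyed by a SHA-256 hash of the input.
  function hashOf(input) {
    return crypto.createHash('sha256').update(input).digest('hex');
  }

  function cachingTransform(input) {
    const hash = hashOf(input);
    if (!cache.has(hash)) {
      cache.set(hash, transform(input));
    }
    return cache.get(hash);
  }

  // get(hash, key?): without a key, return the whole cached result;
  // with a key, return that value; return null when nothing matches.
  cachingTransform.get = function (hash, key) {
    if (!cache.has(hash)) {
      return null;
    }
    const entry = cache.get(hash);
    if (key === undefined) {
      return entry;
    }
    const isKeyed = typeof entry === 'object' && entry !== null && !Buffer.isBuffer(entry);
    if (!isKeyed || !Object.prototype.hasOwnProperty.call(entry, key)) {
      return null;
    }
    return entry[key];
  };

  cachingTransform.hashOf = hashOf;
  return cachingTransform;
}

// Usage: a one-to-many transform returning code plus a (fake) source map.
const exampleTransform = createCachingTransform(input => ({
  code: input.toUpperCase(),
  map: JSON.stringify({version: 3})
}));
exampleTransform('foo');
const exampleHash = exampleTransform.hashOf('foo');
console.log(exampleTransform.get(exampleHash, 'code')); // prints "FOO"
```

A consumer such as a test worker would only need the hash and the key, not the original input, to read a value back.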

We could write a JSON object to disk, though that'd be tricky with binary data. It also means that getting a particular value would require the entire cache entry to be read into memory. Alternatively we could write a simple packing implementation that allows random access. (Tar would be an obvious solution, but existing implementations seem to be asynchronous, and as far as I can tell tar requires you to seek for individual files.) A final option is to write a "header" and individual results as separate files.
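
As a rough illustration of the "simple packing implementation that allows random access" option, the sketch below writes a length-prefixed JSON index followed by the raw values, and reads a single value back synchronously without loading the rest of the entry. The on-disk layout and the names `pack` and `readEntry` are invented for this example; nothing here is an existing format.

```javascript
'use strict';
const fs = require('fs');

// Layout (an assumption for this sketch):
//   4 bytes  big-endian length of the JSON index
//   N bytes  JSON index: {key: {offset, length}}
//   rest     the values, concatenated in index order
function pack(entries) {
  const index = {};
  const bodies = [];
  let offset = 0;
  for (const key of Object.keys(entries)) {
    const value = entries[key];
    const body = Buffer.isBuffer(value) ? value : Buffer.from(String(value), 'utf8');
    index[key] = {offset, length: body.length};
    offset += body.length;
    bodies.push(body);
  }
  const header = Buffer.from(JSON.stringify(index), 'utf8');
  const prefix = Buffer.alloc(4);
  prefix.writeUInt32BE(header.length, 0);
  return Buffer.concat([prefix, header].concat(bodies));
}

// Reads only the index and the requested value from disk.
// Returns a Buffer, or null when the key is not in the index.
function readEntry(file, key) {
  const fd = fs.openSync(file, 'r');
  try {
    const prefix = Buffer.alloc(4);
    fs.readSync(fd, prefix, 0, 4, 0);
    const headerLength = prefix.readUInt32BE(0);

    const header = Buffer.alloc(headerLength);
    fs.readSync(fd, header, 0, headerLength, 4);
    const index = JSON.parse(header.toString('utf8'));

    if (!Object.prototype.hasOwnProperty.call(index, key)) {
      return null;
    }
    const {offset, length} = index[key];
    const body = Buffer.alloc(length);
    if (length > 0) {
      fs.readSync(fd, body, 0, length, 4 + headerLength + offset);
    }
    return body;
  } finally {
    fs.closeSync(fd);
  }
}
```

Binary values survive unchanged because only the index is JSON. The trade-off is that the packed file is no longer easy to inspect or edit by hand.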

@sindresorhus
Member

> Tar would be an obvious solution, but existing implementations seem to be asynchronous

Maybe asar? It's async, but I guess we could do a PR adding a sync interface.

@novemberborn
Contributor Author

@sindresorhus asar looks good! Bit more advanced than our needs but pretty similar to what I had in mind.

@novemberborn
Contributor Author

It'd be nice though if we could store text data in the archive so you can still hack it a little. But at that point perhaps storing multiple files on disk is a better approach.

@novemberborn
Contributor Author

And of course if we use asar you could use that to unpack, edit, and repack…

@jamestalmage
Member

I doubt asar will provide all the protections that graceful-fs does.

What if we just use file extensions, and enforce keys to be valid extensions:

node_modules/.cache
  key1.js
  key1.map
  key2.js
  key2.map
  key3.js
  key3.map

This is what we do in AVA now, but we could formalize it here.

> It'd be nice though if we could store text data in the archive so you can still hack it a little.

Or just examine it. I actually do that pretty frequently when debugging AVA - just to make sure something hasn't gone haywire in a transform.

@novemberborn
Contributor Author

> I doubt asar will provide all the protections that graceful-fs does.

Protections?

> What if we just use file extensions, and enforce keys to be valid extensions:

I'd like to remove the extension option. Then if you return a string/buffer, we write to ${hash}.default. Other files are written to ${hash}.${key}. To avoid globbing we only support accessing entries by key (and hash).
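
A minimal sketch of that layout, assuming plain `fs` rather than graceful-fs and assuming keys are restricted to simple names so they are safe in a file name. The helpers `entryPath`, `writeResult`, and `get`, and the key pattern, are hypothetical.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: keys are short names, so `${hash}.${key}` cannot escape the cache directory.
const VALID_KEY = /^[A-Za-z0-9_-]+$/;

function entryPath(cacheDir, hash, key) {
  const suffix = key === undefined ? 'default' : key;
  if (!VALID_KEY.test(suffix)) {
    throw new TypeError(`Invalid cache key: ${suffix}`);
  }
  return path.join(cacheDir, `${hash}.${suffix}`);
}

// A string or buffer goes to `${hash}.default`; an object goes to one file per key.
function writeResult(cacheDir, hash, result) {
  if (typeof result === 'string' || Buffer.isBuffer(result)) {
    fs.writeFileSync(entryPath(cacheDir, hash), result);
    return;
  }
  for (const key of Object.keys(result)) {
    fs.writeFileSync(entryPath(cacheDir, hash, key), result[key]);
  }
}

// Access is always by hash and key, so no globbing is needed.
// Returns a Buffer, or null if there is no file for that hash and key.
function get(cacheDir, hash, key) {
  try {
    return fs.readFileSync(entryPath(cacheDir, hash, key));
  } catch (err) {
    if (err.code === 'ENOENT') {
      return null;
    }
    throw err;
  }
}

// Usage with a throwaway directory and a made-up hash.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
writeResult(demoDir, 'abc123', {js: 'var a = 1;', map: '{"version":3}'});
console.log(get(demoDir, 'abc123', 'js').toString('utf8')); // prints "var a = 1;"
```

Each value stays a separate, readable file, which keeps the cache easy to examine while debugging.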

@jamestalmage
Member

> Protections?

graceful-fs avoids write collisions.

@jamestalmage
Member

I like the plan, maybe change extension to defaultExtension? I like that it is easy to examine the file in the cache.

@novemberborn
Contributor Author

> graceful-fs avoids write collisions.

I assumed asar would create a buffer, which we could then write to disk.

> I like the plan, maybe change extension to defaultExtension? I like that it is easy to examine the file in the cache.

I'm not sure why it should be configurable. If we're concerned about operating systems not knowing what to do with a .default extension, we could use .txt.

@jamestalmage
Member

> I assumed asar would create a buffer, which we could then write to disk.

If that is the case, then I don't get the appeal. We want random access to individual components.

@novemberborn
Contributor Author

> We want random access to individual components.

Sure, but you mentioned graceful-fs in the context of write collisions. Though I suppose it'll help with reads as well.

Anyway, asar doesn't currently have a synchronous implementation, so it'll be easier for us to lay out multiple files on disk.
