How are the various "enum"s encoded? #36
They're listed as strings in the spec, but it would seem highly inefficient to encode them that way. Are they in fact encoded as strings? (If not, you could encode them as LEB128 integers.)

We have several binary encodings, and we're still experimenting and tweaking them to improve compression and parse speed. The one with which we've been measuring parse speed does encode enums as strings, which are themselves encoded as indices into the string table, as LEB32 integers. The tokenizer itself is optimized to perform LEB32 lookups instead of string lookups, so that's still quite fast and reasonably easy to compress. We're also experimenting with encoding them as special interfaces, in a variant of the format that uses predictions on interfaces to improve compression, and this seems to observably decrease the file size. We haven't checked the impact on decompression speed.

@Yoric LEB32? (You mean 32-bit little-endian integers?)

Indeed, that's what I meant.
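For reference, the LEB128 encoding mentioned in the question is a variable-length integer format where each byte carries 7 bits of payload and the high bit flags whether more bytes follow, so small indices (the common case for a string table) fit in a single byte. Here is a minimal sketch of the unsigned variant; this is an illustration of the general technique, not the actual encoder used by this project:

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F  # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_uleb128(data: bytes) -> int:
    """Decode an unsigned LEB128 varint back to an integer."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
    return result
```

An index below 128 costs one byte, and the classic example value 624485 encodes to the three bytes `E5 8E 26`, which is why varints pair well with frequency-sorted string tables.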