RFC: Baking translations into bundle (zero runtime overhead) #1384

Open
thekip opened this issue Jan 30, 2023 · 14 comments

@thekip
Collaborator

thekip commented Jan 30, 2023

This feature has its own pros and cons, but some users would definitely find it very useful.

In short, this feature would let users get rid of message catalogs at runtime completely. Instead, messages would be "baked" into the bundle, bringing zero runtime overhead.

Let's imagine we have an application which is bundled into ./dst

So the structure would look like:

dst
  - main.js
  - chunk1.js
  - chunk2.js

With this feature, the user would have a bundle for each language:

dst
  en/
     - main.js
     - chunk1.js
     - chunk2.js
  pl/
     - main.js
     - chunk1.js
     - chunk2.js

Imagine we have this source code:

alert(t`Hello World!`)

Each bundle would be postprocessed and receive a translation for each message, so

// dst/en
alert("Hello World!")

// dst/pl
alert("Cześć")

Then you can serve your application from the /en or /pl subfolder, or use a different setup depending on your needs.

This approach works very well with code splitting: you never load messages for code that has not been loaded to the client yet.

Imagine that you have a big app with a catalog of 2k+ messages, and you use code splitting so that on the index page your users load only 3% of your application. They still need to load the full catalog, because there is no way to code-split catalogs.

The drawback of this approach is that you need a full page reload to change the language. That is usually acceptable: users typically switch language only once, at the beginning of a session.

Another drawback is a more complicated setup process: you need to set up Babel/SWC plugins, add steps to your build process, etc.

Implementation Details

The implementation would be split into 2 parts:

  1. Babel/SWC macro
  2. CLI postprocessor that bakes catalogs into the bundle.

Flow:

  • When the user enables a special flag, the macro starts working in a different mode. Instead of a regular ID, it produces a special placeholder in place of the ID. The placeholder has a fixed length and starts and ends with special symbols. The purpose of this placeholder is explained below.
  • The user builds the application for production as usual with their bundler of choice, and can safely minify the code.
  • Then the user runs the Lingui CLI postprocessor on the /dst folder where the bundler left its output.
  • The CLI scans the minified/optimized bundles and replaces the placeholders with their actual values.
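
A minimal sketch of what such a fixed-length placeholder could look like. The marker strings and lengths mirror the regex shown later in this post; using a SHA-1 hex digest as the 40-character ID, the "00" flags value, and the makePlaceholder name are assumptions made for illustration, not part of Lingui.

```javascript
// Hypothetical sketch: none of these names come from Lingui itself.
const { createHash } = require("node:crypto");

const ID_LENGTH = 40;   // a SHA-1 hex digest happens to be 40 characters long
const FLAGS_LENGTH = 2; // assumed: two characters reserved for flags

// Build the fixed-length placeholder the macro could emit instead of a message id.
function makePlaceholder(message, flags = "00") {
  if (flags.length !== FLAGS_LENGTH) {
    throw new Error(`flags must be exactly ${FLAGS_LENGTH} characters`);
  }
  const id = createHash("sha1").update(message).digest("hex");
  return `I18N_START_${id}${flags}_I18N_END`;
}

const placeholder = makePlaceholder("Hello World!");
console.log(placeholder.length); // 62 = 11 (start marker) + 42 (payload) + 9 (end marker)
```

Because every placeholder has the same length and distinctive markers, it survives minification as an ordinary string literal and can be located later without parsing the bundle.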

Why not just bake it with macro on the first step?

Because this way, users would have to run their bundler once per language. Bundling is costly, and running the bundler for every language would multiply the total build time by the number of languages.
Unlike bundling, baking strings into the bundle is quite a fast operation. We don't even need to parse the code into an AST; we just need to find fixed-length substrings using a simple regex:

const ID_LENGTH = 40;
const FLAGS_LENGTH = 2;

const PAYLOAD_LENGTH = ID_LENGTH + FLAGS_LENGTH;
const regexp = new RegExp(`I18N_START_.{${PAYLOAD_LENGTH}}_I18N_END`);
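
To show how that regex could drive the postprocessing step, here is a rough sketch. The bake function, the catalog shape (id to translated string), and the choice to leave unknown ids untouched are all assumptions made for illustration; the real CLI may work differently.

```javascript
// Hypothetical sketch of the postprocessing step; names and catalog shape are assumptions.
const ID_LENGTH = 40;
const FLAGS_LENGTH = 2;

// Capture the id and the flags separately; the "g" flag replaces every occurrence.
const PLACEHOLDER = new RegExp(
  `I18N_START_(.{${ID_LENGTH}})(.{${FLAGS_LENGTH}})_I18N_END`,
  "g"
);

// Replace each placeholder in `code` with the message from `catalog` (id -> string).
// Unknown ids are left untouched so a missing translation is easy to spot.
function bake(code, catalog) {
  return code.replace(PLACEHOLDER, (match, id) => {
    const message = catalog[id];
    if (message === undefined) return match;
    // The placeholder sits inside a string literal, so escape the message for one.
    return JSON.stringify(message).slice(1, -1);
  });
}

const id = "a".repeat(ID_LENGTH); // stand-in for a real 40-character id
const bundle = `alert("I18N_START_${id}00_I18N_END")`;

console.log(bake(bundle, { [id]: "Hello World!" })); // alert("Hello World!")
console.log(bake(bundle, { [id]: "Cześć" }));        // alert("Cześć")
```

The same bundle text is baked once per locale, so the bundler runs only once. One caveat of this sketch: the escaping assumes the minifier emitted double-quoted string literals; single-quoted or template literals would need their own escaping.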

Other considerations

  1. To make DX better, the macro should work slightly differently in development mode. It could either bake translations directly for the preferred language, or fall back to runtime mode. This is TBD.
  2. It would be very helpful to have a special flag injected into the code that could be used for conditional code, for example:
    if (i18n.mode === 'runtime') {
      await loadCatalog(lang)
    }
    When the macro is in 'compiled' mode, it replaces the i18n.mode === 'runtime' expression with false, and dead-code elimination then removes the branch from the bundle.
  3. The current bundle's locale and plural rules could be "baked" into the bundle the same way as messages.
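
The flag replacement from point 2 can be illustrated with a before/after sketch. This is only a demonstration on source text: a real macro would rewrite the AST, and the compileModeChecks name is made up for this example.

```javascript
// Illustration only: a real macro would rewrite the AST, not the source text.
const MODE_CHECK = /i18n\.mode\s*===\s*(['"])runtime\1/g;

// Replace every `i18n.mode === 'runtime'` comparison with the literal `false`.
function compileModeChecks(source) {
  return source.replace(MODE_CHECK, "false");
}

const source = [
  "if (i18n.mode === 'runtime') {",
  "  await loadCatalog(lang)",
  "}",
].join("\n");

const compiled = compileModeChecks(source);
console.log(compiled);
// if (false) {
//   await loadCatalog(lang)
// }
```

Once the condition is the constant false, a minifier's dead-code elimination can drop the whole branch, so the catalog-loading code never reaches the compiled bundle.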

Useful links:

@paales

paales commented Feb 3, 2023

I was thinking about a different approach:

We usually don't have more than 6 languages in the projects we build. How about a simple map that gets inlined for each translation? I think this would be a nice stopgap solution. It would be better than loading a complete nl catalog and would still require only a single build.

{ en: "translated string", nl: "vertaalde string" }[i18n.locale]

@thekip
Collaborator Author

thekip commented Feb 3, 2023

How is it different from just loading translations at runtime? You still load the whole catalog, and that's usually the problem. I have code splitting by page and by feature. Most users don't go deeper than the first two pages, but they have to load strings for the whole app.

In other libraries that don't use extractors, this can be solved with "namespaces" and a lot of manual work. With an extractor, the only way to make it code-splitting friendly without shooting yourself in the foot is to bake it into the bundle.

BTW, we used the solution described in this RFC in our production app for years, and it worked really well. We then replaced it with the standardized $localize solution from Angular once that reached a more mature state (it was an Angular project).

I still have this code and am going to implement some of its ideas in LinguiJs (no license issues, I'm the copyright owner of the code).

@paales

paales commented Feb 6, 2023

How is it different from just loading translations at runtime?

If the translations are inlined for each call it would only inline the translations used on that page.

So, for example:

import { Trans } from '@lingui/macro'
<Trans>Hello {name}</Trans>

It currently compiles to:

import { Trans } from '@lingui/react'
<Trans id="Hello {name}" values={{ name }} />

For this to actually work a general catalog needs to be loaded for the locale, right?

However, if we'd compile it to something like:

import { Trans } from '@lingui/react'
<Trans id="Hello {name}" values={{ name }} translations={{
  en: "Hello {name}",
  nl: "Hallo {name}",
  fr: "Bonjour {name}"
}}/>

This would inline the translations for all locales, but it would impact the bundle size much less than loading a complete catalog and leaving 97% of the translations unused. This solution becomes less effective when there are tons of locales, but I think it could be a simple solution when there are only a few.

Your solution of creating a bundle per locale is even better, but that isn't an option for us, as Next.js doesn't support it.

@thekip
Collaborator Author

thekip commented Feb 6, 2023

You're proposing to copy and bake strings for every language into the bundle. This will blow up the bundle size a lot, and it is not a direction we would like to go in.

Your solution of creating a bundle per locale is even better, but that isn't an option for us, as Next.js doesn't support it.

Maybe it's time to move away from this super opinionated, hard-to-do-easy-things framework? (Don't take me seriously, I'm joking.)

I think for Next.js, runtime translations are still the go-to option. The setup would be too complicated to support SSG, Vercel deployments, etc.

@Bertg
Contributor

Bertg commented Feb 6, 2023

You're proposing to copy and bake strings for every language into the bundle. This will blow up the bundle size a lot, and it is not a direction we would like to go in.

From the developer's perspective the bundle will indeed grow, a lot: basically bundle_size * count_of_locales. But from the user's perspective, which is the one that matters, it will be unchanged, possibly even reduced. There is also a performance benefit, although I think that would be marginal in the larger scope of things.

But can't we have our cake and eat it too? The basic implementation could aim to be "0 runtime" with all the language variations built into it. Then an opt-in step (maybe separate package) could work to split these files into individual per-locale bundles?

@JSteunou
Contributor

JSteunou commented Feb 6, 2023

I'm not comfortable with the 0 runtime thing if that means you have to reload the user's page. Sure, it is not every day that you change the app language, but sometimes, for some cases (missing translation for example) it is nice to be able to change on the fly, without losing half the data you were writing down, for example.

An i18n library with 90% of the features I want, but without the ability to serve message chunks and load them at runtime, just to save some bytes and put "0 runtime" in the hero title, would make me uncomfortable.

@Bertg
Contributor

Bertg commented Feb 6, 2023

From our data, which admittedly is not very exhaustive, users almost never change their locale. This is in part because we use the locale of the computer the app is running on.

but sometimes, for some cases (missing translation for example) it is nice to be able to change on the fly

That makes so many assumptions. First, it assumes you don't have complete translations, which a lib like Lingui should greatly help with. Second, it assumes that the application is built in such a way that component state would be maintained when the locale changes. That's a very big assumption... and, if that was really a big issue, wouldn't there be easier alternatives (e.g. holding on to input state with local storage or whatnot)?

With Lingui today we are pretty much forced to serve a single huge translation file, of which, in a lot of cases, only a few strings will be used. When baking translations into the component, this never becomes an issue: you just load the translations you need, when you need them.

With a well set up build, only components with changed translations would ever be "changed", which even increases the chances of a cache hit on the client side, further reducing the load.

My final argument (for now, I'm sorry, I'm quite passionate about the topic): not everyone has a high-speed, high-bandwidth connection. So those few bytes, those few requests do matter.

@paales

paales commented Feb 6, 2023

You're proposing to copy and bake strings for every language into the bundle. This will blow up the bundle size a lot, and it is not a direction we would like to go in.

I don't think so: we would bake in the translations for every language, but only for the translation strings that are actually used. If 3% of the strings are used, you load 3% * 6 languages = 18% of a single catalog's worth of translations. The other advantage is that it doesn't need any global provider to function, which is kinda beautiful (your solution doesn't either) :)
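
To make the arithmetic concrete, here is a rough calculation using the 2k-message catalog and the 3% figure from the RFC above. The variable names are made up, and it assumes all messages are roughly the same size, which real catalogs are not.

```javascript
// Back-of-the-envelope comparison; assumes messages are of similar size.
const totalMessages = 2000; // catalog size mentioned in the RFC
const usedShare = 0.03;     // share of the app loaded on the index page
const locales = 6;

// Inline map: every used message carries a string for each locale.
const inlinedStrings = Math.round(totalMessages * usedShare) * locales; // 60 * 6 = 360

// Runtime catalog: the full catalog for one locale is loaded.
const catalogStrings = totalMessages; // 2000

console.log(inlinedStrings / catalogStrings); // 0.18
```

Under these assumptions the inline map ships 360 strings instead of 2000, while a per-locale baked bundle would ship only the 60 that are used.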

@thekip
Collaborator Author

thekip commented Feb 7, 2023

I'm not comfortable with the 0 runtime thing if that means you have to reload the user's page.

@JSteunou don't worry, this feature would always be optional. You can opt in or opt out at any moment.

@JSteunou
Contributor

JSteunou commented Feb 7, 2023

From our data, which admittedly is not very exhaustive, users almost never change their locale. This is in part because we use the locale of the computer the app is running on.

but sometimes, for some cases (missing translation for example) it is nice to be able to change on the fly

That makes so many assumptions. First, it assumes you don't have complete translations, which a lib like Lingui should greatly help with.

Yup but it happens, our app is the perfect example. Feature delivery goes faster than translation.

Second, it assumes that the application is built in such a way that component state would be maintained when the locale changes. That's a very big assumption... and, if that was really a big issue, wouldn't there be easier alternatives (e.g. holding on to input state with local storage or whatnot)?

This is clearly a big if in my example, I should have picked a better one :D

@thekip
Collaborator Author

thekip commented Feb 7, 2023

Yup but it happens, our app is the perfect example. Feature delivery goes faster than translation.

This is fine. It also sometimes happened to us. BUT this is resolved when compiling catalogs: missing messages fall back to the default language (en). The runtime is not involved in that substitution at all. Depending on what you want, you can compile catalogs with or without strict mode; it will or will not return an error, respectively.
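
A minimal sketch of that compile-time fallback, assuming catalogs are plain id-to-text maps. The compileCatalog function and its strict option are hypothetical stand-ins for illustration, not Lingui's actual compile step.

```javascript
// Hypothetical helper, not Lingui's actual compile step.
// `source` holds default-language messages (id -> text); `translated` holds one locale.
function compileCatalog(source, translated, { strict = false } = {}) {
  const compiled = {};
  for (const [id, fallback] of Object.entries(source)) {
    const message = translated[id];
    if (message) {
      compiled[id] = message;
    } else if (strict) {
      throw new Error(`Missing translation for message "${id}"`);
    } else {
      compiled[id] = fallback; // substituted at build time, no runtime lookup
    }
  }
  return compiled;
}

const en = { greeting: "Hello World!", farewell: "Goodbye" };
const pl = { greeting: "Cześć" };

console.log(compileCatalog(en, pl));
// { greeting: 'Cześć', farewell: 'Goodbye' }
```

With strict set to true the same call throws instead of falling back, which is the behaviour you would want in a build that must not ship untranslated strings.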

@JSteunou
Contributor

JSteunou commented Feb 7, 2023

Yup but it happens, our app is the perfect example. Feature delivery goes faster than translation.

This is fine. It also sometimes happened to us. BUT this is resolved when compiling catalogs: missing messages fall back to the default language (en). The runtime is not involved in that substitution at all. Depending on what you want, you can compile catalogs with or without strict mode; it will or will not return an error, respectively.

Sure, my point was just that sometimes the fallback language or the current translation might not be good enough for the user, who would prefer to switch the app to another language. Again, weird case, I know, it is not for everybody, but it happens.

@petercpwong

This is personally one of my most sought-after features in an i18n library and something that would immensely benefit both the developer and the end user.

I recall someone else also tried to tackle a similar problem with a babel plugin for FormatJS but it never gained much traction:
https://github.com/hjylewis/babel-plugin-inline-i18n-messages

@thekip
Collaborator Author

thekip commented Feb 23, 2023

Guys who are using Next.js or a similar framework, take a look at this proposal: #1458
