Add a language option when updating the cache #335

Open
9mdv opened this issue Jul 16, 2023 · 13 comments · May be fixed by #345

Comments

@9mdv

9mdv commented Jul 16, 2023

Updating the cache downloads pages in all locales, which is not needed. Being able to specify a language would be ideal, like this:

tldr --update en

That way we would only need to fetch 15 MB (for English) instead of all pages, which is over 50 MB, saving disk space as well as bandwidth for those on metered connections. This would also work well as a config option, as discussed in #251.


@dbrgn
Owner

dbrgn commented Jul 20, 2023

We currently download the archive from tldr.sh (https://tldr.sh/assets/tldr.zip) which is generated through GitHub AFAIK. This archive contains all languages and cannot be filtered.

As long as the TLDR pages project doesn't provide separate URLs/archives per language for downloading, I doubt that this can be implemented without a bigger effort. So I'll close this feature request for now as "not planned". If you think this is worth pursuing, you could open an issue at the TLDR project itself and ask them to provide automatically generated download paths per language.

@adamazing
Contributor

adamazing commented Jul 28, 2023

FYI: I have started a discussion upstream about this issue. I've created a preliminary upstream PR that would enable this, but it rebuilds all zips on every push. I'll make a subsequent PR to zip only what's necessary. Not sure if the former will be merged or closed in deference to the latter.

@dbrgn

@kbdharun
Contributor

kbdharun commented Aug 16, 2023

I have merged the PR upstream and the language assets are available at https://github.com/tldr-pages/tldr-pages.github.io/tree/main/assets; so IG this issue can be reopened, I will update the client specification to highlight this addition in tldr-pages/tldr#10148.

@niklasmohrin niklasmohrin reopened this Aug 16, 2023
@adamazing
Contributor

adamazing commented Aug 16, 2023

> I have merged the PR upstream and the language assets are available at https://github.com/tldr-pages/tldr-pages.github.io/tree/main/assets; so IG this issue can be reopened, I will update the client specification to highlight this addition in tldr-pages/tldr#10148.

Thanks @kbdharun!

@dbrgn, I'll focus on the second-phase PR upstream in tldr-pages (to avoid building/rebuilding assets unless necessary) but I'm happy to look at adding the language option in tealdeer once that's done? (But don't want to block anyone else working on it 😅)

@niklasmohrin
Collaborator

niklasmohrin commented Aug 16, 2023

Sounds good. It isn't every day that issues here actually make it into the spec so quickly (awesome work!). I am happy to review and accept a PR for this (and I would be surprised if @dbrgn disagrees).

@niklasmohrin
Collaborator

Some thoughts for the implementation: I think we should use the new archives to replicate the directory structure we currently have, to maintain backwards compatibility. We should then download only the languages that would be used for querying, i.e. those returned by get_languages_from_env, which would already make tldr -u -L de work. (I don't think tldr --update de is feasible, because it already has a meaning, which is "run tldr, update before querying, and then query for de".)
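To make the idea above concrete, here is a minimal sketch of how the requested languages could be mapped to per-language archives and to the existing directory layout. The helper names `pages_dir_name` and `archive_plan`, the `base_url` parameter, and the `tldr-pages.<lang>.zip` file name pattern are assumptions for illustration, not tealdeer's actual code; the `pages` / `pages.<lang>` layout is assumed to match what the combined archive unpacks to.

```rust
/// Directory name for a language inside the cache, mirroring the layout of
/// the combined archive: `pages` for English, `pages.<lang>` otherwise.
/// (Assumed layout, see the note above.)
fn pages_dir_name(lang: &str) -> String {
    if lang == "en" {
        "pages".to_string()
    } else {
        format!("pages.{lang}")
    }
}

/// Build (download URL, target directory) pairs for the requested languages.
/// `base_url` and the `tldr-pages.<lang>.zip` naming are hypothetical.
fn archive_plan(base_url: &str, languages: &[&str]) -> Vec<(String, String)> {
    languages
        .iter()
        .map(|lang| {
            let url = format!(
                "{}/tldr-pages.{}.zip",
                base_url.trim_end_matches('/'),
                lang
            );
            (url, pages_dir_name(lang))
        })
        .collect()
}
```

Each archive would then be extracted into its own target directory, so that page lookup keeps working against the same paths as before.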

One question that remains is how we want to handle "additional languages". For example, a user usually wants to query English pages, but every now and then, they look up a German one. Their config will probably have language = "en" (or nothing at all) and they use --language every time they want to see the German page. With the method outlined above, this wouldn't work, as only the English pages would be in the cache. Ultimately, I think we should add another config variable extra_languages or so that will be taken into account when updating the cache. However, we can come back to this new config option in a separate issue :^)
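As a rough sketch of how such an option could feed into the update step: the languages used for querying and the configured extras are merged into one list, keeping the first occurrence of each. Both the function name `languages_to_update` and the `extra_languages` input are hypothetical here; nothing like this exists in tealdeer yet.

```rust
use std::collections::HashSet;

/// Combine the languages used for querying with the (hypothetical)
/// `extra_languages` config value, dropping duplicates while keeping the
/// order in which languages first appear.
fn languages_to_update(query_languages: &[&str], extra_languages: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut result = Vec::new();
    for list in [query_languages, extra_languages] {
        for lang in list {
            // `insert` returns false if the language was already present.
            if seen.insert(*lang) {
                result.push(lang.to_string());
            }
        }
    }
    result
}
```

With `language = "en"` and `extra_languages = ["de"]`, the update would then fetch both the English and the German archive, and `--language de` would find its pages in the cache.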

@m040601

m040601 commented Sep 1, 2023

Thanks for your efforts polishing this tool. My favorite tldr client.

Another big vote for this feature request about configuring the preferred language.

It is annoying to have all the unneeded languages installed.

I understand this is not tealdeer's "fault". I hope this ends up being implemented. See also #251.

> As long as the TLDR pages project doesn't provide separate URLs/archives per language for downloading, I doubt that this can be implemented without a bigger effort.

I also ended up here after inspecting the config.toml and looking for an answer that was not in the man page or README.

So if I may suggest: in the meantime, until it is implemented, could you add a note about it to the README or man page? One short sentence or paragraph will do.

Thanks in advance.

@nkh

nkh commented Sep 3, 2023

My 2 cents.

Although it would certainly make a difference if translations from English were not downloaded, the problem of having to download everything in English again and again would persist. But maybe the size is so reduced that updates are fast anyway (I mean sub-second).

A git shallow clone permits very fast upgrades, sub-second.

The zip file is 5.5 MB and the full repo is 20 MB, so I guess the shallow repo would be significantly smaller, and compressed during download anyway.

Once the shallow clone is present, the translation directories can be removed.

When it's time to update, the translation directories will be reloaded, so maybe it's better to keep them around; git won't complain about it, it will just fetch them. I'd rather lose a few MB and have an update that takes 20 seconds less. For those who prefer to save space, the translations could be deleted automatically again.

It's a bit counterintuitive to download files that are going to be deleted, but so is downloading the same files again for just a few modifications.

But let's not stop there and go on to patches (thank you, Mr. Wall et al.): it's possible to download only patches from GitHub. I don't think it's possible to simply get a list of commits from GitHub, not even via their CLI tool, but it's cheaper to download a commit list and a few patches than a large zip file.

@niklasmohrin
Collaborator

@m040601 I am not sure I completely understand what exactly we should mention in the README. Can you clarify what you meant? I wouldn't mention that all languages are downloaded, as we are about to change that anyway.

@niklasmohrin
Collaborator

@nkh No, I don't think we should go down this road. There are multiple reasons for downloading the archive instead of using git:

  1. The client specification (https://github.com/tldr-pages/tldr/blob/main/CLIENT-SPECIFICATION.MD) says to do so (this reason alone suffices)
  2. No dependency on git
  3. Very straightforward process and code
  4. Allows using your own server for hosting the pages (not currently implemented)

Also note that shallow clones are usually not intended to be updated, if I recall correctly.

Deleting the translation directories in the git tree does not save space because they are still present in the git folder (note how you can restore deleted files if they have been previously committed).

So, bottom line: I veto the git approach. We stick to the plan of downloading the language-specific archives.

@nkh

nkh commented Sep 3, 2023

I understand not wanting to be dependent on git, even if it's ubiquitous.

Would it be possible to get an option to get tealdeer to recompute its cache without updating the pages via a download?

@niklasmohrin
Collaborator

niklasmohrin commented Sep 3, 2023

@nkh Downloading the archive and replacing the old files is exactly what "updating the cache" is; there is no additional computation after that:

tealdeer/src/cache.rs

Lines 163 to 189 in 9e1489b

/// Update the pages cache from the specified URL.
pub fn update(&self, archive_url: &str) -> Result<()> {
    self.ensure_cache_dir_exists()?;
    // First, download the compressed data
    let bytes: Vec<u8> = Self::download(archive_url)?;
    // Decompress the response body into an `Archive`
    let mut archive = ZipArchive::new(Cursor::new(bytes))
        .context("Could not decompress downloaded ZIP archive")?;
    // Clear cache directory
    // Note: This is not the best solution. Ideally we would download the
    // archive to a temporary directory and then swap the two directories.
    // But renaming a directory doesn't work across filesystems and Rust
    // does not yet offer a recursive directory copying function. So for
    // now, we'll use this approach.
    self.clear()
        .context("Could not clear the cache directory")?;
    // Extract archive into pages dir
    archive
        .extract(&self.pages_dir())
        .context("Could not unpack compressed data")?;
    Ok(())
}

You could in theory have a git repository where tealdeer expects the cache files to be and update that yourself, but we do not make any guarantee that the expected directory structure of the cache will not change with a future release, so the setup might break at any time. If you just don't want the message reminding you to update, you could use the --quiet flag.
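For reference, the "download to a temporary directory and then swap" idea from the code comment in `update` could look roughly like the sketch below. It is not tealdeer's implementation: `replace_pages_dir` and the `.staging` suffix are made up for illustration, and `populate` stands in for the extraction step. It stages the new contents in a sibling directory, on the assumption that a sibling lives on the same filesystem so that `std::fs::rename` works. The swap is still not atomic, since there is a short window between removing the old directory and renaming the new one.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replace `pages_dir` with contents written by `populate`, staging them in
/// a sibling directory first. If `populate` fails, the old directory is left
/// untouched. (Hypothetical helper, see the note above.)
fn replace_pages_dir<F>(pages_dir: &Path, populate: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    // Derive the sibling name, e.g. `pages` -> `pages.staging`.
    let mut name = pages_dir
        .file_name()
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "pages_dir has no final component")
        })?
        .to_os_string();
    name.push(".staging");
    let staging = pages_dir.with_file_name(name);

    // Start from an empty staging directory, removing leftovers of a failed run.
    if staging.exists() {
        fs::remove_dir_all(&staging)?;
    }
    fs::create_dir_all(&staging)?;

    // Write the new contents (in tealdeer this would be the archive extraction).
    populate(&staging)?;

    // Only now remove the old contents and move the new ones into place.
    if pages_dir.exists() {
        fs::remove_dir_all(pages_dir)?;
    }
    fs::rename(&staging, pages_dir)
}
```

The main benefit over clearing first is that a failed download or extraction no longer leaves the user with an empty cache.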

@nkh

nkh commented Sep 4, 2023

@niklasmohrin that's exactly what I want to do. I thought tealdeer generated some kind of cache for faster access after downloading the archive.

Thank you for the details.

7 participants