Packaging Data for Busytex #2

Open
Thecoolpeople opened this issue Nov 6, 2023 · 7 comments

Comments

@Thecoolpeople

Wouldn't it be better (for performance and network traffic) if busytex did not download the full texlive-basic (uncompressed, by the way) and instead downloaded only the packages that are needed?

I took a look into the archive folder, and the best way would be to create .7z files (there are very good 7zip readers in JS; another compression format such as tar.xz would also work), e.g. a graphics.7z, and so on.

The base latex package (latex.r******.tar.xz and latex.source.r******.tar.xz) is 888 kB in total. (Both archives contain a tlpkg folder, which can be ignored.) All packages are built like that. With 7z it would be 763 kB (I attached this file: latex.txt. Change the file extension to .7z and open it; everything is in there).
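The repacking described above can be sketched with the Python standard library alone. This is a minimal sketch, not the packer from this thread: the function name `repack_without_tlpkg` is hypothetical, and it assumes the only thing to drop is the top-level tlpkg folder mentioned above. It writes tar.xz; a .7z writer would need a third-party library such as py7zr.

```python
import tarfile


def repack_without_tlpkg(src_archives, dst_path):
    """Merge several .tar.xz archives into one .tar.xz, skipping tlpkg/ entries.

    Returns the list of member names that were kept, in archive order.
    """
    kept = []
    with tarfile.open(dst_path, "w:xz") as dst:
        for src_path in src_archives:
            with tarfile.open(src_path, "r:xz") as src:
                for member in src:
                    # Normalise names such as "./tlpkg/..." before checking the top folder.
                    parts = [p for p in member.name.split("/") if p not in ("", ".")]
                    if parts and parts[0] == "tlpkg":
                        continue  # assumption from the thread: tlpkg/ can be ignored
                    # Only regular files carry data; directories and links have none.
                    fileobj = src.extractfile(member) if member.isfile() else None
                    dst.addfile(member, fileobj)
                    kept.append(member.name)
    return kept
```

Called as `repack_without_tlpkg(["latex.tar.xz", "latex.source.tar.xz"], "latex-merged.tar.xz")` (placeholder file names), it would produce one archive per package that a loader can fetch on demand.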

For serving through a CDN, this trick can be used to turn the archives into JS or other static, cacheable files.

So the packageLoader only needs to load the required package files, and the internal archive file system can be used as well, so there is no need to reinvent the wheel.

Another advantage is that no "tlmgr" install would be needed. CTAN packages are bigger (~50 MB for the latex package above, because the documentation PDF files are included).

What do you think about this version of a packager? It would greatly reduce the initial load.

Have a nice evening.
Thecoolpeople

@vadimkantorov
Contributor

vadimkantorov commented Nov 7, 2023

Hi! Thanks for interest in my hobby project :)

Some of my thoughts on data packaging:

  • I do not have much experience with TeX packaging / TDS or with making it lightweight. E.g. so far I have not managed to get all the fonts working even in texlive-basic: https://tex.stackexchange.com/questions/671169/could-not-locate-a-virtual-physical-font-ecrm1095-during-xdvipdfmx-run-texlive . I guess that for some simple, restricted LaTeX compilation such a distro might not be required, so if you know how to prepare a small package (along with fonts.conf and texmf.cnf), that would be awesome, and I'll of course accept PRs with scripts generating such packages. I'd also be happy to chat if you know how to make a better TDS (e.g. I'm already deleting docs from the TDS, but I'm clearly not handling fonts well)
  • Currently, in the wasm setup, busytex relies on emscripten's file_packager.py: it prepares virtual files for a virtual FS in JavaScript, and the contents of the files are compressed with lzma. This is not ideal because the packager has its own problems, see emscripten-core/emscripten#14385 ("[feature request] Modernize file_package.py: chunking; promise-based loading and file creation; caching/decompression -> library-js; JSON-based index")
  • It would be good to have a simple read-only virtual FS implementation purely in C, redirecting open/fopen calls to read from a binary package (e.g. using fmemopen). Then we could use the same FS / data packages for both the wasm and the x64 version of busytex
  • Another direction is setting up remote files (a feature of the virtual FS in emscripten) that are available over HTTP only. This is also possible and I tried it, but I haven't turned it into a working recipe so far. One problem with using the *.7z packages directly is that we might need to pre-index them to create virtual files
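To make the "binary package plus pre-built index" idea from the list above concrete, here is a small Python sketch. It is an illustration under stated assumptions, not busytex code: the blob layout (4-byte little-endian index length, JSON index, concatenated file data) and the names `build_package` and `ReadOnlyPackageFS` are invented for this example. `io.BytesIO` stands in for what `fmemopen` would do in a C implementation.

```python
import io
import json
import struct


def build_package(files):
    """Pack {path: bytes} into one blob: index length, JSON index, then file data."""
    index, payload = {}, bytearray()
    for path in sorted(files):
        data = files[path]
        index[path] = [len(payload), len(data)]  # [offset into payload, size]
        payload += data
    header = json.dumps(index).encode("utf-8")
    return struct.pack("<I", len(header)) + header + bytes(payload)


class ReadOnlyPackageFS:
    """Read-only view over a blob produced by build_package()."""

    def __init__(self, blob):
        self._blob = memoryview(blob)
        (header_len,) = struct.unpack_from("<I", blob, 0)
        self._index = json.loads(bytes(self._blob[4:4 + header_len]).decode("utf-8"))
        self._data_start = 4 + header_len

    def listdir(self):
        return sorted(self._index)

    def open(self, path):
        """Return a binary file object for one packaged file."""
        try:
            offset, size = self._index[path]
        except KeyError:
            raise FileNotFoundError(path) from None
        start = self._data_start + offset
        # BytesIO copies the slice; a C version could point fmemopen at the buffer instead.
        return io.BytesIO(self._blob[start:start + size])
```

Because the index records an offset and size for every file, the same layout would also allow fetching single files with HTTP range requests, provided the data section is stored uncompressed.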

To conclude, if you'd like to contribute to the project, please let me know and we can chat more.

@Thecoolpeople
Author

Thecoolpeople commented Nov 8, 2023

My idea (for the first step) is that a script detects which packages are needed (by analysing the tex files), then pulls the package.7z files from the server (if they are not already present), and then loads all content from inside the archives into the virtual file system.
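A rough sketch of that detection step, in Python: scan the tex source for `\documentclass`, `\usepackage` and `\RequirePackage`, then map the names to archives and skip the ones already fetched. The function names and the `name_to_archive` mapping are hypothetical. Such a mapping is needed because a style file name is not always the archive name (for example, graphicx.sty ships in the graphics package). The regex is deliberately simple and would miss packages loaded indirectly or through macros.

```python
import re

# Matches \documentclass, \usepackage and \RequirePackage with an optional [options] block.
_REQUIRE = re.compile(
    r"\\(documentclass|usepackage|RequirePackage)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)
# A % starts a comment unless it is escaped as \%.
_COMMENT = re.compile(r"(?<!\\)%.*")


def detect_requirements(tex_source):
    """Return the class and package names requested directly in tex_source."""
    text = _COMMENT.sub("", tex_source)
    found = {"classes": set(), "packages": set()}
    for command, names in _REQUIRE.findall(text):
        key = "classes" if command == "documentclass" else "packages"
        found[key].update(n.strip() for n in names.split(",") if n.strip())
    return {key: sorted(values) for key, values in found.items()}


def archives_to_fetch(requirements, name_to_archive, cached):
    """Return (archives still to download, names with no known archive)."""
    needed, unknown = set(), []
    for name in requirements["classes"] + requirements["packages"]:
        archive = name_to_archive.get(name)
        if archive is None:
            unknown.append(name)
        else:
            needed.add(archive)
    return sorted(needed - set(cached)), unknown
```

Names that end up in the `unknown` list are the cases where a fallback (for example, loading a larger base bundle) would have to be decided.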

Of course, if that works, the best thing would be to read the contents of the files directly from the archives, but I think for the first step, this is not needed at the moment.

So, for the first step, no "pre-index" will be necessary. I am currently working on the packer, and then I will make a pull request so you can review it. After that, I think we can both work on the loader and the tex package checking.

I know the first version will only be available for the wasm version.

(My packager can export the LaTeX package as 7z or tar.xz, so we can use whichever is best. For 7z you must have py7zr installed from pip.)

As you can see, there is "no" difference in the file size:
[image: file size comparison]

@vadimkantorov
Contributor

vadimkantorov commented Nov 8, 2023

I think your approach might work if your tex sources don't need any font compilation / generation / other post-processing at the dependent package installation step.

@Thecoolpeople
Author

But what if I use the default installer (while the pipeline builds it), install one package after another, and pack the "installed" files from the directory? After every single package I delete all temp files.
This is exactly the same approach as now, just at the level of a single package (and with smaller output packages).

Then all fonts etc. are compiled and built correctly.
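One way to capture "the files installed by one package" is to snapshot the install directory before the install and pack whatever is new or changed afterwards. The sketch below is only an illustration of that idea: `snapshot` and `pack_changes` are hypothetical helper names, the installer call itself is left out, and change detection relies on file size and modification time, which is an assumption that could miss some edits.

```python
import os
import tarfile


def snapshot(root):
    """Map each file path (relative to root) to its (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat so that broken symlinks do not raise
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def pack_changes(root, before, archive_path):
    """Write files that are new or changed since `before` into a .tar.xz archive.

    archive_path should live outside root so the archive does not pack itself.
    Returns the sorted list of relative paths that were packed.
    """
    after = snapshot(root)
    changed = sorted(p for p, sig in after.items() if before.get(p) != sig)
    with tarfile.open(archive_path, "w:xz") as tar:
        for rel in changed:
            tar.add(os.path.join(root, rel), arcname=rel.replace(os.sep, "/"))
    return changed
```

The intended loop would be: take `before = snapshot(root)`, run the installer for one package, delete the temp files, then call `pack_changes(root, before, "<package>.tar.xz")`. Files that the installer regenerates for every package (shared indexes or font maps, for instance) would show up in each archive, so they may need separate handling.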

@vadimkantorov
Contributor

I think I still don't understand your complete use case :( Please explain your full use case with an example.

@vadimkantorov
Contributor

vadimkantorov commented Nov 15, 2023

Will the fonts for your use case get compiled at LaTeX compilation time? If so, how? tlmgr includes a perl binary, which is needed for running updmap/fmtutil and other scripts, but busytex currently does not include perl.

I don't mind upstreaming a simpler packer script with an example, even for a narrow use case, but the complete use case needs to be explained (with usage examples), along with its limitations.

@vadimkantorov
Contributor

(I've fixed the master branch and cleaned it up)
