Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zip::Writer Arithmetic overflow when adding 1 million files #14534

Open
sqonk opened this issue Apr 24, 2024 · 4 comments
Open

Zip::Writer Arithmetic overflow when adding 1 million files #14534

sqonk opened this issue Apr 24, 2024 · 4 comments

Comments

@sqonk
Copy link

sqonk commented Apr 24, 2024

Bug Report

While I was able to find existing issues for "Arithmetic overflow", I could not find any specific to the Zip Writer.

This problem was encountered while deploying a solution for compressing offsite backups for client data on a server.

Sample code to illustrate crash:

require "compress/zip"

dest = "out.zip"
ii = 0
begin
  File.open(dest, "w") do |zipFile|
    Compress::Zip::Writer.open(zipFile) do |zip|
      1000000.times do |i|
        ii = i

        max = 5 * 1024
        totalBytes = Random.rand(100..max)
        zip.add "file#{i}", Random.new.random_bytes(totalBytes)
      end
    end
  end
rescue e
  puts "## Failed adding file to zip on loop #{ii}: #{e.message}"
  puts e.inspect_with_backtrace
end
puts "done"

Result:

Failed adding file to zip on loop 999999: Arithmetic overflow

Arithmetic overflow (OverflowError)
from /opt/crystal/src/compress/zip/writer.cr:226:25 in 'write_end_of_central_directory'
from /opt/crystal/src/compress/zip/writer.cr:191:5 in 'close'
from /opt/crystal/src/compress/zip/writer.cr:56:25 in '__crystal_main'
from /opt/crystal/src/crystal/main.cr:129:5 in 'main_user_code'
from /opt/crystal/src/crystal/main.cr:115:7 in 'main'
from /opt/crystal/src/crystal/main.cr:141:3 in 'main'

Environment:

  • Originally encountered on Ubuntu 22.04 LTS, cpu: AMD64, memory: 3.8GB, compiled as static binary.
  • Example code reproduced on development machine macOS 14.4.1, cpu M1 Max, memory 64GB

Dev / Crystal installation:
Crystal 1.12.1 [4cea101] (2024-04-11)

LLVM: 15.0.7
Default target: aarch64-apple-macosx11.0

@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Apr 24, 2024

Pretty sure this is a limit with ZIP archives themselves, in that they can only have 65,535 entries, which is the max value of a UInt16. Once you add one more than this, you get the overflow.

Ideally solution to this would be #7103. But we should at least throw a more informative error message I'd say.

@sqonk
Copy link
Author

sqonk commented Apr 24, 2024

ok so if I understand correctly what you are saying is the ZIP writer within the Stdlib supports the traditional zip archive format which has a 65k limit on entries and so to go above this you need something like the 3rd party lib you mentioned?

It does solve my immediately problem - so thank you for that.

Interestingly, if I generate 1,000,000 4 byte files on disk and then run the zip command line tool over it, it will sit and think about it for a bit but it will compress the whole thing in the end. Am I confusing the functionality of CLI tool with what the underpinnings of the Zip format or zlib offer?

In any case thank for the feedback.

@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Apr 24, 2024

Yea the API docs for Compress::Zip calls out:

only compression methods 0 (STORED) and 8 (DEFLATED) are supported. Additionally, ZIP64 is not yet supported.

So it's using the original non-ZIP64 version which has that limitation. There is #11396, but not sure of the status of that PR at the moment.

@sqonk
Copy link
Author

sqonk commented Apr 26, 2024

Thanks for that. I tried out the WeTransfer/cr_zip_tricks and it did work on the isolated test project I made for exploring this issue, however it sadly gave inexplicable compile-time errors (some kind of inability to see the built-in Digest library) on the main project I wanted to use it with.

Lacking the time to find out what the root cause was I found another library: https://shardbox.org/shards/zip64/releases/0.2.1/
... which worked first time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants