Cannot match decompression time with 7-zip #93

Open
otrejo-soria opened this issue Jun 30, 2022 · 10 comments

Comments

@otrejo-soria

I have a question. I have been trying to use this API across multiple platforms, and although I can match the compression time of the 7-zip tool by adjusting

compressor.setCompressionLevel(BitCompressionLevel::Fast);
compressor.setCompressionMethod(BitCompressionMethod::Lzma2);
compressor.setDictionarySize(MB32);
compressor.setWordSize(WORDSIZE);
compressor.setThreadsCount(THREADS);
compressor.setSolidMode(true);

I cannot do the same with decompression. I was wondering whether something in the API may be preventing it from matching the time of the 7-Zip installed in C:\Program Files\7-Zip.
I know I am not giving you much to go on, but I looked through the code, and nothing stands out. I was just wondering if you have ever attempted to compare performance between 7-Zip decompression and bit7z decompression.
Any insight will be useful. Thanks

@rikyoz rikyoz self-assigned this Jul 2, 2022
@rikyoz
Owner

rikyoz commented Jul 3, 2022

Hi!

I have a question. I have been trying to use this API across multiple platforms, and although I can match the compression time of the 7-zip tool by adjusting

compressor.setCompressionLevel(BitCompressionLevel::Fast);
compressor.setCompressionMethod(BitCompressionMethod::Lzma2);
compressor.setDictionarySize(MB32);
compressor.setWordSize(WORDSIZE);
compressor.setThreadsCount(THREADS);
compressor.setSolidMode(true);

To be sure, by "7-zip tool", are you referring to the 7z.exe program or the 7-zip GUI?
And, in any case, are these the same settings you chose when using the tool?

I cannot do the same with decompression. I was wondering whether something in the API may be preventing it from matching the time of the 7-Zip installed in C:\Program Files\7-Zip. I know I am not giving you much to go on, but I looked through the code, and nothing stands out.

How much time difference do you measure? There isn't really much configuration possible with the decompression, in contrast to what is possible with compression.

I was just wondering if you have ever attempted to compare performance between 7-Zip decompression and bit7z decompression. Any insight will be useful. Thanks

I did some basic tests and measured only about 100-200ms of additional execution time on average, which I think is due to the inevitable overhead introduced by bit7z's interface. Anyway, I will investigate this further!

@otrejo-soria
Author

Both 7z.exe and the 7-zip GUI, especially on very large files. I am trying to decompress files that are several GB in size.

rikyoz added a commit that referenced this issue Jul 8, 2022
Use a bigger buffer for input file streams. Addresses issue #93
@rikyoz
Owner

rikyoz commented Jul 8, 2022

I did some profiling, and it seems that the poor performance is due to the underlying std::ifstream used by bit7z. This is also a known problem, e.g., https://stackoverflow.com/questions/26095160/why-are-stdfstreams-so-slow.
In short, by default, standard file streams use a too-small buffer: increasing its size boosts bit7z's performance when dealing with big files.
I already pushed a commit (170db0d) in which I set the stream to use a 1MiB buffer.
I also did some benchmarks with MSVC to check whether the change fixed the problem. I used a compressed ~3.4GiB .7z archive file for the tests.
Using the bigger buffer seems to bring the decompression time closer to that of the 7z program, with an average difference of only ~200ms.
In contrast, the non-optimized bit7z was, on average, more than 2.1s slower than the 7z tool.
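
As a rough illustration of the idea (a minimal sketch using only the C++ standard library, not necessarily how the commit implements it), a larger buffer can be handed to a file stream via `pubsetbuf`. The function name `read_with_buffer` and the 64 KiB chunk size are hypothetical, chosen only for this example. Note that the effect of `pubsetbuf` on a file buffer is implementation-defined, and on common implementations it must be called before the file is opened.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file through an std::ifstream that uses
// a caller-sized stream buffer, and returns the number of bytes read.
// Returns 0 if the file cannot be opened.
std::size_t read_with_buffer(const std::string& path, std::size_t buffer_size)
{
    assert(buffer_size > 0);

    // Declared before the stream so that it outlives the stream.
    std::vector<char> stream_buffer(buffer_size);

    std::ifstream in;
    // Install the buffer before opening the file; whether the request is
    // honored is implementation-defined.
    in.rdbuf()->pubsetbuf(stream_buffer.data(),
                          static_cast<std::streamsize>(stream_buffer.size()));
    in.open(path, std::ios::binary);
    if (!in) {
        return 0;
    }

    std::vector<char> chunk(64 * 1024);
    std::size_t total = 0;
    // read() fails on the last, partial chunk, so gcount() is checked as well.
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

Calling `read_with_buffer(path, 1024 * 1024)` would request a 1MiB buffer, the size mentioned above.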

For completeness, here is a boxplot generated from the data I collected during the benchmark; each program's sample size consists of 100 runs. The plot clearly shows the difference in the three cases:

I will further investigate the remaining ~200ms of difference, but I think that now it's a more acceptable overhead. I will also consider whether to provide users with some way to easily customize the buffer size.

Please, let me know if the commit fixes your issue!

@otrejo-soria
Author

Sorry for taking so long to get back to you; we were trying to gather some data because different machines were giving different results. We did find that your fix improved the overall extraction time, but it still doesn't match what 7z.exe and the 7-zip GUI achieve.
The best results show that the bit7z API is around 50% slower than both 7z.exe and the 7-zip GUI. The archives we are trying to decompress are around ~1.1 GB.
On other machines it is much slower than that 50%, but still faster than without your fix. We haven't been able to pinpoint exactly why it is so slow.
I tried a quick profiling session but didn't get much information. What I did notice is that most of the time (which I assume is what causes the degradation) is spent in 7z.dll.
I'll see if I can get more info by digging in this direction, but I wanted to share this info. Thanks for looking into this. :D

@rikyoz
Owner

rikyoz commented Jul 20, 2022

Sorry for taking so long to get back to you; we were trying to gather some data because different machines were giving different results.

No problem! Thank you for the tests!

We did find that your fix improved the overall extraction time, but it still doesn't match what 7z.exe and the 7-zip GUI achieve. The best results show that the bit7z API is around 50% slower than both 7z.exe and the 7-zip GUI. The archives we are trying to decompress are around ~1.1 GB. On other machines it is much slower than that 50%, but still faster than without your fix.

The differences you measured may be due to the cache sizes of the various CPUs, which may differ from the size of the stream buffer and thus penalize performance, at least in some cases (e.g., when the cache is smaller than the buffer). But I'm not an expert on this topic.

We haven't been able to pinpoint exactly why it is so slow. I tried a quick profiling session but didn't get much information. What I did notice is that most of the time (which I assume is what causes the degradation) is spent in 7z.dll.

I actually expect most of the time to be spent in the 7z.dll code! However, 7-zip also uses code provided by bit7z for actually reading from the files, and this is where I think the overhead is.
So I just ran the same benchmark as before, but using the stable bit7z v3, which used the same code as 7-zip for reading files. These are the results:

The results seem closer to the 7-zip ones, with an average difference of only ~100ms (by comparison, v4 overhead was ~200ms on average).
bit7z v4 uses the C++ standard stream, which might be the reason for the performance issues.
If you are running on Windows, you might also try to do your tests with the stable bit7z v3 to check if this is the issue in your use case too.
Please note that in bit7z v3, BitFileExtractor was named BitExtractor, and that paths were all passed as std::wstring.

I'll see if I can get more info by digging in this direction, but I wanted to share this info. Thanks for looking into this. :D

You're welcome! And thank you for the tests and information you shared!

@rikyoz
Owner

rikyoz commented Jul 21, 2022

I did a further test: I made the output file stream also use a bigger buffer.
From my benchmark, this seems to make bit7z's performance even closer to 7-zip's (with a few outliers):

I pushed a commit (d35b092) with the change.
In the end, I'll probably need to reimplement the stream classes without using the C++ standard streams.
Unfortunately, I cannot use the same approach used in the old bit7z v3 since it directly used code from 7-zip, which might create issues for the future re-licensing to the MPLv2 license.

@rikyoz rikyoz added this to To do in bit7z v4.0 Aug 10, 2022
@rikyoz rikyoz moved this from To do to Pending in bit7z v4.0 Dec 29, 2022
@RichardTea

RichardTea commented Feb 28, 2023

I'm also seeing decompression being really, really slow compared to 7z.exe. I'm using the same 7z.dll v22.01 for both.

7z.exe extracts the archive in approx. 1min 53sec, while bit7z 4.0.0RC takes more than 30 minutes for the same 420MB "solid" 7z file.

On Windows there's no noticeable difference between reading the entire 420MB file into memory and passing it as a single std::vector<byte_t>, versus passing the filename and allowing bit7z to open the file.

(This isn't entirely surprising because Windows buffers files automatically if you start reading 'a lot' from them)

I've not found any major clues as of yet, CPU profiling just shows a lot of time being spent inside 7z.dll - as expected, of course.

@rikyoz
Owner

rikyoz commented Feb 28, 2023

Hi!

I'm also seeing decompression being really, really slow compared to 7z.exe. I'm using the same 7z.dll v22.01 for both.
7z.exe extracts the archive in approx. 1min 53sec, while bit7z 4.0.0RC takes more than 30 minutes for the same 420MB "solid" 7z file.

That's strange! I thought I had reduced the gap with 7z.exe, but apparently, it wasn't sufficient.
Thirty minutes is really an unacceptable time; I'm sorry for the issue.
I'll investigate the problem asap.

If possible and it doesn't contain any sensitive data, could you share or send me the archive file that gives you the issue? It would really help me in finding the root cause of the problem.
Or, if you can't, could you provide some further information on the content of the 7z archive (even general info, like the kind of file stored) and the settings used to create it (e.g., word size, dictionary size, etc.)?
Any more information would be really appreciated; thank you in advance!

@RichardTea

Unfortunately I can't share the 7z file as it's a 3rd party sending me their proprietary data ;)
The data is a few tens of thousands of XML files and a few thousand PNG images.

7zFM says:

Size:         1 675 718 995
Packed Size:  430 797 992
Folders:      34 370
Files:        47 481
Headers Size: 664 620
Method:       LZMA2:23
Solid:        +
Blocks:       1

I have since realised that it's not a fair comparison as I'm extracting files from the solid archive in the order they are required by the application, not the order they are within the archive.

Nearly all the time is spent in the BitArchiveReader::extract(*buffer, size, index) call, but sadly the VS2022 performance sampling pretty much stops there as I don't have PDBs for the "official" 7z.dll.

My next step is going to be to build 7z.dll to get PDBs to analyse stacks that go through the 7z.dll.

I won't have time to do that for another couple of weeks.

@rikyoz
Owner

rikyoz commented Mar 5, 2023

Unfortunately I can't share the 7z file as it's a 3rd party sending me their proprietary data ;)

No worries, I asked just in case! :)

The data is a few tens of thousands of XML files and a few thousand PNG images.

7zFM says:

Size:         1 675 718 995
Packed Size:  430 797 992
Folders:      34 370
Files:        47 481
Headers Size: 664 620
Method:       LZMA2:23
Solid:        +
Blocks:       1

I did several benchmarks using a similar solid archive containing XML and PNG images (only 58 files, though I don't think this influences the conclusion); in each benchmark, the program was executed 100 times, extracting the archive to a SATA SSD, and after each run the output files were deleted.
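
For anyone wanting to reproduce this kind of measurement, the procedure can be sketched roughly as follows. This is only an illustrative harness under my own assumptions, not the code used for the benchmarks above: `benchmark`, `BenchmarkResult`, and the two callbacks are hypothetical names, with `run` standing in for the extraction being measured and `cleanup` for deleting the output files (which is not timed).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Summary statistics of the timed runs, in milliseconds.
struct BenchmarkResult {
    double mean_ms;
    double median_ms;
    double min_ms;
    double max_ms;
};

// Hypothetical harness: times `run` for the given number of iterations and
// calls `cleanup` after each one, outside the timed region.
BenchmarkResult benchmark(const std::function<void()>& run,
                          const std::function<void()>& cleanup,
                          int iterations)
{
    assert(iterations > 0);

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
        cleanup();
    }

    std::sort(samples.begin(), samples.end());

    double sum = 0.0;
    for (const double sample : samples) {
        sum += sample;
    }

    const std::size_t mid = samples.size() / 2;
    const double median = (samples.size() % 2 == 0)
                              ? (samples[mid - 1] + samples[mid]) / 2.0
                              : samples[mid];

    return BenchmarkResult{ sum / static_cast<double>(samples.size()),
                            median, samples.front(), samples.back() };
}
```

Reporting the median and the extremes alongside the mean makes outliers visible, which is the same information a boxplot conveys.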

In the benchmarks, when extracting the whole archive using BitFileExtractor::extract, bit7z performed similarly to the 7z program. Bit7z v3 and v4, on average, were slower by only 4% and 6%, respectively, with a few outliers that might be due to external factors.

I think the difference between v3 and v4 is because v4 uses C++'s fstream rather than the same code used by 7-zip as v3 did. I will work on improving the performance of bit7z v4, probably in the next v4.1, but the difference is not that noticeable.

I have since realised that it's not a fair comparison as I'm extracting files from the solid archive in the order they are required by the application, not the order they are within the archive.

I think the way you extract the files is probably the reason for the very low performance.
The files in solid archives are meant to be extracted in the order in which they are stored within the archive.
I did some further benchmarks to check if it was the case:

The first three cases are the same as before, where I extracted the whole archive.
In the fourth case, I extracted all the files in the solid archive in random order: the extraction was ~23x slower on average compared to the whole archive extraction with 7z.
In the fifth case, the worst, I extracted all the files in reversed order compared to their index within the archive, which resulted in a ~40x slowdown.
To further check that the reason for the slowdowns was that the archive was solid, I re-created the same archive but non-solid: extracting it in reversed order resulted in only a ~1.20x slowdown.

So yeah, extracting a solid archive without following the natural order of the archive comes with the price of a significant performance loss, which isn't likely due to bit7z.
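
To give an intuition for why the order matters, here is a deliberately simplified cost model. It is an assumption made for illustration, not a description of 7-zip's actual decoder: a solid block is treated as one continuous stream that can only be decoded forward, so reaching an item that lies before the current position means restarting from the beginning of the block. The names `bytes_decoded` and `in_archive_order` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (an assumption, not 7-zip's real implementation): returns
// how many uncompressed bytes must be decoded to extract the items of one
// solid block in the given order, when the decoder can only move forward.
std::uint64_t bytes_decoded(const std::vector<std::uint64_t>& item_sizes,
                            const std::vector<std::size_t>& extraction_order)
{
    // start[i] is the offset of item i in the uncompressed stream;
    // start[i + 1] is the offset just past its end.
    std::vector<std::uint64_t> start(item_sizes.size() + 1, 0);
    for (std::size_t i = 0; i < item_sizes.size(); ++i) {
        start[i + 1] = start[i] + item_sizes[i];
    }

    std::uint64_t position = 0; // current decoder position in the stream
    std::uint64_t total = 0;
    for (const std::size_t index : extraction_order) {
        assert(index < item_sizes.size());
        if (start[index] < position) {
            position = 0; // item lies behind us: restart from the block start
        }
        total += start[index + 1] - position;
        position = start[index + 1];
    }
    return total;
}

// Reorders the wanted indices so that they follow the archive's own order.
std::vector<std::size_t> in_archive_order(std::vector<std::size_t> wanted)
{
    std::sort(wanted.begin(), wanted.end());
    return wanted;
}
```

Under this model, three items of 10, 20, and 30 bytes cost 60 decoded bytes when extracted in archive order but 100 when extracted in reversed order, and the gap grows quickly with the number of items. If the application needs the files in a different order, one option is to extract them in archive order first (for example, into memory or a temporary directory) and then consume them in whatever order is required.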
