Linking against Intel ISAL for faster deflate operations #1780

Open
pettyalex opened this issue May 6, 2024 · 3 comments
@pettyalex (Contributor) commented May 6, 2024

Hello,

I see that bgzf compression can optionally link against libdeflate for faster performance, but I was wondering if you've ever evaluated or considered linking against Intel's ISA-L? It is BSD-licensed and offers the highest-performance deflate implementation I'm aware of, especially at the lower compression levels.

Perhaps it could be optionally linked, just like libdeflate is right now? Could it be preferred to libdeflate if both are available?

https://github.com/intel/isa-l
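For context on where an alternative backend would plug in: a BGZF block is an ordinary gzip member carrying a "BC" extra field with the block size, wrapped around one raw DEFLATE stream of at most 64 KiB of uncompressed data (per the SAM spec). A minimal sketch in Python, using the stdlib zlib purely as a stand-in for the deflate backend — the `compressobj` call marks the exact spot a libdeflate or ISA-L (igzip) implementation would replace:

```python
import gzip
import struct
import zlib

def bgzf_block(data: bytes, level: int = 1) -> bytes:
    """Pack one payload (<= 64 KiB) into a BGZF block, per the SAM spec.

    Real implementations leave headroom below 64 KiB so the compressed
    block size always fits in BSIZE; this sketch skips that detail.
    """
    assert len(data) <= 65536
    # Raw DEFLATE stream (negative wbits = no zlib wrapper), as BGZF
    # requires. This is the call a libdeflate or ISA-L backend replaces.
    co = zlib.compressobj(level, zlib.DEFLATED, -15)
    payload = co.compress(data) + co.flush()
    # Fixed gzip header: magic, CM=8 (deflate), FLG=4 (FEXTRA), MTIME=0,
    # XFL=0, OS=0xFF (unknown), XLEN=6.
    header = struct.pack("<4BI2BH", 0x1F, 0x8B, 8, 4, 0, 0, 0xFF, 6)
    # "BC" extra subfield: BSIZE = total block length minus 1.
    bsize = 12 + 6 + len(payload) + 8 - 1
    extra = b"BC" + struct.pack("<HH", 2, bsize)
    # Footer: CRC32 and length of the *uncompressed* data.
    footer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + payload + footer

record = b"chr1\t100\t.\tA\tT\n" * 100
block = bgzf_block(record)
# Each block is a self-contained gzip member, so plain gzip can read it.
assert gzip.decompress(block) == record
```

Since each block is an independent gzip member with its own header and trailer, swapping the deflate engine only touches the payload step; the framing is backend-agnostic.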

Benchmarks:
zlib-ng/zlib-ng#1486
powturbo/TurboBench#43

@jkbonfield (Contributor) commented

Looking at those benchmarks, it seems Intel has improved their performance. I did evaluate this in the past and it was simply not a convincing win.

http://www.htslib.org/benchmarks/zlib.html

Admittedly that was Intel's zlib rather than igzip specifically, but you would think it's the same technology in both? Maybe not.

Profiling samtools view -1 -o /tmp/tmp.bam in.bam to see where the CPU time is spent, I see this:

  51.73%  samtools  libdeflate.so.0     [.] deflate_compress_fastest
  13.63%  samtools  libdeflate.so.0     [.] deflate_decompress_default
  10.43%  samtools  libdeflate.so.0     [.] deflate_flush_block
   9.95%  samtools  libdeflate.so.0     [.] crc32_x86_pclmul_avx
   2.36%  samtools  [kernel]            [k] 0xffffffff99800190
   2.06%  samtools  libc-2.27.so        [.] __memmove_sse2_unaligned_erms
   1.62%  samtools  libdeflate.so.0     [.] deflate_make_huffman_code
   1.21%  samtools  samtools            [.] bgzf_read
   1.02%  samtools  samtools            [.] bgzf_write
   0.95%  samtools  samtools            [.] bam_read1

Libdeflate is already fast for decoding, but according to the benchmarks it may be around half the speed of ISA-L for encoding. So with ISA-L, outputting level 1 BAMs may see ~50% higher throughput. We can't tell whether this holds on the small block sizes bgzf uses without trying it out, though.
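The small-block caveat is cheap to probe before doing any linking work: each ~64 KiB BGZF block is its own DEFLATE stream, so per-block setup and flush costs are paid once per block rather than once per file. A rough, hypothetical harness using stdlib zlib (not ISA-L, and a single timing run, so only illustrative of the methodology):

```python
import time
import zlib

def deflate_raw(buf: bytes, level: int = 1) -> bytes:
    """One raw-DEFLATE stream over buf (as each BGZF block is compressed)."""
    co = zlib.compressobj(level, zlib.DEFLATED, -15)
    return co.compress(buf) + co.flush()

# ~1 MiB of semi-repetitive, vaguely sequence-like input.
data = (b"ACGT" * 64 + b"\n") * 4096

# Whole buffer in one stream vs. independent 64 KiB chunks (the bgzf regime).
t0 = time.perf_counter()
whole = deflate_raw(data)
t1 = time.perf_counter()
chunks = [deflate_raw(data[i:i + 65536]) for i in range(0, len(data), 65536)]
t2 = time.perf_counter()

print(f"whole buffer: {t1 - t0:.4f}s -> {len(whole)} bytes")
print(f"64KiB chunks: {t2 - t1:.4f}s -> {sum(map(len, chunks))} bytes")
```

Running the same comparison with each candidate backend (zlib, libdeflate, igzip) on the chunked path would show whether the benchmarked encode-speed advantage survives bgzf's block granularity.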

@pettyalex (Contributor, Author) commented

I bring this up because my group spends a lot of money running various bcftools tasks on cloud platforms, and those jobs spend most of their CPU time compressing output. I'm going to see if I can find some time to do this and bring you actual benchmarks on those small block sizes.

> Although that was Intel's zlib rather than igzip specifically, but you would think it's the same technology in both? Maybe not.

Their newest zlib is part of Intel IPP, not ISA-L. Intel has too many competing technologies in this space, and things are ripe for confusion: https://www.intel.com/content/www/us/en/developer/articles/guide/data-compression-tuning-guide-on-xeon-systems.html

I just saw that the Intel zlib you linked may not even be the same one as the IPP version, but a different thing again. It's kind of a nightmare.

Intel's zlib is also not open source, which can cause some pain.

@pettyalex (Contributor, Author) commented

Oh, it looks like you already tested igzip and the answer was "nope", so this may not be worth the effort.

I may spend the time anyway just to see it with my own eyes, but you have nice documentation about all this.
