Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roaring_bitmap_portable_deserialize_safe avoids overflows during deserialization but it does not validate the input #324

Open
K2 opened this issue Nov 21, 2021 · 15 comments

Comments

@K2
Copy link

K2 commented Nov 21, 2021

Testcase for attached crash. Build with ASAN & ASAN CRoaring.

crash.zip

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include "roaring/roaring.h"


int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size){
    roaring_statistics_t stats;
    bool answer = true;
    roaring_bitmap_t* bitmap = roaring_bitmap_portable_deserialize_safe(data, size);
    if(bitmap) {
        uint64_t card1 = roaring_bitmap_get_cardinality(bitmap);
        roaring_bitmap_statistics(bitmap, &stats);
        unsigned universe_size = stats.max_value + 1;
        roaring_bitmap_t *inverted = roaring_bitmap_flip(bitmap, 0U, universe_size);
        if(inverted) {
            roaring_bitmap_t *double_inverted = roaring_bitmap_flip(inverted, 0U, universe_size);
            if(double_inverted) {
                answer = (roaring_bitmap_get_cardinality(inverted) + roaring_bitmap_get_cardinality(bitmap) == universe_size);
                if (answer) answer = roaring_bitmap_equals(bitmap, double_inverted);
                if (!answer) {
                    printf("Bad flip\n\nbitmap1:\n");
                    roaring_bitmap_printf_describe(bitmap);  // debug
                    printf("\n\nflipped:\n");
                    roaring_bitmap_printf_describe(inverted);  // debug
                }
                roaring_bitmap_free(double_inverted);
            }
            roaring_bitmap_free(inverted);
        }
        roaring_bitmap_free(bitmap);
    }
    return 0;
}

long filesize(FILE* fp) { 
    fseek(fp, 0L, SEEK_END);
    return ftell(fp);
}

int main(int argc, char **argv)
{
    printf("reading %s\n", argv[argc-1]);
    FILE* fp = fopen(argv[argc-1], "rb");
    long bytes = filesize(fp);
    char* buf = malloc(bytes);
    if(buf == NULL) return 1;

    rewind(fp);

    int cnt = fread(buf, 1, bytes, fp);
    if(bytes != cnt){
        free(buf);
        return 1;
    }
    LLVMFuzzerTestOneInput(buf, cnt);
    free(buf);
    return 0;
}
reading /tmp/crash-36b07662f33633b92010f875551ad7560f3045b5
=================================================================
==128963==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000232 at pc 0x00000055e5b7 bp 0x7ffe73ea6400 sp 0x7ffe73ea63f8
WRITE of size 2 at 0x602000000232 thread T0
    #0 0x55e5b6 in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6)
    #1 0x4d0594 in insert_flipped_container roaring.c
    #2 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
    #3 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
    #4 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
    #5 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #6 0x7f1941d7d07c in __libc_start_main csu/../csu/libc-start.c:409:3
    #7 0x41d8b4 in _start (/home/files/git/cr/build/croaring_fuzzer+0x41d8b4)

0x602000000232 is located 0 bytes to the right of 2-byte region [0x602000000230,0x602000000232)
allocated by thread T0 here:
    #0 0x49a06d in __interceptor_malloc (/home/files/git/cr/build/croaring_fuzzer+0x49a06d)
    #1 0x5298ba in array_container_create_given_capacity (/home/files/git/cr/build/croaring_fuzzer+0x5298ba)
    #2 0x55deff in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55deff)
    #3 0x4d0594 in insert_flipped_container roaring.c
    #4 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
    #5 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
    #6 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
    #7 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6) in array_container_negation_range
Shadow bytes around the buggy address:
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff8000: fa fa 00 03 fa fa 00 00 fa fa 06 fa fa fa 00 00
  0x0c047fff8010: fa fa 04 fa fa fa 00 00 fa fa 04 fa fa fa 00 00
  0x0c047fff8020: fa fa 04 fa fa fa 00 00 fa fa fd fd fa fa fd fa
  0x0c047fff8030: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
=>0x0c047fff8040: fa fa 00 00 fa fa[02]fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==128963==ABORTING
@lemire
Copy link
Member

lemire commented Nov 22, 2021

@K2 Does the data come from a valid bitmap?

The roaring_bitmap_portable_deserialize_safe function does not validate the content, it only ensures that there is no buffer overflow while deserializing. If you are pointing at garbage, you will end up with trouble later.

Validating the input is more expensive. We do not currently do this tasks.

I will retitle the issue.

@lemire
Copy link
Member

lemire commented Nov 22, 2021

The documentation is as follows:

/**
 * read a bitmap from a serialized version in a safe manner (reading up to maxbytes).
 * This is meant to be compatible with
 * the Java and Go versions. See format specification at
 * https://github.com/RoaringBitmap/RoaringFormatSpec
 * In case of failure, a null pointer is returned.
 */
roaring_bitmap_t *roaring_bitmap_portable_deserialize_safe(const char *buf, size_t maxbytes);

It does not validate the input.

@lemire lemire changed the title crash in array_container_negation_range roaring_bitmap_portable_deserialize_safe avoids overflows during deserialization but it does not validate the input Nov 22, 2021
@K2
Copy link
Author

K2 commented Nov 22, 2021

What would make a sensible test case if were to enroll into oss-fuz then?

@lemire
Copy link
Member

lemire commented Nov 22, 2021

@K2 So the expectation currently is that you are pointing a data you serialized. You certainly can create adversarial content, as your code demonstrate.

To validate, we need to scan the content of the container, and check that they could have been produced from valid roaring bitmaps:

  • The array containers should be sorted.
  • The run containers should contain sorted runs that correctly span no more than the 16-bit range.

There might be other constraints that I do not have in mind right now.

We do not do this. It certainly doable, but it is not something we offer right now.

@lemire
Copy link
Member

lemire commented Nov 22, 2021

What would make a sensible test case if were to enroll into oss-fuz then?

It is not that it is not valid, it is that we do not support this kind of validation currently. (To be clear, we could do it and I plan to leave this issue open.)

The way the Java and Go versions do fuzzing... is... let me offer pointers...

@lemire
Copy link
Member

lemire commented Nov 22, 2021

(Your fuzzing is good, let me be clear.)

@K2
Copy link
Author

K2 commented Nov 22, 2021

Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?

@lemire
Copy link
Member

lemire commented Nov 22, 2021

At a glance, the Go deserialization fuzzer checks that there is no panic while deserialization...

https://github.com/RoaringBitmap/roaring/blob/d626fcaa277d89a8f843afeb9ec85fd959d8e6dd/serializationfuzz.go

So basically, you throw anything at roaring_bitmap_portable_deserialize_safe and it should work (not crash). However, the result might be invalid and useless.

However, the bulk of the fuzzer is based on random operations on both a bitset and a roaring bitmap, and testing that the results agree:

https://github.com/RoaringBitmap/roaring/blob/258dc1c0bd982b05d94dc23563bf55e46a77763e/smat.go

@lemire
Copy link
Member

lemire commented Nov 22, 2021

@richardstartin Did something similar in Java...
https://github.com/RoaringBitmap/RoaringBitmap/pull/395/files

That is, you take a bitset (or another set data structure) and a roaring bitmap, and then you mutate both in a semantically equivalent manner, then you check that the results are semantically equivalent... and so forth.

@lemire
Copy link
Member

lemire commented Nov 22, 2021

Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?

Right. So just testing bitmap_portable_deserialize_safe with adversarial inputs is excellent.

@lemire
Copy link
Member

lemire commented Nov 22, 2021

Consider that we have no fuzzing right now. Whatever you provide is a likely a win.

@lemire
Copy link
Member

lemire commented Nov 22, 2021

(It would be nice to have a fully validating deserializer. Arguably, it would be easier to build with a good test framework.)

@briangreenery
Copy link

Does a fully validating deserializer exist today?

@lemire
Copy link
Member

lemire commented May 31, 2023

It does not! Pull request invited!

(It is not super hard.)

@SalvatorePreviti
Copy link
Contributor

SalvatorePreviti commented Jun 2, 2023

Related to this, just raised this #486 to provide also roaring_bitmap_deserialize_safe for the CRoaring format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants