
Incorrect niche optimization explanation for the linked list in the box section #1820

Open

errge opened this issue Feb 15, 2024 · 10 comments

@errge (Contributor) commented Feb 15, 2024

https://google.github.io/comprehensive-rust/smart-pointers/box.html#:~:text=A%20Box%20cannot%20be%20empty%2C%20so%20the%20pointer%20is%20always%20valid%20and%20non%2Dnull.%20This%20allows%20the%20compiler%20to%20optimize%20the%20memory%20layout

I think this is not correct.

I wrote this debugging code:

#[derive(Debug)]
enum List<T> {
    /// A non-empty list: first element and the rest of the list.
    Element(T, Box<List<T>>),
    /// An empty list.
    Nil,
}

fn main() {
    let list: List<i32> = List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    let list2: List<i32> = List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    let list3: List<i32> = List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    println!("{list:?}");
    unsafe {
        let first: u128 = std::mem::transmute(list);
        println!("{first:#034x}");

        let List::<i32>::Element(_, cdrbox) = list2 else {
            panic!("impossible");
        };
        let second: u128 = std::mem::transmute(*cdrbox);
        println!("{second:#034x}");

        let List::<i32>::Element(_, cdrbox3) = list3 else {
            panic!("impossible");
        };
        let List::<i32>::Element(_, cddrbox) = *cdrbox3 else {
            panic!("impossible");
        };
        let third: u128 = std::mem::transmute(*cddrbox);
        println!("{third:#034x}");
    }
}

What I'm trying to do here is to get the in-memory representation out of Rust at all 3 steps of the linked list, and this is the output I get:

errge:~/rust $ ./target/release/box
Element(1, Element(2, Nil))
0x0000564fac517bc00000000100000000
0x0000564fac517be00000000200000000
0x0000564fac517be00000000200000001
errge:~/rust $ qemu-aarch64 ./target/aarch64-unknown-linux-musl/release/box
Element(1, Element(2, Nil))
0x00000055008068500000000100000000
0x00000055008068700000000200000000
0x00000055008068700000000200000001
errge:~/rust $ ./target/x86_64-unknown-linux-musl/release/box
Element(1, Element(2, Nil))
0x00007fa1c7b688500000000100000000
0x00007fa1c7b688700000000200000000
0x00007fa1c7b688700000000200000001

What we see here (on all 3 architectures) is that the first and second list elements have an enum discriminant of 0x0, while the third one has a discriminant of 0x1.

The discriminants appear at the end of the printout because all 3 architectures are little-endian.
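
To make the byte layout explicit, here is a quick decode of the first value from the first run above, assuming the layout these builds happen to produce (tag in the low bytes, the i32 in bytes 4..8, the Box pointer in bytes 8..16; repr(Rust) does not guarantee this):

fn main() {
    // The u128 printed for `list` in the first run above.
    let first: u128 = 0x0000564fac517bc0_00000001_00000000;
    let tag = (first & 0xff) as u8;                    // low byte: 0 = Element, 1 = Nil
    let value = ((first >> 32) & 0xffff_ffff) as i32;  // 1, the head of the list
    let ptr = (first >> 64) as u64;                    // heap address of the boxed tail
    println!("tag={tag:#x} value={value} ptr={ptr:#x}");
}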

And I also have a theoretical reason why the provided picture for the niche optimization is not possible: linked lists are famous for being a data structure where you can return a reference to the middle, and users can continue to walk the linked list from there, or maybe even share data from there, etc. Imagine how we would implement a find() method for this linked list: it would probably return (a reference to) the first List<T> node that matched the search criteria, and the caller can then even see the items behind the found item. Now, if no result is found, we have to return Nil, which is at the end of the list anyway, so we can just return that Nil. But if the niche optimization really happened the way the slide explains, then that final closing Nil element would not be there.
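
To make that concrete, here is a minimal sketch of such a find() (not from the course; it returns a reference so the caller can keep walking or sharing the list from the match, and it returns the trailing Nil when nothing matches):

#[derive(Debug)]
enum List<T> {
    /// A non-empty list: first element and the rest of the list.
    Element(T, Box<List<T>>),
    /// An empty list.
    Nil,
}

/// Returns the first sublist whose head matches, or the trailing Nil if none does.
fn find<'a, T: PartialEq>(list: &'a List<T>, needle: &T) -> &'a List<T> {
    match list {
        List::Element(value, _) if value == needle => list,
        List::Element(_, rest) => find(rest, needle),
        List::Nil => list,
    }
}

fn main() {
    let list = List::Element(1, Box::new(List::Element(2, Box::new(List::Nil))));
    println!("{:?}", find(&list, &2)); // Element(2, Nil)
    println!("{:?}", find(&list, &3)); // Nil -- so the trailing Nil must be addressable
}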

I'm reporting an issue here instead of opening a pull request because I'm only 95% sure. I'm happy to do the work, but I wanted to hear confirmation from someone more experienced first. @djmitche, do you have an opinion on this too?

@errge (Contributor, Author) commented Feb 16, 2024

Okay, I managed to increase my confidence level to 99% by actually creating a linked list that is compatible with the niche optimization. The trick is that one has to put the Box into the Element variant directly, without any other data next to it, so that the Box is on the same level as Nil and the optimization can trigger.

Data type in the book: Element(T, Box<List<T>>)

My data type: Element(Box<(T, List<T>)>)

In the book's version there is a T next to the Box inside the Element variant, which disallows the optimization.

Full source code for my linked list:

#[derive(Debug)]
enum List<T> {
    Nil,
    Element(Box<(T, List<T>)>),
}

fn main() {
    let list: List<i32> = List::Element(Box::new((42, List::Element(Box::new((84, List::Nil))))));
    let list2: List<i32> = List::Element(Box::new((42, List::Element(Box::new((84, List::Nil))))));
    let list3: List<i32> = List::Element(Box::new((42, List::Element(Box::new((84, List::Nil))))));
    println!("{list:?}");

    unsafe {
        let first: u64 = std::mem::transmute(list);
        println!("{first:#018x}");

        let List::Element(headbox) = list2 else {
            panic!("impossible");
        };
        let val2 = (*headbox).0;
        let second: u64 = std::mem::transmute((*headbox).1);
        println!("{val2} {second:#018x}");

        let List::Element(headbox3) = list3 else {
            panic!("impossible");
        };
        let List::Element(cdrbox) = (*headbox3).1 else {
            panic!("impossible");
        };
        let val3 = (*cdrbox).0;
        let third: u64 = std::mem::transmute((*cdrbox).1);
        println!("{val3} {third:#018x}");
    }
}

Output:

$ target/release/boxheadlist 
Element((42, Element((84, Nil))))
0x00005651017f2bc0
42 0x00005651017f2be0
84 0x0000000000000000

In the last line we can see that the optimization triggers, and Nil is simply represented as a 64-bit all-zero pointer. Also note that in my representation every item of the linked list carries 64 bits of pointer + 32 bits of data, so only 96 bits of payload, while in the book's linked list every item is 128 bits. The only disadvantage of my solution is that the list has a mandatory "head box": there is ZERO data on the stack, so even to get the first element one has to jump to the heap.
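
A quick way to double-check that last claim (a size_of sketch, sizes as seen on a 64-bit target):

use std::mem::size_of;

enum List<T> {
    Nil,
    Element(Box<(T, List<T>)>),
}

fn main() {
    let _one: List<i32> = List::Element(Box::new((1, List::Nil)));
    // Nil reuses the Box's null niche, so the handle is a single pointer-sized word.
    assert_eq!(size_of::<List<i32>>(), size_of::<Box<(i32, List<i32>)>>());
    assert_eq!(size_of::<List<i32>>(), 8);
    // Each heap node still gets rounded up to 16 bytes by alignment padding.
    assert_eq!(size_of::<(i32, List<i32>)>(), 16);
    println!("all assertions passed");
}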

So, we have multiple options regarding the book:

  • option 1: we change nothing; if thousands of Googlers passed the lecture without noticing this, then maybe it's fine as it is,
  • option 2: we delete the section about the optimization, as it doesn't apply,
  • option 3: we change the linked list to one that triggers the optimization (e.g. mine or similar, happy to hear options).

My order of preference:

  • option 3: it's interesting and important to make the niche optimization work; Rust targets low-level programming at the end of the day, so it's educational to make this work;
  • option 2: this option is good if we prefer to keep the data definition beautiful and just mention that unfortunately the niche optimization can't trigger this time (or we don't mention the niche optimization at all);
  • option 1: I really don't prefer this, as then the book would continue to teach something that is not true in reality, but it's an option.

djmitche self-assigned this Feb 16, 2024
@anforowicz (Collaborator)

Good catch.

I think option 1 is undesirable: the current content does seem incorrect. AFAIU Box::new(...non-ZST...) will always return a non-null pointer, which means the current slide is wrong when it suggests that this will be represented as //.

Option 3 SGTM. (We could spend some time discussing whether option 3 is the best long-term way forward here, but it does seem like a desirable improvement over the status quo, so IMHO we should probably land this first and then just consider discussing other options as a follow-up. For example struct List(Elem, Option<Box<List>>) may be another spelling I guess?)

@djmitche (Collaborator)

This is a complex topic! It's currently buried under "more to explore", which is a nice way of saying "maybe don't talk about this in the class". I agree that we should get it right, though.

I suspect a lot of students will be interested in a slightly larger class of topics, which includes this and #1817, #1819: important implementation details of Rust. I just don't think that the String and Box slides are the right place to get into that. There's a bit of time on the last day, so perhaps we could add a segment for these, following the unsafe section?

Something like

  • Implementation Details
    • Niche Optimization
    • Dynamic array layout (including NonNull::dangling)
    • Return Value Optimization
    • ??
    • Exercise

The exercise should probably be something quick, similar to the analysis earlier in this issue: give students some unsafe code to see what's going on in the data structure, and then challenge them to experiment and explain what they see somehow.

djmitche removed their assignment Feb 16, 2024
@QnnOkabayashi

Also worth noting that the original example

enum List<T> {
    Element(T, Box<List<T>>),
    Nil,
}

is technically an example of niche-optimization because the following types are all the same size (according to std::mem::size_of):

List<T>
Option<(T, Box<List<T>>)>
(T, Box<List<T>>)

This means that values of type List<T> take advantage of niche optimization, since they only take up the space of (T, Box<List<T>>).

In general though, I agree that the reasoning and diagram in the book aren't correct. Niche value optimization should probably be introduced as just "Option<Box<T>> (or Option<&T>) is the same size as Box<T> by using the 0 bit pattern to represent None since it's never a valid address."
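
A small program that checks those size claims with std::mem::size_of (T = i32, 64-bit target assumed):

use std::mem::size_of;

#[allow(dead_code)]
enum List<T> {
    Element(T, Box<List<T>>),
    Nil,
}

fn main() {
    // All three spellings come out at the same size: the enum tag (and the Option
    // discriminant) fit into the spare space of (i32, Box<List<i32>>).
    assert_eq!(size_of::<List<i32>>(), size_of::<(i32, Box<List<i32>>)>());
    assert_eq!(size_of::<List<i32>>(), size_of::<Option<(i32, Box<List<i32>>)>>());
    // The classic niche case: None is represented as the null pointer.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
    println!("all the same size");
}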

For example struct List(Elem, Option<Box<List>>) may be another spelling I guess?)

I'm in favor of this approach.

@errge (Contributor, Author) commented Feb 16, 2024

Wow, thank you @djmitche @anforowicz @QnnOkabayashi for the great comments and for taking the time to send me detailed replies to my questions, really appreciated!

My findings today:

  • if we are on 64-bit (aarch64 or amd64): the niche optimization DOES NOT trigger with the current example, because after alignment there is enough space anyway (64 bits for the pointer and 32 bits for the data are a must, which leaves 32 bits of padding for the tag before the next 64-bit alignment boundary)
  • if we change the example from i32 to e.g. i64 or u64, then the niche optimization DOES trigger and merges the Nil case into the Box pointer as expected (see the size_of sketch after this list)
  • in both cases: the niche optimization naturally DOES NOT reorder data or change the structure of the tail of the list.
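
For reference, a small size_of sketch of the first two findings (64-bit target assumed; List32/List64 are just the course's List with i32 vs. i64):

use std::mem::size_of;

#[allow(dead_code)]
enum List32 { Element(i32, Box<List32>), Nil }
#[allow(dead_code)]
enum List64 { Element(i64, Box<List64>), Nil }

fn main() {
    // i32: the payload is 12 bytes padded to 16, so the tag simply lives in the
    // padding and no niche is needed (matching the transmute output above).
    assert_eq!(size_of::<List32>(), 16);
    assert_eq!(size_of::<List32>(), size_of::<(i32, Box<List32>)>());
    // i64: the payload already fills all 16 bytes, yet the enum does not grow
    // to 24 bytes, so Nil must be encoded in the Box pointer's null niche.
    assert_eq!(size_of::<List64>(), 16);
    assert_eq!(size_of::<List64>(), size_of::<(i64, Box<List64>)>());
    println!("sizes check out");
}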

With these findings I kinda changed my opinion, and now I prefer option 2; let me motivate why.

I proposed this new datatype: enum List<T> { Nil, Element(Box<(T, List<T>)>)}. The problem with this one is that it still has exactly 2 pointers; we just moved the unnecessary pointer from the end of the list to the beginning (i.e. every non-empty list starts with an indirection, no matter what). This is easy to understand and draw, but it's actually quite inferior, because now we go to the heap even to get the first item, and taking the first item is the most frequent operation for lists, stacks, etc.

In your comments you proposed this: struct List(Elem, Option<Box<List>>). This is another very interesting option that I also arrived at today as an alternative, and it's good for cache locality: the first item is on the stack. The problem with it is that this is not a List but a NonEmptyList, i.e. it's impossible to return a fully empty list, so if you want to support that, it needs another wrapper, which is kinda yuck if we are going for beauty.
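
For illustration, here is one way that spelling could look, together with the extra wrapper needed to get an empty list back (the names are made up for this sketch, and the 24-byte figure is what I would expect from current rustc on a 64-bit target):

use std::mem::size_of;

// Hypothetical names, only to illustrate the trade-off.
struct NonEmptyList<T>(T, Option<Box<NonEmptyList<T>>>);
type MaybeEmptyList<T> = Option<NonEmptyList<T>>;

fn main() {
    let list: MaybeEmptyList<i64> =
        Some(NonEmptyList(1, Some(Box::new(NonEmptyList(2, None)))));
    // The first element lives on the stack, and the tail link uses the null niche:
    assert_eq!(size_of::<Option<Box<NonEmptyList<i64>>>>(), size_of::<usize>());
    // ...but the outer Option needs its own discriminant again: 8 + 8 + tag = 24.
    assert_eq!(size_of::<MaybeEmptyList<i64>>(), 24);
    assert_eq!(list.map(|l| l.0), Some(1));
}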

At the stage where we are in the book (day 3 only), the best fix in my opinion is:

  • keep the data structure as is (the beautiful, schoolbook List with Cons and Nil),
  • change the example from i32 to i64,
  • explicitly state that with size_of we can easily see that the niche optimization triggered (the data is 64 bits, the pointer is 64 bits, the discriminant needs at least 1 bit, and we still fit everything into 128 bits: magic!),
  • and draw a new picture where we still have 3 building blocks for the linked list, but the pointer comes out of the discriminant (as the Box and the discriminant have been fused together).

If this looks like a plan to you all, I will write up a draft PR by the end of next week (hopefully a lot earlier), and then we will have something to discuss around.

@mgeisler (Collaborator)

Hi all,

Wow, thanks a lot for looking deeply into this!

At the stage where we are in the book (day 3 only), the best fix in my opinion is:

  • keep the data structure as is (the beautiful, schoolbook List with Cons and Nil),
  • change the example from i32 to i64,
  • explicitly state that with size_of we can easily see that the niche optimization triggered (the data is 64 bits, the pointer is 64 bits, the discriminant needs at least 1 bit, and we still fit everything into 128 bits: magic!),
  • and draw a new picture where we still have 3 building blocks for the linked list, but the pointer comes out of the discriminant (as the Box and the discriminant have been fused together).

If this looks like a plan to you all, I will write up a draft PR by the end of next week (hopefully a lot earlier), and then we will have something to discuss around.

My goals with the slide were:

  • mention the concept of "niche optimization", preferably with a link to a deeper explanation. From a quick search, I didn't actually find a good reference — however, the page in the course shows up prominently. So we'll be helping everybody out by fixing the slide.
  • use a simple example. I guess I could have used Option<Box<T>>, but a linked list seemed more interesting (if it works!)

I think your plan above sounds great: it keeps the non-trivial example but fixes it. It would be great to add a test for this: a small Rust program which makes a few assertions with mem::size_of, as @QnnOkabayashi mentions. We can include the code on the slide to make sure it doesn't go out of sync.

As for the diagram, it's of course very "symbolic". I've mostly drawn them by hand, but https://asciiflow.com/ can be very helpful to get a quick prototype. There is also https://ivanceras.github.io/svgbob-editor/ to visualize things in the browser.

I hope that helps!

Something like

  • Implementation Details

    • Niche Optimization
    • Dynamic array layout (including NonNull::dangling)
    • Return Value Optimization
    • ??
    • Exercise

@djmitche, I love that you look at the big picture here! I guess putting together such a segment is something we could do after fixing the slide here? Assuming there is material and interest for it, of course.

@djmitche (Collaborator)

Yeah, I think a dedicated segment is a good follow-up.

Also, we've taken a "circular" approach throughout the course, where concepts come up repeatedly. The first time, they're mentioned without much detail, and then explored in more detail later. So once the new segment exists, perhaps we could move the linked-list example there and just leave a speaker note here mentioning the niche optimization using Option<Box<T>> and size_of.

@QnnOkabayashi

Thinking on this more, I think we should keep the recursive data type example and the niche optimization example independent, but possibly still on the same page. The whole point of the linked list example is to show how Box<T> lets us write recursive data types. Why bother trying to also make it demonstrate niche-optimization when we could just use a more intuitive example, e.g. Option<Box<T>>?

@djmitche (Collaborator)

I think that makes a lot of sense! It also gives an opportunity to introduce the niche optimization earlier, in the Option slide (in fact, it's already mentioned there -- maybe another sentence or two there, or a link, would be useful!)

So maybe the right approach right now is this:

  1. Add a little more detail to the Option slide.
  2. Update the Smart-Pointers page to be accurate, and use i64 so that it actually uses the niche optimization.
  3. Add a new slide on the niche optimization, as a step along the way to an "Implementation Details" segment.

If that last bit is in a PR on its own, then we can hold onto it for a bit until the other slides in that segment are also written.

@djmitche (Collaborator)

@QnnOkabayashi would you be able/willing to do any or all of those steps?
