Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing ByteSlice and ByteSliceMut for Vec<u8> #1045

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shashitnak
Copy link

Adresses #992

Copy link

google-cla bot commented Mar 14, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

src/lib.rs Outdated
unsafe impl ByteSlice for Vec<u8> {}

#[cfg(any(feature = "alloc", test))]
// SAFETY: This uses safe a method from stdlib.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety comment needs to prove that the safety conditions of SplitByteSlice are totally satisfied. Unfortunately, I believe safety conditions are unsatisfiable in this case. It asks:

In particular, given B: SplitByteSlice and b: B, if b.deref() returns a byte slice with address addr and length len, then if split <= len, b.split_at(split) will return (first, second) such that:

  • first's address is addr and its length is split
  • second's address is addr + split and its length is len - split

Your implementation does not satisfy the second condition, because the two Vecs will almost certainly not be adjacent in memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add a bit of extra detail here: Your implementations of ByteSlice and ByteSliceMut are fine so long as you remove the implementation of SplitByteSlice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jswrenn @joshlf Pushed a naive solution where I somehow managed to make it work. Am I at least on the right path? Or is it unacceptable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we give up the restriction of only allowing pair of Self to be returned from SplitByteSlice::split_at, we can use a type similar to VecSlice, from my example, and can split a Vec for free. And then we can also implement SplitByteSlice for VecSlice and then we can split a Vec however many times we want at zero cost. Is there a reason we would not want to remove this restriction?

Copy link
Member

@joshlf joshlf Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is still unsound - how do you prevent the CanDrop variant from being dropped before the DontDrop variant? If that happens, then the DontDrop variant points to freed memory. I could imagine solving this with lifetimes, but you might need to make SplitByteSlice even more complex to be able to carry those lifetimes (or maybe not? I'm genuinely not sure).

In the abstract I think this is an interesting idea, but it will take some work to rejigger the API to support it without adding a lot of complexity that other users (who aren't using Vec) will have to know about.

Is there a reason that your use case doesn't allow either just operating on slices (e.g. borrowing the Vec) or making a clone of the Vec into a RefCell<[u8]> (which we do support)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a new thing I tried that is probably better than the previous implementation. The VecSlice type now looks like the following

pub struct VecSlice {
    slice: VirtualVec,
    ghost: Rc<GhostVec>,
}

struct VirtualVec {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

struct GhostVec(Vec<u8>);

With every split, new VirtualVecs are created but the GhostVec is shared by all the splits and will only drop when the last VecSlice is dropped.

Although, now when I think about it, this is just a more sophisticated way of achieving what can easily be achieved by calling split_at on Vec not as Vec but as slice. This doesn't seem to provide much value I guess.

Regarding your question, I was just curious about if it was possible to split Vec at zero cost.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshlf removed all the unnecessary code. It now only contains implementations of ByteSlice and ByteSliceMut

@shashitnak shashitnak force-pushed the byte-slice-vec branch 10 times, most recently from 645e425 to a5fc43b Compare March 20, 2024 05:10
@@ -5774,6 +5774,11 @@ unsafe impl<'a> SplitByteSlice for RefMut<'a, [u8]> {
}
}

// TODO(#429): Add a "SAFETY" comment and remove this `allow`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add safety comments here and below? Our policy is not to introduce any new undocumented unsafe code while we burn down the existing TODOs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further reflection, I don't believe the safety conditions of ByteSlice are satisfiable:

Safety

Implementations of ByteSlice must promise that their implementations of Deref and DerefMut are "stable". In particular, given B: ByteSlice and b: B, b must always dereference to a byte slice with the same address and length. This is true for both b.deref() and b.deref_mut(). If B: Copy or B: Clone, then the same is also true of copies or clones of b. For example, b.deref_mut() must return a byte slice with the same address and length as b.clone().deref().

For Vec, a growable buffer, this cannot be the case: the length changes as elements are added and removed, and the address changes as elements are added.

@@ -5774,6 +5774,11 @@ unsafe impl<'a> SplitByteSlice for RefMut<'a, [u8]> {
}
}

// TODO(#429): Add a "SAFETY" comment and remove this `allow`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further reflection, I don't believe the safety conditions of ByteSlice are satisfiable:

Safety

Implementations of ByteSlice must promise that their implementations of Deref and DerefMut are "stable". In particular, given B: ByteSlice and b: B, b must always dereference to a byte slice with the same address and length. This is true for both b.deref() and b.deref_mut(). If B: Copy or B: Clone, then the same is also true of copies or clones of b. For example, b.deref_mut() must return a byte slice with the same address and length as b.clone().deref().

For Vec, a growable buffer, this cannot be the case: the length changes as elements are added and removed, and the address changes as elements are added.

#[allow(clippy::undocumented_unsafe_blocks)]
#[cfg(any(feature = "alloc", test))]
unsafe impl ByteSlice for Vec<u8> {}

// TODO(#429): Add a "SAFETY" comment and remove this `allow`.
#[allow(clippy::undocumented_unsafe_blocks)]
unsafe impl<'a> ByteSliceMut for &'a mut [u8] {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshlf, in light of the above comment, I'm not certain that this satisfies our documented safety condition either. It's trivial to define a function that changes both the length and address of a &mut [u8] between calls to deref:

fn change_add_and_length(mut slice: &mut [u8]) {
    slice = &mut slice[1..];
}

I think this indicates that our safety documentation on ByteSlice is missing some critical nuance.

@joshlf
Copy link
Member

joshlf commented May 9, 2024

Okay, now that #1215 and #1218 have landed, this can be rebased and the soundness issues should be addressable. In particular, we can implement ByteSlice and ByteSliceMut for Vec<u8>, but we can't implement any of the other traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants