Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve efficiency of is_empty #354

Open
overlookmotel opened this issue Jan 2, 2024 · 10 comments
Open

Improve efficiency of is_empty #354

overlookmotel opened this issue Jan 2, 2024 · 10 comments

Comments

@overlookmotel
Copy link
Contributor

is_empty checks both inline length and heap length:

pub fn is_empty(&self) -> bool {
let len_heap = ensure_read(self.1);
let last_byte = self.last_byte() as usize;
let mut len = last_byte.wrapping_sub(LastUtf8Char::L0 as u8 as usize);
if last_byte >= LastUtf8Char::Heap as u8 as usize {
len = len_heap;
}
len == 0
}

I believe an empty string is always stored inline. Therefore it could be reduced to:

pub fn is_empty(&self) -> bool {
  self.last_byte() == LastUtf8Char::L0 as u8
}

If my assumption is correct, let me know and I'll make a PR.

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 2, 2024

It's almost correct, except when API CompactString::from_string_buffer is called, it can be used to construct an empty string allocated on heap.

IMO you can fix this by changing from_string_buffer to special case empty string, but I'm not sure if @ParkMyCar or @Kijewski agrees on this since this does make it less useful for its intended use case.

@Kijewski
Copy link
Contributor

Kijewski commented Jan 2, 2024

It would fail for CompactString::with_capacity(), too.

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 2, 2024

It would fail for CompactString::with_capacity(), too.

In that case, maybe we should check for capacity and length, instead of modifying from_string_buffer?

@overlookmotel
Copy link
Contributor Author

Oh dear. Yes, I completely failed to consider that capacity and length are independent, so zero-length strings can be on the heap.

In that case, I don't think there's anything to be done here.

If static string didn't have it's own discriminant, then this would work:

pub fn is_empty(&self) -> bool {
    let heap_len = self.1;
    let last_byte = self.last_byte(); 
    // Empty inline string also contains 0 in the bytes holding heap length
    heap_len == 0 && (last_byte == 192 || last_byte == 224)
}

That produces branch-free code which might be a bit faster. But adding || last_byte == 225 does result in a branch, so it's no good.

Unless anyone else sees any mileage in this, I'll close the issue.

@Kijewski
Copy link
Contributor

Kijewski commented Jan 4, 2024

Yeah, I guess this particular method is as good as it gets for the current data representation.

But nevertheless, thank you for looking into possible optimizations! It is very easy to convince yourself that whatever you implemented is the optimum, so it's always welcome to get a new perspective to challenge your view and maybe teach you something new!

@NobodyXu
Copy link
Contributor

NobodyXu commented Jan 4, 2024

If static string didn't have it's own discriminant, then this would work:

I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable:

pub struct StaticStr {

@overlookmotel
Copy link
Contributor Author

I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable

Unfortunately I was out of date - the value of HEAP_MASK is now 216 not 224. Changing my 2nd version to last_byte == 192 || last_byte == 216 produces a branch, because testing for "192 or 216" can't be done with an and operation the way "192 or 224" can.

I think I'll give up at this point! But thanks both for entertaining my attempts.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jan 4, 2024

Oh wait! Maybe I've got something:

const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
  .wrapping_sub(LENGTH_MASK as usize);

pub fn is_empty_new(&self) -> bool {
  let len_heap = ensure_read(self.1);
  let last_byte = self.last_byte() as usize;
  let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
  // NB `==` not `>=`
  if len == HEAP_MASK_AFTER_SUB {
    len = len_heap;
  }
  len == 0
}

If I'm not mistaken (but very possible I am), this uses 1 less register, as last_byte is only used once.

https://godbolt.org/z/x3Whd115j

This relies on static strings never being 0 length (if that is the case).

I thought this might also have application for len() and as_slice(), but that didn't take into account 24-byte inline strings.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jan 7, 2024

Or is this better?

const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
  .wrapping_sub(LENGTH_MASK as usize);

pub fn is_empty_new2(&self) -> bool {
  let len_heap = self.1;
  let last_byte = self.last_byte() as usize;
  let len = last_byte.wrapping_sub(LENGTH_MASK as usize);
  let is_empty_inline = len == 0;
  // NB `==` not `>=`
  let is_empty_heap = len == HEAP_MASK_AFTER_SUB && len_heap == 0;
  is_empty_inline || is_empty_heap
}

Same instruction count, but ensure_read(self.1) is gone, and there's no branches.

https://godbolt.org/z/T1xPT3WzG

@Kijewski
Copy link
Contributor

Kijewski commented Jan 9, 2024

Without measuring it's hard to tell which version is the best.

If I remember correctly, instructions like sete (set register to 0/1 depending on the zero flag), adc (add with carry), etc. can slow things down a lot, because the instruction has to wait for an updated register file, so the instruction pipeline gets stalled. But my knowledge is > 10 years old. It might even only reflect 32 bit processors, and processors got a lot smarter in the last decade.

A call to ensure_read() should be omitted whenever possible. The method is a blackbox, and only used to force the compiler to read a variable into a register, before continuing. This way the compiler "remembers" that is has already cached the value, and it won't generate a branch to load the data, but use a cmovcc instruction instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants