Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Add optional small buffer optimization in List #2613

Draft
wants to merge 15 commits into
base: nightly
Choose a base branch
from

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 10, 2024

Related to #2467

This is in the work for SSO. I'm trying things and I'd like to gather community feedback.

At first, I wanted to implement SSO using Variant[InlineList, List], while that would have worked, we would still need to look at the flag of the variant for pretty much any operation we need to do. For example reading the first element in the list of bytes.

If we instead store the pointer to the data being used, the size of the data and the capacity, then the pointer can point to the stack or the heap, we just follow the pointer and that's it. No if-else necessary. We can know if we are in the stack or the heap by looking at the address of the pointer. We don't even require a if-else to get the size of the data since it's always stored at the same spot.

I was planning to implement this using a custom structure, but in the end, I thought that I could use the powerful metaprogramming features of Mojo to add the small buffer optimisation (SBO) directly to the List[] struct. By default it would be disabled (=0) and we can use @parameter if at the few places needed, for example append() to check if we are using the SBO or not.

In theory, this should all be zero cost when unused, but I'm not sure right now. We also don't have a good benchmarking story. Also there might be some incompatibility if users make functions that don't take into account this new argument. e.g.

fn get_size(x: List[Float64]) -> Int:  # Only accepts List without SBO
    return len(x)

So I am trying to get some community feedback here, as well as feedback from the maintainers, to understand if this makes sense or not. If not, I can implement a whole new struct, but there might be a lot of duplication.

I'll continue to work on this a bit (it's a good way to have a quick prototype up and running) and see if I can beat the numbers I got in this PR:

EDIT:
I run every List tests with 10 different buffer sizes (0 to 9), so I believe we can consider that it works. I'm currently working on using it inside a String. And it's having memory issues left and right. I wouldn't be surprised if String was misusing List in a few places. Most of the diff is the unit tests being places inside a for-loop. The changes for SBO itself are pretty small.

EDIT 2:
Despite passing the full test suite of List, I found something that didn't crash with the previous list but crashes with this one:

fn foo():
    alias my_list: List[Int8, 10] = List[Int8, 10](0, 1, 2)
    print("Materializing my_list")
    var my_list_materialized = my_list
    print("all done, exiting function")


def main():
    foo()
    print("main exiting.")

EDIT 3: The whole report and benchmarks are here: #2467 (comment)

EDIT 4: The bug report is here: #2637

@helehex
Copy link
Contributor

helehex commented May 10, 2024

This is interesting, my initial thoughts are I like this idea, as long as we can get all the same behavior. Carrying the extra member for the SBO doesn't add any size to the struct (when SBO=0) or anything like that, so maybe it's possible to get it practically seamless.

Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Copy link
Collaborator

@ConnorGray ConnorGray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work @gabrieldemarmiesse, thank you for tackling this.

I agree we will probably want additional benchmarks to validate a change like this, but the implementation looks pretty clean and solid. Really appreciate how you're testing this new implementation for various sbo_size values!

Since this is still in Draft mode I'll hold off on saying too much until you think its ready, but I'm leaving a few minor suggestions here and there.

move_pointee(src=self.data + i, dst=new_data + i)
@parameter
if Self.sbo_enabled:
if self.data != new_data:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion Include a comment here for readability, mentioning that we don't need to move anything if the data is stored in the inline buffer and the new capacity still fits in that inline buffer.

if self._sbo_is_in_use():
# Someone else is going to free the pointer, the data in the buffer will be invalid
# We have to give a pointer that will outlive this object.
# YOLO, could do better
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion This workaround seems reasonable in the short term, but can we replace this with a TODO or FIXME comment (unfortunately YOLO: XXX isn't something folks are in the habit of grepping for to search for work that needs doing 😛)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, sure thing, this is still in draft, I wouldn't let that go to the main branch :)

@@ -110,7 +110,8 @@ fn _static_tuple_construction_checks[size: Int]():
Parameters:
size: The number of elements.
"""
constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]()
pass
# constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to take a bit of wrangling. We were just talking about this constraint internally; its needed for size/alignment issues, but as features like this PR exemplify, that restriction might be handled better on a case-by-case basis.

A pointer to the data.
"""
# YOLO, we're crazy people. This is highly unsafe.
return UnsafePointer(self).bitcast[Self.ElementType]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return UnsafePointer(self).bitcast[Self.ElementType]()
return UnsafePointer(self._array).bitcast[Self.ElementType]()

Suggestion Make it more explicit that this is a pointer to the underlying !pop.array value.

This is actually pretty safe (as far as unsafe operations go) 🙂

What would NOT be very safe is using pop.array.gep to the 0th element and then creating offset pointers from that—I know from experience with _ArrayMem that that misleads the the Mojo compiler and leads to strange UB-like results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into this footgun too, getting the 0th element was NOT the right call. I lost one hour with this XD

@@ -768,3 +990,7 @@ def main():
test_list_count()
test_list_add()
test_list_mult()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion Since the pattern of creating a parameterized "inner" test function repeats for essentially all of the test functions in this file, I suggest we instead parameterize the top-level test functions (i.e. don't have any inline functions at all), and then additionally move the calls from the main() driver function out to their own function, so that we have:

def run_list_tests[sbo_size: Int]():
    test_mojo_issue_698[sbo_size]()
    test_list[sbo_size]()
    test_list_clear[sbo_size]()
     ...

def main():
    unroll[run_list_tests, 10]()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I indeed though about this, and I was wondering how compatible it will be with the test runner currently in the works. Do you have any info about this? As far as I know the test runner doesn't execute main()

@@ -110,7 +110,8 @@ fn _static_tuple_construction_checks[size: Int]():
Parameters:
size: The number of elements.
"""
constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]()
Copy link
Collaborator

@rparolin rparolin May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to disabling this compile-time check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. We need for this PR to work to create a InlineArray of size 0, because we can't conditionally declare fields in a struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the check can be kept as >= 0?

Returns:
A pointer to the data.
"""
# YOLO, we're crazy people. This is highly unsafe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, sure, this is in draft, don't worry, I wouldn't let that type of comment go into the main branch

if self._sbo_is_in_use():
# Someone else is going to free the pointer, the data in the buffer will be invalid
# We have to give a pointer that will outlive this object.
# YOLO, could do better
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove or reword 'YOLO, could do better'.

self.size = 0
self.capacity = 0
# needed to avoid "potential indirect access to uninitialized value 'self.data'"
self.data = UnsafePointer[T]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the else clause? Trying to avoid writing to self.data twice.

self._small_buffer = existing._small_buffer

@parameter
if Self.sbo_enabled:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: SBO is a common optimization that it would be great if we had an internal abstraction that we use to make it easier to plug into existing collection implementations. For example, we could have a Layout class that handles all the SBO storage complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I don't mind making something more generic. Do you have an example in mind of what other Mojo collection could benefit from SBO? As far as I know, all other dynamic collections use a List behind the scenes, this is the most basic type.

self.data = self._small_buffer.unsafe_ptr()
return

self.data = UnsafePointer[T].alloc(capacity)
self.capacity = capacity
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider implementing the fbstring sbo strategy to increase the amount of characters we store in the object before spilling to the heap.

https://youtu.be/kPR8h4-qZdk?feature=shared&t=527

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very smart trick! If that's okay by you we can leave it for another pull request, as to avoid too much complexity in a single PR.

@helehex
Copy link
Contributor

helehex commented May 16, 2024

If there's going to be a std Array type, we could even push it out to that

@JoeLoser
Copy link
Collaborator

If there's going to be a std Array type, we could even push it out to that

InlineArray is exactly that, no?

@helehex
Copy link
Contributor

helehex commented May 22, 2024

If there's going to be a std Array type, we could even push it out to that

InlineArray is exactly that, no?

more like the current InlinedFixedVector, but thats only for deprecated AnyRegType's

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants