New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Add optional small buffer optimization in List
#2613
base: nightly
Are you sure you want to change the base?
[stdlib] Add optional small buffer optimization in List
#2613
Conversation
Signed-off-by: gabrieldemarmiesse <[email protected]>
This is interesting, my initial thoughts are I like this idea, as long as we can get all the same behavior. Carrying the extra member for the SBO doesn't add any size to the struct (when SBO=0) or anything like that, so maybe it's possible to get it practically seamless. |
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @gabrieldemarmiesse, thank you for tackling this.
I agree we will probably want additional benchmarks to validate a change like this, but the implementation looks pretty clean and solid. Really appreciate how you're testing this new implementation for various sbo_size
values!
Since this is still in Draft mode I'll hold off on saying too much until you think its ready, but I'm leaving a few minor suggestions here and there.
move_pointee(src=self.data + i, dst=new_data + i) | ||
@parameter | ||
if Self.sbo_enabled: | ||
if self.data != new_data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion Include a comment here for readability, mentioning that we don't need to move anything if the data is stored in the inline buffer and the new capacity still fits in that inline buffer.
if self._sbo_is_in_use(): | ||
# Someone else is going to free the pointer, the data in the buffer will be invalid | ||
# We have to give a pointer that will outlive this object. | ||
# YOLO, could do better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion This workaround seems reasonable in the short term, but can we replace this with a TODO
or FIXME
comment (unfortunately YOLO: XXX
isn't something folks are in the habit of grepping for to search for work that needs doing 😛)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, sure thing, this is still in draft, I wouldn't let that go to the main branch :)
@@ -110,7 +110,8 @@ fn _static_tuple_construction_checks[size: Int](): | |||
Parameters: | |||
size: The number of elements. | |||
""" | |||
constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]() | |||
pass | |||
# constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to take a bit of wrangling. We were just talking about this constraint internally; its needed for size/alignment issues, but as features like this PR exemplify, that restriction might be handled better on a case-by-case basis.
A pointer to the data. | ||
""" | ||
# YOLO, we're crazy people. This is highly unsafe. | ||
return UnsafePointer(self).bitcast[Self.ElementType]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return UnsafePointer(self).bitcast[Self.ElementType]() | |
return UnsafePointer(self._array).bitcast[Self.ElementType]() |
Suggestion Make it more explicit that this is a pointer to the underlying !pop.array
value.
This is actually pretty safe (as far as unsafe operations go) 🙂
What would NOT be very safe is using pop.array.gep
to the 0th element and then creating offset pointers from that—I know from experience with _ArrayMem
that that misleads the the Mojo compiler and leads to strange UB-like results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into this footgun too, getting the 0th element was NOT the right call. I lost one hour with this XD
@@ -768,3 +990,7 @@ def main(): | |||
test_list_count() | |||
test_list_add() | |||
test_list_mult() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion Since the pattern of creating a parameterized "inner" test function repeats for essentially all of the test functions in this file, I suggest we instead parameterize the top-level test functions (i.e. don't have any inline functions at all), and then additionally move the calls from the main()
driver function out to their own function, so that we have:
def run_list_tests[sbo_size: Int]():
test_mojo_issue_698[sbo_size]()
test_list[sbo_size]()
test_list_clear[sbo_size]()
...
def main():
unroll[run_list_tests, 10]()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I indeed though about this, and I was wondering how compatible it will be with the test runner currently in the works. Do you have any info about this? As far as I know the test runner doesn't execute main()
@@ -110,7 +110,8 @@ fn _static_tuple_construction_checks[size: Int](): | |||
Parameters: | |||
size: The number of elements. | |||
""" | |||
constrained[size > 0, "number of elements in `StaticTuple` must be > 0"]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to disabling this compile-time check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. We need for this PR to work to create a InlineArray of size 0, because we can't conditionally declare fields in a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the check can be kept as >= 0?
Returns: | ||
A pointer to the data. | ||
""" | ||
# YOLO, we're crazy people. This is highly unsafe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, sure, this is in draft, don't worry, I wouldn't let that type of comment go into the main branch
if self._sbo_is_in_use(): | ||
# Someone else is going to free the pointer, the data in the buffer will be invalid | ||
# We have to give a pointer that will outlive this object. | ||
# YOLO, could do better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove or reword 'YOLO, could do better'.
self.size = 0 | ||
self.capacity = 0 | ||
# needed to avoid "potential indirect access to uninitialized value 'self.data'" | ||
self.data = UnsafePointer[T]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this into the else clause? Trying to avoid writing to self.data twice.
self._small_buffer = existing._small_buffer | ||
|
||
@parameter | ||
if Self.sbo_enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: SBO is a common optimization that it would be great if we had an internal abstraction that we use to make it easier to plug into existing collection implementations. For example, we could have a Layout
class that handles all the SBO storage complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I don't mind making something more generic. Do you have an example in mind of what other Mojo collection could benefit from SBO? As far as I know, all other dynamic collections use a List
behind the scenes, this is the most basic type.
self.data = self._small_buffer.unsafe_ptr() | ||
return | ||
|
||
self.data = UnsafePointer[T].alloc(capacity) | ||
self.capacity = capacity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider implementing the fbstring sbo strategy to increase the amount of characters we store in the object before spilling to the heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very smart trick! If that's okay by you we can leave it for another pull request, as to avoid too much complexity in a single PR.
If there's going to be a std |
|
more like the current |
Related to #2467
This is in the work for SSO. I'm trying things and I'd like to gather community feedback.
At first, I wanted to implement SSO using
Variant[InlineList, List]
, while that would have worked, we would still need to look at the flag of the variant for pretty much any operation we need to do. For example reading the first element in the list of bytes.If we instead store the pointer to the data being used, the size of the data and the capacity, then the pointer can point to the stack or the heap, we just follow the pointer and that's it. No if-else necessary. We can know if we are in the stack or the heap by looking at the address of the pointer. We don't even require a if-else to get the size of the data since it's always stored at the same spot.
I was planning to implement this using a custom structure, but in the end, I thought that I could use the powerful metaprogramming features of Mojo to add the small buffer optimisation (SBO) directly to the
List[]
struct. By default it would be disabled (=0
) and we can use@parameter if
at the few places needed, for exampleappend()
to check if we are using the SBO or not.In theory, this should all be zero cost when unused, but I'm not sure right now. We also don't have a good benchmarking story. Also there might be some incompatibility if users make functions that don't take into account this new argument. e.g.
So I am trying to get some community feedback here, as well as feedback from the maintainers, to understand if this makes sense or not. If not, I can implement a whole new struct, but there might be a lot of duplication.
I'll continue to work on this a bit (it's a good way to have a quick prototype up and running) and see if I can beat the numbers I got in this PR:
String
struct #2507EDIT:
I run every List tests with 10 different buffer sizes (0 to 9), so I believe we can consider that it works. I'm currently working on using it inside a String. And it's having memory issues left and right. I wouldn't be surprised if
String
was misusingList
in a few places. Most of the diff is the unit tests being places inside a for-loop. The changes for SBO itself are pretty small.EDIT 2:
Despite passing the full test suite of
List
, I found something that didn't crash with the previous list but crashes with this one:EDIT 3: The whole report and benchmarks are here: #2467 (comment)
EDIT 4: The bug report is here: #2637