[Feature Request] Unify SSO between `InlinedString` and `String` type #2467
Comments
Thanks for filing this! We actually discussed this exact topic earlier in the week in the team's design meeting. Here are my notes:
With SSO specifically, we talked about a few things like ABI concerns and the like, but decided on the following:
There are some future enhancements we can make to the SSO implementation, such as:
How does this sound to you and @lsh (who was asking me about this offline the other day)?
Thanks for the detailed plan. Looks good overall, I'm wondering about one thing:
If we do this step mentioned afterwards:
Wouldn't that be work that we'll quickly throw away? If I understand correctly the goals here, removing |
Ah yes, good point. Sorry about that: feel free to skip the step about removing |
I'm going through the code right now and I'm wondering if it makes sense to have a struct like "SmallSizeOptimizedListOfBytes" (name TBD) which has SSO but the interface of a list. I think the number of methods to implement would be smaller, and the String struct would use it as a list. It would be transparent and would not require many changes compared to now. Does that make sense or am I missing something?
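To make the idea above concrete, here is a minimal conceptual model of a byte list with small-buffer optimization behind a list-like interface. It is written in Python purely as an illustration, not as Mojo code and not as the design that was adopted. The class name `SmallBufferByteList`, the inline capacity of 23 bytes, and every method name are assumptions made for this sketch.

```python
class SmallBufferByteList:
    """Hypothetical model of a byte list with small-buffer optimization.

    Up to INLINE_CAPACITY bytes live in a fixed-size inline buffer.
    Beyond that, the contents move to a growable buffer that stands in
    for a heap allocation.
    """

    INLINE_CAPACITY = 23  # assumed value, chosen only for illustration

    def __init__(self):
        self._inline = bytearray(self.INLINE_CAPACITY)  # fixed-size storage
        self._heap = None  # growable storage, used only after spilling
        self._size = 0

    def is_inline(self):
        return self._heap is None

    def append(self, byte):
        if self._heap is None and self._size == self.INLINE_CAPACITY:
            # The inline buffer is full: copy everything over once.
            self._heap = bytearray(self._inline)
        if self._heap is None:
            self._inline[self._size] = byte
        else:
            self._heap.append(byte)
        self._size += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        storage = self._inline if self._heap is None else self._heap
        return storage[index]

    def to_bytes(self):
        storage = self._inline if self._heap is None else self._heap
        return bytes(storage[: self._size])
```

A string type built on top of such a container would only call `append`, `__len__`, `__getitem__` and similar list operations, so the choice between inline and heap storage stays hidden from it.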
…39560)
[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struct is very useful to implement SSO, it's related to
* #2467
* #2507

If this is merged, I can take advantage of this in my PR that has the SSO POC.

About `InlineFixedVector`: `InlineList` is different. Notably, `InlineList` has its capacity decided at compile-time, and there is no heap allocation (unless the elements have heap-allocated data of course). `InlineFixedVector` stores the first N elements on the stack, and the next elements on the heap. Since not all elements are in the same spot, it makes it hard to work with pointers there as the data is not contiguous.

Co-authored-by: Gabriel de Marmiesse
Closes #2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
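The difference between the two storage strategies described in this commit message can be modeled roughly as follows. This is a Python illustration only, not the Mojo implementation: the class names `FixedCapacityList` and `SplitStorageVector`, the `storage_regions` helper, and the choice to raise an error on overflow are all assumptions of the sketch (the commit message does not say what `InlineList` does when its capacity is exceeded).

```python
class FixedCapacityList:
    """Models the InlineList idea: capacity fixed up front, never spills."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # stands in for stack storage
        self._size = 0

    def append(self, value):
        if self._size == len(self._slots):
            # Assumed behavior for this sketch only.
            raise OverflowError("capacity exceeded")
        self._slots[self._size] = value
        self._size += 1

    def __len__(self):
        return self._size

    def storage_regions(self):
        return 1  # every element lives in the single inline region


class SplitStorageVector:
    """Models the InlineFixedVector idea: first N inline, rest elsewhere."""

    def __init__(self, inline_capacity):
        self._inline = [None] * inline_capacity  # stands in for stack storage
        self._overflow = []  # stands in for heap storage
        self._size = 0

    def append(self, value):
        if self._size < len(self._inline):
            self._inline[self._size] = value
        else:
            self._overflow.append(value)
        self._size += 1

    def __len__(self):
        return self._size

    def storage_regions(self):
        # Once elements spill over, the data sits in two separate regions,
        # so a single pointer plus a length can no longer describe it.
        return 2 if self._overflow else 1
```

In the second model, elements end up in two separate regions once the inline part is full, which is the non-contiguity that the commit message says makes pointer-based access hard.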
I'm posting the benchmark results here, as it's better to have them here than in a PR.

SBO: Small buffer optimization

Pull requests (draft only, they're proof of concept)

Here are the relevant PRs and highlights. Note that all stdlib tests are passing for each PR, with the exception of the materialization bug:
This PR was here only to show what is needed to implement SBO in
The materialization compiler bug

It can be observed by doing a checkout on this PR: and running the following code:

    fn foo():
        alias my_list: List[Int8, 10] = List[Int8, 10](0, 1, 2)
        print("Materializing my_list")
        var my_list_materialized = my_list  # <-- bug here
        print("all done, exiting function")

    def main():
        foo()
        print("main exiting.")

This produces a

The full bug report with minimal reproducible example on nightly is here: #2637

Benchmarks

System: Debian

Results
The benchmark code

    import time

    fn get_dummy_int_list(n: Int) -> List[Int]:
        var result = List[Int](capacity=n)
        for i in range(n):
            result.append(i)
        return result

    fn get_json_looking_string() -> String:
        var int_list = get_dummy_int_list(50000)
        # dang this is slow
        return __type_of(int_list).__str__(int_list)

    fn benchmark_1(list_of_ints: List[Int]):
        var result_list = List[String](capacity=len(list_of_ints))
        for i in range(len(list_of_ints)):
            result_list.append(str(list_of_ints[i]))

    fn benchmark_2():
        var result = String()
        for _ in range(10000):
            result += "a"
            result += "b"
            result += "c"
            result = result[:-2]

    fn benchmark_3(json_looking_string: String):
        # we want to get a list of ints from a string looking like [1, 2,3 ,4,5,6, 7,8 ,9,10]
        var stripped = json_looking_string.removeprefix("[").removesuffix("]")
        var split: List[String]
        try:
            split = stripped.split(",")
        except:
            abort("this should never happen, split didn't work")
            split = List[String]("hello")
        var result = List[Int](capacity=len(split))
        for x in split:
            try:
                result.append(int(x[].strip()))
            except e:
                abort("this should never happen, error" + str(e))
                result.append(0)

    fn benchmark_4(input_list: List[Int]):
        var result = __type_of(input_list).__str__(input_list)

    def main():
        print("starting benchmarks")
        var list_of_ints = get_dummy_int_list(1000000)
        for i in range(10):
            benchmark_1(list_of_ints)
        t1 = time.now()
        benchmark_1(list_of_ints)
        t2 = time.now()
        total_time = (t2-t1)//1_000
        speedup = 212614 / total_time
        print("benchmark 1:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

        for i in range(10):
            benchmark_2()
        t1 = time.now()
        benchmark_2()
        t2 = time.now()
        total_time = (t2-t1)//1_000
        speedup = 69444 / total_time
        print("benchmark 2:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

        json_looking_string = get_json_looking_string()
        for i in range(10):
            benchmark_3(json_looking_string)
        t1 = time.now()
        benchmark_3(json_looking_string)
        t2 = time.now()
        total_time = (t2-t1)//1_000
        speedup = 36218 / total_time
        print("benchmark 3:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

        list_of_ints = get_dummy_int_list(10000)
        for i in range(10):
            benchmark_4(list_of_ints)
        t1 = time.now()
        benchmark_4(list_of_ints)
        t2 = time.now()
        total_time = (t2-t1)//1_000
        speedup = 191769 / total_time
        print("benchmark 4:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See #2467

Co-authored-by: Gabriel de Marmiesse
Closes #2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86
Hello there, I haven't had much feedback on this issue. I had some feedback on the implementation details in #2632, but no confirmation that this is the direction we want to take. There is also the materialization bug at play (#2637). I'd like to ask a couple of questions to unblock the situation and avoid doing work which will be thrown away:
If the answer to all three questions is "Yes", then the plan will be:
This will open the door for the following future work:
Do I have the go-ahead from the maintainers on this plan? @JoeLoser @rparolin @ConnorGray
I went forward with the plan and here are the PRs, to review in this order:
Review Mojo's priorities
What is your request?
We currently have https://docs.modular.com/mojo/stdlib/utils/inlined_string#inlinedstring and String. Does it make sense to have SSO enabled in String too? Is it something that's currently possible or would that be too much trouble for the compiler?
What is your motivation for this change?
I guess that having SSO enabled on String would provide speed improvements. Furthermore, as we are writing more and more functions that work with Strings, I believe that we'll be avoiding a big refactor if we want SSO afterwards.
Any other details?
Maybe that would require making the internal buffer of String actually private? I wonder if some functions use the underlying pointer or List directly.