Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Unify SSO between InlinedString and String type #2467

Open
1 task done
gabrieldemarmiesse opened this issue May 2, 2024 · 7 comments
Open
1 task done
Labels
enhancement New feature or request mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@gabrieldemarmiesse
Copy link
Contributor

Review Mojo's priorities

What is your request?

We currently have https://docs.modular.com/mojo/stdlib/utils/inlined_string#inlinedstring and String. Does it make sense to have SSO enabled in String too? Is it something that's currently possible or would that be too much trouble for the compiler?

What is your motivation for this change?

I guess that having SSO enabled on String would provide speed improvements. Furthermore, as we are writing more and more functions that work with Strings, I believe that we'll be avoiding a big refactor if we want SSO afterwards.

Any other details?

Maybe that would require to make the internal buffer of String actually private? I wonder if some functions use the underlying pointer or List directly.

@gabrieldemarmiesse gabrieldemarmiesse added enhancement New feature or request mojo-repo Tag all issues with this label labels May 2, 2024
@JoeLoser
Copy link
Collaborator

JoeLoser commented May 3, 2024

Thanks for filing this! We actually discussed this exact topic earlier in the week in the team's design meeting. Here are my notes:

  • One first step is to remove _ArrayMem in utils/inlined_string and use InlineArray directly. We may run into trouble with our buffer formatting things in builtin/int.mojo and builtin/hex.mojo since they use the default constructor, which previously left undef values in the backing pop.array whereas we don’t allow undef values in default constructor in the new InlineArray for safety purposes, so you’ll have to pick a default fill value (such as 0) I suspect (FYI @lsh).

With SSO specifically, we talked about a few things like ABI concerns and the like, but decided on the following:

  • Add SSO parameter to the existing String type. Make it a keyword-only parameter such as _Size that defaults to the fixed SSO size (such as the existing value of 24). This will make it so it's clearly a "use at your own risk" if you explicitly set this parameter in caller code (by needing to specify the keyword parameter). This is because different parameters for the SSO value would generate a distinct String type in our elaborator and have the same consequences as "template explosion" in C++ and make it difficult to use different types across APIs, e.g. String and String[_Size=32] for example. One thing to note is nothing is ABI stable in the library, and so we have freedom to change the details of the SSO which can affect the layout of the type. This is also one reason why the keyword was chosen to have an underscore prefix.
  • Remove InlinedString.
  • Keep FixedString as is. It has value still, but we can safely remove InlinedString which was being that initial stepping stone as we were thinking about SSO.

There are some future enhancements we can make for SSO implementation such as:

  • Use tagged pointers instead of tagging the variant. This will save one byte here.

How does this sound to you and @lsh (who was asking me about this offline the other day)?

@JoeLoser JoeLoser changed the title [Feature Request] Merging String and InlinedString? [Feature Request] Unify SSO between InlinedString and String type May 3, 2024
@JoeLoser JoeLoser added help wanted Extra attention is needed mojo-stdlib Tag for issues related to standard library labels May 3, 2024
@gabrieldemarmiesse
Copy link
Contributor Author

Thanks for the detailed plan. Looks good overall, I'm wondering about one thing:

One first step is to remove _ArrayMem in utils/inlined_strings and use InlineArray directly.

If we do this step mentionned afterwards:

Remove InlinedString.

Wouldn't that be work that we'll quickly throw away? If I understand correctly the goals here, removing _ArrayMem from InlinedString is not useful as InlinedString will be removed too.

@JoeLoser
Copy link
Collaborator

JoeLoser commented May 3, 2024

Thanks for the detailed plan. Looks good overall, I'm wondering about one thing:

One first step is to remove _ArrayMem in utils/inlined_strings and use InlineArray directly.

If we do this step mentionned afterwards:

Remove InlinedString.

Wouldn't that be work that we'll quickly throw away? If I understand correctly the goals here, removing _ArrayMem from InlinedString is not useful as InlinedString will be removed too.

Ah yes, good point. Sorry about that: feel free to skip the step about removing _ArrayMem. Was still drinking coffee and waking up when replying earlier 😄

@gabrieldemarmiesse
Copy link
Contributor Author

I'm going through the code right now and I'm wondering if it makes sense to have a struct like "SmallSizeOptimizedListOfBytes" (name TBD) which has SSO but the interface of a list. I think the number of methods to implement would be smaller and the String stuct would use it as a list, it would be transparent and not require many changes compared to now. Does that make sense or am I missing something?

@ematejska ematejska removed mojo-stdlib Tag for issues related to standard library help wanted Extra attention is needed labels May 6, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 7, 2024 — with Linear
JoeLoser pushed a commit that referenced this issue May 11, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* #2467
* #2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes #2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented May 12, 2024

I'm posting here the benchmarks results as it's better to have them here than in a PR.

SBO: Small buffer optimization
SSO: Short string optimization (SBO applied to strings)

Pull requests (draft only, they're proof of concept)

Here are the relevant PR and highlights, note that all stdlib tests are passing for each PR, with the exception of the materialization bug:

This PR was here only to show what is needed to implement SBO in List:

The materialization compiler bug

It can be observed by doing a checkout on this PR:

and running the following code:

fn foo():
    alias my_list: List[Int8, 10] = List[Int8, 10](0, 1, 2)
    print("Materializing my_list")
    var my_list_materialized = my_list  # <-- bug here
    print("all done, exiting function")


def main():
    foo()
    print("main exiting.")

This produces a src/tcmalloc.cc:302] Attempt to free invalid pointer 0x7fff4f1bcf60. Note that using the list at compile time is okay, and at runtime is fine too. The bug appear when converting an alias to a var. I suspect this is because the compiler doesn't like that a pointer can point to something on the stack or the heap dynamically.

The full bug report with minimal reproducible example on nightly is here: #2637

Benchmarks

System: Debian bookworm/sid in Docker in WSL2
Processor Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz 3.80 GHz
Installed RAM 16,0 GB
System type 64-bit operating system, x64-based processor
Edition Windows 11 Pro
Version 21H2
Installed on ‎01/‎07/‎2021
OS build 22000.2538

Results

Benchmark id nightly 11/05/2024 (836626b) Using Variant with custom inline bytes list (3271101) Using Variant with InlineList (59f19bc) Using SBO in List directly (f27b43a)
1 212750 us 70254 us (x3.02) 72997 us (x2.91) 71117 us (x2.98)
2 69441 us 85479 us (x0.81) 84488 us (x0.82) 64898 us (x1.07)
3 35758 us 8291 us (x4.36) 8423 us (x4.29) 6066 us (x5.97)
4 189726 us 225251 us (x0.85) 221874 us (x0.86) 195324 us (x0.98)
size of buffer (in bytes) 0 24 24 16
size of String (in bytes) 24 40 40 40

The benchmark code

import time


fn get_dummy_int_list(n: Int) -> List[Int]:
    var result = List[Int](capacity=n)
    for i in range(n):
        result.append(i)
    return result

fn get_json_looking_string() -> String:
    var int_list = get_dummy_int_list(50000)
    # dang this is slow
    return __type_of(int_list).__str__(int_list)

fn benchmark_1(list_of_ints: List[Int]):
    var result_list = List[String](capacity=len(list_of_ints))
    for i in range(len(list_of_ints)):
        result_list.append(str(list_of_ints[i]))

fn benchmark_2():
    var result = String()
    for _ in range(10000):
        result += "a"
        result += "b"
        result += "c"
        result = result[:-2]

fn benchmark_3(json_looking_string: String):
    # we want to get a list of ints from a string looking like [1, 2,3 ,4,5,6, 7,8 ,9,10]
    var stripped = json_looking_string.removeprefix("[").removesuffix("]")
    var split: List[String]
    try:
        split = stripped.split(",")
    except:
        abort("this should never happen, split didn't work")
        split = List[String]("hello")
    var result = List[Int](capacity=len(split))
    for x in split:
        try:
            result.append(int(x[].strip()))
        except e:
            abort("this should never happen, error" + str(e))
            result.append(0)

fn benchmark_4(input_list: List[Int]):
    var result = __type_of(input_list).__str__(input_list)


def main():
    print("starting benchmarks")
    
    var list_of_ints = get_dummy_int_list(1000000)
    for i in range(10):
        benchmark_1(list_of_ints)
    t1 = time.now()
    benchmark_1(list_of_ints)
    t2 = time.now()
    total_time = (t2-t1)//1_000
    speedup  = 212614 / total_time
    print("benchmark 1:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

    for i in range(10):
        benchmark_2()
    t1 = time.now()
    benchmark_2()
    t2 = time.now()
    total_time = (t2-t1)//1_000
    speedup  = 69444  / total_time
    print("benchmark 2:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

    json_looking_string = get_json_looking_string()
    for i in range(10):
        benchmark_3(json_looking_string)
    t1 = time.now()
    benchmark_3(json_looking_string)
    t2 = time.now()
    total_time = (t2-t1)//1_000
    speedup  = 36218  / total_time
    print("benchmark 3:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

    list_of_ints = get_dummy_int_list(10000)
    for i in range(10):
        benchmark_4(list_of_ints)
    t1 = time.now()
    benchmark_4(list_of_ints)
    t2 = time.now()
    total_time = (t2-t1)//1_000
    speedup  = 191769  / total_time
    print("benchmark 4:", (t2-t1)//1_000, "us (x" + str(speedup)[:4] + ")")

modularbot pushed a commit that referenced this issue May 14, 2024
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See
#2467

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes #2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86
rd4com pushed a commit to rd4com/mojo_branch that referenced this issue May 15, 2024
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See
modularml#2467

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes modularml#2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86

Signed-off-by: rd4com <[email protected]>
lsh pushed a commit to lsh/mojo that referenced this issue May 17, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* modularml#2467
* modularml#2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes modularml#2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554

Signed-off-by: Lukas Hermann <[email protected]>
lsh pushed a commit to lsh/mojo that referenced this issue May 17, 2024
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See
modularml#2467

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes modularml#2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86

Signed-off-by: Lukas Hermann <[email protected]>
@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented May 24, 2024

Hello there, I haven't had much feedback on this issue. I had some feedback on the implementation details in #2632 but no confirmation that this is the direction we want to take. There is also the materialization bug at play #2637 .

I'd like to ask a couple question to unblock the situation and avoid doing work which will be thrown away:

  • Do we have confirmation that [BUG] Incorrect pointer behavior when materializing a type #2637 is really a compiler bug and will be fixed and is not a misuse of the language?
  • Looking at the benchmarks do we have confirmation from the maintainers that we'll go with the 4th option (Using SBO in List directly) as this is the one that has the largest speedups?
  • Do we want to have SBO optionally present in List? There wasn't a clear "Yes" from the maintainers.

If the answer to all three questions is "Yes" then the plan will be:

  1. Create a new PR for the SBO with all the comments of [stdlib] Implement SSO using small buffer optimization on List #2632 addressed and get it merged. At the same time, the compiler devs will work to fix [BUG] Incorrect pointer behavior when materializing a type #2637.
  2. Make a pull request where I add a parameter to the String struct, something like sso_size. It will default to 0 to avoid any disruption and users will be able to play with it.
  3. After enough time has passed and we are confident all bugs are flushed out, we choose a new default sso_size which will be non-null.

This will open the door for those future works:

  • Decide how we handle the null terminator (two different strings, à la rust? All strings are null terminated à la C++? A parameter on the struct? etc...)
  • Decide how we handle UTF-8 and the support for fast indexing, slicing, len().
  • All kind of complicated optimizations can be added to the String struct after the previous two steps are done.

Do I have a go of the maintainers on this plan? @JoeLoser @rparolin @ConnorGray

martinvuyk pushed a commit to martinvuyk/mojo that referenced this issue May 24, 2024
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See
modularml#2467

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes modularml#2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86
modularbot pushed a commit that referenced this issue Jun 7, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* #2467
* #2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes #2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
modularbot pushed a commit that referenced this issue Jun 7, 2024
[External] [stdlib] Add method `unsafe_ptr()` to `InlineArray`

This is pretty useful to implement short string optimization. See
#2467

Co-authored-by: Gabriel de Marmiesse <[email protected]>
Closes #2642
MODULAR_ORIG_COMMIT_REV_ID: 5739e8a67742c1841ca3c33efcd23bcc45048b86
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library
Projects
None yet
Development

No branches or pull requests

3 participants