-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #2905: enable and tune backing Array sharing in AbstractStringBuilder.toString
#2908
base: main
Are you sure you want to change the base?
Fix #2905: enable and tune backing Array sharing in AbstractStringBuilder.toString
#2908
Conversation
…actStringBuilder.toString`
David, Thank you for the PR and for moving this stone forward. |
shared = true | ||
return new String(value, 0, count) | ||
new String(0, count, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my thinking, this is, given a sane "wasted" test, without doubt a win for both
the "one-and-done", without-loss-of-generality, StringBuilder
case and the
recommended multi-use case.
Good catch.
I also like the explanatory comments. We will all certainly be back here again, trying to figure the rationale out again.
// than what is added by a single enlargeBuffer() operation | ||
// (it may happen if setLength() has been previously called to shrink the number of used characters) | ||
// => copy backing Array in a fresh String | ||
if (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1) + 2) { // see enlargeBuffer() for details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been studying this line for hours, my limitation, and have been trying to figure
out what data would be needed to make a data driven decision. Sorry, but I need
more time. I do not want my limited understanding to stand in the way of progress.
I suspect that this line is problematic.
The 256 'magic number' in the original code was never explained. By reading of
the original code, the clause to the left of the ||
prevented "huge" amounts of
'wasted' space from getting passed to a String constructor. That is, it establishes
256 as the maximum amount per String of potentially wasted space of potentially long duration. One can talk about or, better yet, measure the effect of values smaller than 256 (I suspect 128 or 64 would be better).
By my, possibly flawed reading, this line of code will allow "huge" wasted space,
given a certain usage pattern. Consider a 4K 'count' with a, say, 3K 'wasted'.
I think that will go the "shared" path and waste 3K. I know people who would
say 'Ouch' (in French or Anglo-Saxon) to that.
As mentioned, I suspect, but have no data, that a simple if (wasted > MAGIC_NUMBER)
would work here, over a wide range of guessed "MAGIC_NUMBER`s, each suitably explained.
It is a minor issue, but I do not see how INITIAL_CAPACITY comes into play?
It is less of a 'magic number' both in mystery and magnitude than 256 but I miss
the logical connection, if any.
As David & I have discussed in a couple of other places, Discourse & Issues here,
the whole 'wasted' discussion is intimately wrapped up in the "grow-the-buffer" algorithm,
especially the small size of the first few (5? or so) allocations. That discussion
should not hold up progress here, but needs to be kept in the background.
I think the "Historical 1.5 grow-the-buffer" algorithm allocates too little at the
beginning and too much as the buffer grows larger. Wild guesses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ each suitably explained./the finally chosen one suitably explained/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @LeeTibbert for your review.
I'll try to address the different comments/questions one by one.
The 256 'magic number' in the original code was never explained.
I'm not a big fan of it neither, as it makes the code a bit cryptic.
However, after a deeper analysis, we may find a good reason for keeping or modifying this value.
I suspect that this line is problematic.
By my, possibly flawed reading, this line of code will allow "huge" wasted space,
given a certain usage pattern. Consider a 4K 'count' with a, say, 3K 'wasted'.
I think that will go the "shared" path and waste 3K.
I may be wrong but I don't think it will lead to a wasted backing Array ("shared" path) with the data provided in your example.
I summarized your data in a spreadsheet, where I computed additional values as follows:
Variable | Value |
---|---|
count | 4000 |
wasted | 3000 |
length | 7000 |
wasted_limit | 2002 |
wasted > wasted_limit ("sharing" disabled) | true |
If I got your example, and since val wasted = value.length - count
, the backing Array length should 7K, but this is not needed for further calculations.
And the resulting wasted_limit would be estimated as (count / 2) + 2
which equals here 2002.
Thus the sharing test will fail, and the used chars will be copied in the created String.
It is a minor issue, but I do not see how INITIAL_CAPACITY comes into play?
It is less of a 'magic number' both in mystery and magnitude than 256 but I miss
the logical connection, if any.
The idea is that if the waste is very small (lower than 16), we thus always enable sharing to avoid additional allocations.
But you are right, it's still a magic value and it could be questioned.
The goal behind this PR, is to set the number of new Array()/new String()
to the bare minimum, without affecting too much the wasted memory in the underlying backing Array. The rationale, is that less allocations means less garbage and thus less work for the GC.
The best tradeoff between less allocations and less waste in the backing Array might be tricky to find. The current PR proposes one possible model. Also important to note: what makes me a bit confident about this model, is that it has been adopted in Harmony and Android reference implementations.
the whole 'wasted' discussion is intimately wrapped up in the "grow-the-buffer" algorithm,
especially the small size of the first few (5? or so) allocations. That discussion
should not hold up progress here, but needs to be kept in the background.
I think the "Historical 1.5 grow-the-buffer" algorithm allocates too little at the
beginning and too much as the buffer grows larger. Wild guesses.
Totally right, there might be room for improvement regarding the "grow-the-buffer" algorithm.
But this can be treated independently IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this can be treated independently IMO
Agreed
Thank you for the explanations.
Discussion, not to block this PR or burden you.
If it does not give away proprietary details, may I ask:
-
It sounds like you have tried this modification in your work.
What did you see? -
Do you reuse a StringBuilder or is it single use?
-
Do you add a lot of small strings to that SB or a few large (>= 1K) strings.
-
Any idea of the expected lifetime of the child Strings? Minutes, hours, days?
If I were to take a run at this, which I can not do in the near future,
I think I would first instrument a base load, and then run one of the
the Scala benchmarks against it. I do not know if there is a String
intensive one.
I think I would try to set the lower acceptable bound higher (say 32)
to try to minimize allocations, and then the higher bound much lower,
maybe (>>2 + mumble). The hope would be that, amortized, the
memory saved with "moderate" and "large" Strings would pay back
the memory spent on small strings.
Problem is that all of this almost certainly depends on the workload and
there does not appear to me to be an economic way of automatically
adjusting the workload ( recent past behavior is single best predictor of future behavior.
Perhaps maintaining a running "max" and extending by some factor of that).
As you say, a problem for another issue.
-
Seems like being willing to "waste" 2001 bytes is a gift to the people
who manufacture RAM.By my reading, current code will "waste" a maximum of 256 bytes per
String allocation.
LGTM - Looks good to me. Generations of degree students will be studying & tuning that one line of "use String constructor which will copy" code. |
Thank you @LeeTibbert for your time and your accurate review. First the questions:
At first, I was thinking to try/benchmark all my recent String related findings and only starting sharing ideas/code, after a careful set of benchmarks execution. Then I realized that it may introduce long delays (days/weeks?), while things could be discussed and improved in a more atomic and interactive fashion.
Both, see my reply in the related issue (#2905) regarding String interpolation and write to File examples. Usually I even combine both in the same program.
Regarding the first app that comes to my mind (a CSV writer), I usually add small strings to the SB (parts composing a line).
Ideally I would like to avoid Strings creation for the recycled SB (see: #2906). def append(d: scala.Double): StringBuilder = {
append0(Double.toString(d))
this
} to def append(d: scala.Double): StringBuilder = {
this.ensureCapacity(this.count + _RyuDouble.RESULT_STRING_MAX_LENGTH)
this.count = _RyuDouble.doubleToChars(d, RyuRoundingMode.Conservative, value, this.count)
this
} Thank you for your advices regarding the benchmarking procedure. This is going to be be super useful. Also, totally agree: Problem is that all of this almost certainly depends on the workload and there does not appear to me to be an economic way of automatically adjusting the workload
Yes you are right. But at the same time, how many SB are currently copied to a String while it's not necessary?
By current code, I guess you mean in the SN 0.4/main branches. If it's the case I would say that it never waste chars inside the String as a fresh String with Array copy is always created at the moment. |
| The devil is in the details I thought the devil was in Florida, on holiday... Thank you for the information about your use pattern. Sounds like a pretty well |
The constructor implemented there is not available from the String class. https://github.com/scala-native/scala-native/blob/fa3f35e1d29849cd4c9bf88ebe9b90494d6d35ad/javalib/src/main/scala/java/lang/String.scala#L107
I think that constructor is a SN invention. I studied both the Java 8 & 17 specifications I tried a trial SN build without that constructor at all. Several places in java.lang I believe this should have a comment saying that it relies upon the caller not This can, and I believe should, be a separate PR. That will cut down on I was focused on this issue & its Off topic grumble: The name
|
I think it comes from Apache Harmony, as you can see it existed there:
I also added
Actually I'm surprised that
I think it should be adopted. I planned to do it later when I would submit the String class related changes, but if you want to work on this before, no problem to me.
To my knowledge and understanding, after having double checked Harmony and Android Luni Apache2 implementations, this is the single one. However, there are some other things that are similar to that in Android Luni, like a The Javadoc says: /**
* Version of getChars without bounds checks, for use by other classes
* within the java.lang package only. The caller is responsible for
* ensuring that start >= 0 && start <= end && end <= count.
*/ But this has not been ported to SN yet.
I think it is related to conflicts between the SN javalib implementation and the "visible" Java API. |
Per request, moved from #2909 So, methods which can shrink the array or (if any) alter the contents of the array before the largest "shared" size need to allocate a new array (and some of those may be doing so already). I think I convinced myself that shared should change from boolean to a count of the size of the array when it was last given to toString(). The starting count could be -1. I am pretty sure that would work in all cases. 0 might work, with some spurious allocations is rare cases, but I did not run that to ground. |
Per request: Moved from #2909 To try to converge my thoughts. I keep coming back to the "wastage" decision point. Current behavior gives us a wastage of "minimum size of an Object allocation (in this case Array itself, not contents)." In a new scheme, a fixed size "allowed wastage" of MSO would give us a large set of the benefits we are seeing with today's explicit + MSO memory budget. What I like about this is that:
Like a ziggurat, better algorithms can be built on top. David, I do not mean to be beating the same drum about "fixed size" wastage and annoying you. I think the key step is figuring out how to determine/measure "better" some trade off between number of allocations (a proxy for execution speed) and total amount of memory used. I suspect that allowing a "wastage" of "MSO + 16" would give an execution speed up for most workloads with unnoticeable or "acceptable" increased required memory used. This would have to be tested. I have also considered wilder algorithms, based on an expected string size of 80 or 132 characters and the buffer growth progression David mapped out. Those are outside of the scope of this PR. |
Per request, moved from #2909 re: data for "wastage" decision Ok, this is the point where you do whatever you need to do to get into a proper trance and ask yourself Doing some math, which I can write out in more detail if needed. First, assuming the current buffer allocation "growth" progression as detailed earlier by David, Considering 64 bit systems, so using an ObjectOverheadAvoided value of 20. That is, Use an "wasted" decision algorithm if (count + 20 + AcceptableWastage < SbArraySize) take no_copy path AcceptableWastage of 0. Using the current allocation progression, all potential strings of <= 54 characters take the no_copy path. All other copy. All potential strings <= 54 take no_copy path Using a modified growth progression algorithm of 16, 37, 58, 79, 100, 121, 142, f(x), The current algorithm goes "16, 24, 37" so this progression skips the, probably minimally useful, I doodled with a couple of other progressions, based on the 20 character ObjectOverheadAvoided. One can also do similar math on a progression of AcceptableWastage values. I stopped at I think this PR can be implemented with little to no "AcceptableWastage". It has such potential David, Folks, what do you think? |
Later and corrected information. Further information & learnings which change the math to be more favorable.
I ran the numbers with both 24 and 32 byte "AvoidedAllocationBenefit" numbers and The lede is: In the 32 byte case all strings < 82 can be no_copy with 0 wastage. If the discussion gets that far, I can give a fuller story. |
Haha. This PR is really far from easy... I realized something new, while investigating potential problems that could be introduced by PR #2909. Although AbstractStringBuffer -> String is implemented in def this(sb: StringBuffer) = {
this()
offset = 0
value = sb.getValue()
count = sb.length()
}
def this(sb: java.lang.StringBuilder) = {
this()
offset = 0
count = sb.length()
value = new Array[Char](count)
sb.getChars(0, count, value, 0)
} Boom! The But even more important, to we want to align behavior of these constructors with Update: the StringBuffer constructor is indeed buggy: #2925 |
Could you please elaborate on the use cases you have in mind, and added value compared to current boolean?
I was not really strict on this, and it's true that my opinion is bit blurred and moving. In the issue I said: Which later evolved to: In this PR I said:
I would say it is one important thing, but another important aspect would be to do different things for reused and non-reused StringBuilders. This is what I mentioned in the issue: Actually, I also reached the following mathematical conclusions (that I had more or less in mind before, but that I forgot to express like this):
Moreover, if the following rules apply:
Then the StringBuilder growth should follow exactly the growth progression algorithm characteristics. However, relying only this calculation to speculate on the "reusability" of the SB, is not strict enough IMO. We should also track if SB.toString has been previously called or not.
Comment: given a reused SB, the first call to SB.toString will consequently lead to a String wastage between 0 and 33%. But all further SB.toString will lead to more compact Strings. Now the last decision to take, in the case SB.toString has not been previously called, is if we also use a fixed size "allowed wastage", aligned with minimum size of Array allocation, as you cleverly suggested. Conclusions:
The story in this PR is already long 😅 |
David, Truth in advertising: The Sheriff's Deputies came this morning and took away my license to count. It was The "overhead" to allocate an Array.empty[Char] (or Int, or Long, I did not check Object) is 16 bytes, not On 64 bit machines (sizeOfPtr + 8) == 16. The satisfies both the 8 bit alignment of GC.none On 32 bit machines, (sizeOfPtr + 8) = 12, to which both GC.none and GC.immex & commix This changes the "wastage" math a bit. The short story is that, with the current progression, I will give the details when I have a block of time. |
Strategy, I believe the "when to waste" decision can be factored out for a short period of time. re: "shared" changing from Boolean to Int (or such). I am saying change the type of the variable, but same idea could be down with I think the discussion to date is that any SB method which shrinks the sb.count As a thought experiment, imagine a StringBuffer which has never had Now imaging a call to sb.toString() when count is 40.
|
Thanks for calculation update, this is super useful.
If I run this thought experiment with the methodology I previously explained, it should do this:
Total number of backing Array allocations = 2, one for the StringBuffer creation, one for the StringCreation. So why |
At the 8000 metre level, I think that any not-dead-wrong algorithm that lets the PR advance and gives Diving down to the 1 meter level. Perhaps my cultural association with that famous Parisian Let me refine my example, if I can. Taking the current growth progression as a given, I believe that setting the length to less that the largest count at The need to know the largest string given out leads me to believe that value needs Am I missing something here? Did I get the 14/54 right? (14/40 would have I'm off to study the "wastage" algorithm in the file in this PR. Then I see if I can write down my suggested alternative. |
I think this method from
I report this in this context, because it is flaw in the infrastructure this PR will need |
re: contentEquals Here are the notes I have in my Sandbox about this issue: // FIXME: We should rely on CharSequence instead of creating a String
def contentEquals(strbuf: StringBuffer): Boolean = {
...
} Sorry for being slow at releasing all the issues I already discovered. If we get this done, it would mean that |
Yes, I also think that at some point we have to adopt one model even if not perfect and move forward.
Ok let's try this new one
I have not double checked that, but I trust you.
If I mentally run this other example with the proposed methodology, it should do this:
This last operation will actually be the one and only one case where it will generate waste. The next time This why I'm motivated in renaming I'll commit my current state as a showcase. |
Having the concrete code of the 'showcase' is helpful.
|
David,
If you concur with the idea, would you like me to write the tests I described? There is some theoretical advantage to having a person other than |
Sure, some additional tests might be useful. Added to my TODO list for this PR.
All (i.e. 80) of them will be shared, in a non-compacted form.
I would analyze it a different way. If we don't share at all, as we do today, we are ending-up for each example with an allocated StringBuilder + an allocated/compacted String. If the StringBuilder is not collected quickly by the GC, and it seems to happen quite often with our current GCs, what we get is higher memory consumption than a non compacted StringBuilder backing Array alone. We also use more CPU for memory allocation + copy.
Small strings will waste. Medium strings will waste. And large strings will waste. The maximum waste is set in stone as 33%. |
If you can do that I would be very grateful. The main reason is that I'll be super busy next week. So having a second pair of hands is going super useful to have this merged quickly. |
Thank you for patiently answering my questions. I think the summary is that once some unit-tests for potentially 0.5.0 is young enough that we have time to iterate & tune. | Generating real numbers is the (only) way to verify if the current methodology improves or not the current situation. I'll sing in that choir!
Now that is the Good News headline. |
OK, so once I am done with the "string immutability" tests, I will clone the code this PR and OK if I do it as a separate PR? I know of no way for me to add a file and make commits to this PR. Because my time also comes in fits & starts for the next two weeks, I would probably do a series of |
This PR is an attempt to fix the issue #2905.
It consists in the two following modifications:
Change test conditions used to disable "sharing"
from
to
Change the String constructor that is used when "sharing" is enabled
from
to
This latter constructor leads to a String wrapping the provided Array, while the former involves a copy of the used characters in the created String.
Thus, the introduced change avoids this additional copy, which is redundant since the
AbstractStringBuilder
is marked as shared, and will thus recycle its internal backingArray
when necessary (see issue #2905 for details).