Fix #2905: enable and tune backing Array sharing in AbstractStringBuilder.toString #2908

Draft
wants to merge 3 commits into
base: main

david-bouyssie
Contributor

This PR is an attempt to fix issue #2905.
It consists of the following two modifications:

Change test conditions used to disable "sharing"

from

if (wasted >= 256 || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1)))

to

if (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1) + 2)

Change the String constructor that is used when "sharing" is enabled

from

new String(value, 0, count)

to

new String(0, count, value)

This latter constructor leads to a String wrapping the provided Array, while the former involves a copy of the used characters in the created String.
Thus, the introduced change avoids this additional copy, which is redundant since the AbstractStringBuilder is marked as shared, and will thus recycle its internal backing Array when necessary (see issue #2905 for details).
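To make the two conditions easier to compare, here is a minimal sketch that evaluates both as pure functions. The names `SharingRules`, `copiesBefore` and `copiesAfter` are hypothetical and exist only for illustration; `InitialCapacity` is assumed to be 16, the value mentioned later in this conversation. A result of `true` means the used characters are copied (sharing disabled).

```scala
object SharingRules {
  // Assumed value: the discussion below refers to a threshold of 16.
  val InitialCapacity = 16

  /** Condition before this PR: true means "copy, do not share". */
  def copiesBefore(capacity: Int, count: Int): Boolean = {
    val wasted = capacity - count
    wasted >= 256 || (wasted >= InitialCapacity && wasted >= (count >> 1))
  }

  /** Condition proposed in this PR: true means "copy, do not share". */
  def copiesAfter(capacity: Int, count: Int): Boolean = {
    val wasted = capacity - count
    wasted >= InitialCapacity && wasted >= (count >> 1) + 2
  }
}
```

With a capacity of 1000 and a count of 700 (wasted = 300), the old condition copies because of the fixed 256 bound, while the proposed one shares because 300 is below (700 >> 1) + 2 = 352.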

@LeeTibbert
Contributor

David, Thank you for the PR and for moving this stone forward.

shared = true
return new String(value, 0, count)
new String(0, count, value)
Contributor

To my thinking, this is, given a sane "wasted" test, without doubt a win for both
the "one-and-done", without-loss-of-generality, StringBuilder case and the
recommended multi-use case.

Good catch.

I also like the explanatory comments. We will all certainly be back here again, trying to figure the rationale out again.

// than what is added by a single enlargeBuffer() operation
// (it may happen if setLength() has been previously called to shrink the number of used characters)
// => copy backing Array in a fresh String
if (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1) + 2) { // see enlargeBuffer() for details
Contributor

I have been studying this line for hours, my limitation, and have been trying to figure
out what data would be needed to make a data driven decision. Sorry, but I need
more time. I do not want my limited understanding to stand in the way of progress.

I suspect that this line is problematic.

The 256 'magic number' in the original code was never explained. By reading of
the original code, the clause to the left of the || prevented "huge" amounts of
'wasted' space from getting passed to a String constructor. That is, it establishes
256 as the maximum amount per String of potentially wasted space of potentially long duration. One can talk about or, better yet, measure the effect of values smaller than 256 (I suspect 128 or 64 would be better).

By my, possibly flawed reading, this line of code will allow "huge" wasted space,
given a certain usage pattern. Consider a 4K 'count' with a, say, 3K 'wasted'.
I think that will go the "shared" path and waste 3K. I know people who would
say 'Ouch' (in French or Anglo-Saxon) to that.

As mentioned, I suspect, but have no data, that a simple if (wasted > MAGIC_NUMBER) would work here, over a wide range of guessed "MAGIC_NUMBER"s, each suitably explained.

It is a minor issue, but I do not see how INITIAL_CAPACITY comes into play?
It is less of a 'magic number' both in mystery and magnitude than 256 but I miss
the logical connection, if any.

As David & I have discussed in a couple of other places, Discourse & Issues here,
the whole 'wasted' discussion is intimately wrapped up in the "grow-the-buffer" algorithm,
especially the small size of the first few (5? or so) allocations. That discussion
should not hold up progress here, but needs to be kept in the background.
I think the "Historical 1.5 grow-the-buffer" algorithm allocates too little at the
beginning and too much as the buffer grows larger. Wild guesses.

Contributor

s/ each suitably explained./the finally chosen one suitably explained/

Contributor Author

Thank you @LeeTibbert for your review.
I'll try to address the different comments/questions one by one.

The 256 'magic number' in the original code was never explained.

I'm not a big fan of it either, as it makes the code a bit cryptic.
However, after a deeper analysis, we may find a good reason for keeping or modifying this value.

I suspect that this line is problematic.
By my, possibly flawed reading, this line of code will allow "huge" wasted space,
given a certain usage pattern. Consider a 4K 'count' with a, say, 3K 'wasted'.
I think that will go the "shared" path and waste 3K.

I may be wrong but I don't think it will lead to a wasted backing Array ("shared" path) with the data provided in your example.
I summarized your data in a spreadsheet, where I computed additional values as follows:

| Variable | Value |
| --- | --- |
| count | 4000 |
| wasted | 3000 |
| length | 7000 |
| wasted_limit | 2002 |
| wasted > wasted_limit ("sharing" disabled) | true |

If I got your example right, and since val wasted = value.length - count, the backing Array length should be 7K, but this is not needed for further calculations.
And the resulting wasted_limit would be estimated as (count / 2) + 2 which equals here 2002.
Thus the sharing test will fail, and the used chars will be copied in the created String.
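The numbers in this example can be reproduced with a short sketch. `WastedLimitExample` and its members are hypothetical names, and `InitialCapacity = 16` is an assumption taken from this conversation; the comparison uses `>=`, as in the condition proposed by the PR.

```scala
object WastedLimitExample {
  // Assumed value, per the discussion in this thread.
  val InitialCapacity = 16

  /** Unused slots at the end of the backing Array. */
  def wasted(capacity: Int, count: Int): Int = capacity - count

  /** Largest tolerated waste plus one: (count / 2) + 2. */
  def wastedLimit(count: Int): Int = (count >> 1) + 2

  /** True when the proposed rule copies instead of sharing. */
  def sharingDisabled(capacity: Int, count: Int): Boolean = {
    val w = wasted(capacity, count)
    w >= InitialCapacity && w >= wastedLimit(count)
  }
}
```

For count = 4000 the limit is 2002, so a backing Array of length 7000 (wasted = 3000) is copied. Under the same rule, a backing Array of length 6001 (wasted = 2001) would still be shared.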

It is a minor issue, but I do not see how INITIAL_CAPACITY comes into play?
It is less of a 'magic number' both in mystery and magnitude than 256 but I miss
the logical connection, if any.

The idea is that if the waste is very small (lower than 16), we thus always enable sharing to avoid additional allocations.
But you are right, it's still a magic value and it could be questioned.
The goal behind this PR is to reduce the number of new Array()/new String() calls to the bare minimum, without inflating the wasted memory in the underlying backing Array too much. The rationale is that fewer allocations means less garbage and thus less work for the GC.
The best tradeoff between fewer allocations and less waste in the backing Array might be tricky to find. The current PR proposes one possible model. Also important to note: what makes me somewhat confident about this model is that it has been adopted in the Harmony and Android reference implementations.

the whole 'wasted' discussion is intimately wrapped up in the "grow-the-buffer" algorithm,
especially the small size of the first few (5? or so) allocations. That discussion
should not hold up progress here, but needs to be kept in the background.
I think the "Historical 1.5 grow-the-buffer" algorithm allocates too little at the
beginning and too much as the buffer grows larger. Wild guesses.

Totally right, there might be room for improvement regarding the "grow-the-buffer" algorithm.
But this can be treated independently IMO.

Contributor

But this can be treated independently IMO
Agreed

Thank you for the explanations.

Discussion, not to block this PR or burden you.
If it does not give away proprietary details, may I ask:

  • It sounds like you have tried this modification in your work.
    What did you see?

  • Do you reuse a StringBuilder or is it single use?

  • Do you add a lot of small strings to that SB or a few large (>= 1K) strings.

  • Any idea of the expected lifetime of the child Strings? Minutes, hours, days?

If I were to take a run at this, which I can not do in the near future,
I think I would first instrument a base load, and then run one of the
Scala benchmarks against it. I do not know if there is a String
intensive one.

I think I would try to set the lower acceptable bound higher (say 32)
to try to minimize allocations, and then the higher bound much lower,
maybe (>>2 + mumble). The hope would be that, amortized, the
memory saved with "moderate" and "large" Strings would pay back
the memory spent on small strings.

Problem is that all of this almost certainly depends on the workload and
there does not appear to me to be an economic way of automatically
adjusting the workload ( recent past behavior is single best predictor of future behavior.
Perhaps maintaining a running "max" and extending by some factor of that).
As you say, a problem for another issue.

  • Seems like being willing to "waste" 2001 bytes is a gift to the people
    who manufacture RAM.

    By my reading, current code will "waste" a maximum of 256 bytes per
    String allocation.

@LeeTibbert
Contributor

LGTM - Looks good to me.

Generations of degree students will be studying & tuning that one line of "use String constructor which will copy" code.
That should not hold up acceptance of this PR.

@david-bouyssie
Contributor Author

Thank you @LeeTibbert for your time and your accurate review.

First the questions:

It sounds like you have tried this modification in your work. What did you see?

At first, I was planning to benchmark all my recent String-related findings and only start sharing ideas/code after carefully running a set of benchmarks. Then I realized that it may introduce long delays (days/weeks?), while things could be discussed and improved in a more atomic and interactive fashion.
So to answer your question clearly, I need more time to give proper numbers.

Do you reuse a StringBuilder or is it single use?

Both, see my reply in the related issue (#2905) regarding String interpolation and write to File examples. Usually I even combine both in the same program.

Do you add a lot of small strings to that SB or a few large (>= 1K) strings.

Regarding the first app that comes to my mind (a CSV writer), I usually add small strings to the SB (parts composing a line).

Any idea of the expected lifetime of the child Strings? Minutes, hours, days?

Ideally I would like to avoid Strings creation for the recycled SB (see: #2906).
However for the String interpolation use case, sadly the String path can't be avoided.
Another optimization to avoid String creation could come from solving #2902.
This would allow replacing StringBuilder.append(d: Double):
from

  def append(d: scala.Double): StringBuilder = {
    append0(Double.toString(d))
    this
  }

to

  def append(d: scala.Double): StringBuilder = {
    this.ensureCapacity(this.count + _RyuDouble.RESULT_STRING_MAX_LENGTH)
    this.count = _RyuDouble.doubleToChars(d, RyuRoundingMode.Conservative, value, this.count)
    this
  }

Thank you for your advice regarding the benchmarking procedure. This is going to be super useful.

Also, totally agree: Problem is that all of this almost certainly depends on the workload and there does not appear to me to be an economic way of automatically adjusting the workload

Seems like being willing to "waste" 2001 bytes is a gift to the people who manufacture RAM.

Yes you are right. But at the same time, how many SB are currently copied to a String while it's not necessary?
As the proverb says, The devil is in the details, and it's hard to anticipate all the possible combinations.
Maybe the current Java API is guilty, as it doesn't provide the appropriate level of configurability to control the String wasting/duplication ratio from a user perspective.
Also please, refer to my point in the related issue regarding possible conceivable workarounds.

By my reading, current code will "waste" a maximum of 256 bytes per String allocation.

By current code, I guess you mean the SN 0.4/main branches. If that's the case, I would say that it never wastes chars inside the String, as a fresh String with an Array copy is always created at the moment.

@LeeTibbert
Contributor

| The devil is in the details

I thought the devil was in Florida, on holiday...

Thank you for the information about your use pattern. Sounds like a pretty well
distributed set of potentially problematic cases. I like your idea of solving the
problems one at a time (defeating in detail) and using your existing workloads
to see if the result is as you/we intended. Any major regression should show
up pretty quickly.

@david-bouyssie david-bouyssie marked this pull request as draft October 17, 2022 15:03
@LeeTibbert
Contributor

LeeTibbert commented Oct 17, 2022

I think that constructor is a SN invention. I studied both the Java 8 & 17 specifications
and could not find it. It is somewhat anomalous since its Array is the last/rightmost arg.
The documented constructors seem to all have the Array as the first/leftmost arg.
private[lang] def this(start: Int, length: Int, data: Array[Char]) = {

I tried a trial SN build without that constructor at all. Several places in java.lang
use it. I held my nose, marked it private[lang] and did not venture deeper into
the swamp.

I believe this should have a comment saying that it relies upon the caller not
altering the contents of the array (i.e. contract). That is, that it is unsafe, but
not in the scalanative.unsafe sense.

This can, and I believe should, be a separate PR. That will cut down on
the number of pieces in motion at once. If you like, I can submit such a PR.
It should not hold up or block any of our other discussions. Please advise.

I was focused on this issue & its append0 cousin, so I did not check if
any of the other 20-ish constructors need to be private also.

Off topic grumble: The name _String for a class name is somewhat suspect or,
at the least, hard to trace (a.k.a. booby trap). I think it has to do
with String meaning the Scala String from predef. There is
currently a move to remove the automatic inclusion of predef
so that javalib does not depend on scalalib. Be ready to call
the fire department when that change hits _String.

             No sense trying to change too much in _String until
             well after the `no predef` change hits.  I think the
             change above to change and document the
             SN private constructor is worthwhile.

@david-bouyssie
Contributor Author

I think that constructor is a SN invention.

I think it comes from Apache Harmony, as you can see it existed there:
https://github.com/apache/harmony/blob/02970cb7227a335edd2c8457ebdde0195a735733/classlib/modules/luni/src/main/java/java/lang/String.java#L395

I tried a trial SN build without that constructor at all. Several places in java.lang use it.
I held my nose, marked it private[lang] and did not venture deeper into the swamp.

I also added private[lang] inside my current/local String rework WIP.

I believe this should have a comment saying that it relies upon the caller not altering the contents of the array (i.e. contract).

Actually I'm surprised that javalib is almost Scaladoc/comment free. Is it on purpose?
I don't get why Apache Harmony documentation and comments were not preserved...
I'll ask on Discord.

This can, and I believe should, be a separate PR. That will cut down on
the number of pieces in motion at once. If you like, I can submit such a PR.
It should not hold up or block any of our other discussions. Please advise.

I think it should be adopted. I planned to do it later when I would submit the String class related changes, but if you want to work on this before, no problem to me.

I was focused on this issue & its append0 cousin, so I did not check if any of the other 20-ish constructors need to be private also.

To my knowledge and understanding, after having double checked Harmony and Android Luni Apache2 implementations, this is the single one.

However, there are some other things that are similar to that in Android Luni, like a getCharsUnchecked() alternative method to getChars() :
https://android.googlesource.com/platform/libcore2/+/master/luni/src/main/java/java/lang/String.java#896

The Javadoc says:

    /**
     * Version of getChars without bounds checks, for use by other classes
     * within the java.lang package only.  The caller is responsible for
     * ensuring that start >= 0 && start <= end && end <= count.
     */

But this has not been ported to SN yet.

Off topic grumble: The name _String for a class name is somewhat suspect or, at the least, hard to trace (a.k.a. booby trap). I think it has to do with String meaning the Scala String from predef. There is currently a move to remove the automatic inclusion of predef so that javalib does not depend on scalalib. Be ready to call the fire department when that change hits _String.

I think it is related to conflicts between the SN javalib implementation and the "visible" Java API.
I remember a discussion about internal aliasing of javalib classes to match the Java API, but I don't want to state anything wrong.
I think @densh 's input would be very valuable there.

@LeeTibbert
Contributor

LeeTibbert commented Oct 19, 2022

Per request, moved from #2909

So, methods which can shrink the array or (if any) alter the contents of the array before the largest "shared" size need to allocate a new array (and some of those may be doing so already).

I think I convinced myself that shared should change from boolean to a count of the size of the array when it was last given to toString(). The starting count could be -1. I am pretty sure that would work in all cases. 0 might work, with some spurious allocations in rare cases, but I did not run that to ground.

@LeeTibbert
Contributor

LeeTibbert commented Oct 19, 2022

Per request: Moved from #2909

To try to converge my thoughts. I keep coming back to the "wastage" decision point.
I do not know the allocation size of the minimum Object. I think it is something like 16 bytes on 64 bit systems.

Current behavior gives us a wastage of "minimum size of an Object allocation (in this case Array itself, not contents)."
For the sake of discussion let's call that MSO (a.k.a. 16).

In a new scheme, a fixed size "allowed wastage" of MSO would give us a large set of the benefits we are seeing with today's explicit + MSO memory budget.

What I like about this is that:

  1. it can be implemented today (well, quickly)
  2. it is a single point where better algorithms can be tried in private
  3. it maintains today's behavior of huge objects not becoming extra huge.

Like a ziggurat, better algorithms can be built on top.

David, I do not mean to be beating the same drum about "fixed size" wastage and annoying you.
I know that you have heard that approach before and declined it.

I think the key step is figuring out how to determine/measure "better" some trade off between number of allocations (a proxy for execution speed) and total amount of memory used.

I suspect that allowing a "wastage" of "MSO + 16" would give an execution speed up for most workloads with unnoticeable or "acceptable" increased required memory used. This would have to be tested.

I have also considered wilder algorithms, based on an expected string size of 80 or 132 characters and the buffer growth progression David mapped out. Those are outside of the scope of this PR.

@LeeTibbert
Contributor

Per request, moved from #2909

re: data for "wastage" decision

Ok, this is the point where you do whatever you need to do to get into a proper trance and ask yourself
"What was he on?"

Doing some math, which I can write out in more detail if needed.

First, assuming the current buffer allocation "growth" progression as detailed earlier by David,
16, 24, 36, 54, 81, 122, and so on... I know that we have been trying to avoid discussion
of that algorithm, but it directly affects the "wastage" decision.
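The progression quoted above can be reproduced with a small sketch, assuming a "times 1.5, rounded up" rule starting from 16. This is only an assumption that happens to match the listed numbers; the real enlargeBuffer() may round or pad differently. `GrowthProgression`, `next` and `capacities` are hypothetical names.

```scala
object GrowthProgression {
  /** Next capacity under an assumed "times 1.5, rounded up" rule. */
  def next(capacity: Int): Int = (capacity * 3 + 1) / 2

  /** First n capacities, starting from the initial capacity. */
  def capacities(initial: Int, n: Int): List[Int] =
    Iterator.iterate(initial)(next).take(n).toList
}
```

`GrowthProgression.capacities(16, 6)` yields 16, 24, 36, 54, 81, 122 under this assumption.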

Considering 64 bit systems, so using an ObjectOverheadAvoided value of 20. That is,
an allocation of an empty array would have taken 20 bytes, so we can 'spend' up to 20
bytes and come out ahead or even.

Use a "wasted" decision algorithm:

if (count + 20 + AcceptableWastage < SbArraySize) take no_copy path
else use StringConstructorWhichCopies

AcceptableWastage of 0.

Using the current allocation progression, all potential strings of <= 54 characters take the no_copy path. All others copy.

AcceptableWastage of 20.

All potential strings <= 54 take no_copy path
Potential strings 55 to 80 (inclusive) have a maximum "true wastage" of 6.
Strings 81 to 122 have a maximum "true wastage" of 20 (at 81 characters).

If one can change the world

Using a modified growth progression algorithm of 16, 37, 58, 79, 100, 121, 142, f(x),
and an AcceptableWastage of 0, all potential strings <= 142 are no_copy all others copy.
That covers both short strings and common 80 & 132 character strings. It has no wastage
for large strings.

The current algorithm goes "16, 24, 36" so this progression skips the, probably minimally useful,
allocation & copy at 24. This algorithm is a little less aggressive at growing in the range above
79.

I doodled with a couple of other progressions, based on the 20 character ObjectOverheadAvoided.
The current one seems to work. A small program could be written to try to come up with a
better one or a NoPrize could be offered in an undergraduate competition.

One can also do similar math on a progression of AcceptableWastage values. I stopped at
Zero because it seemed like I had hit enough of a sweet spot to document here.

Summary

I think this PR can be implemented with little to no "AcceptableWastage". It has such potential
in both the single-use and multiple-use cases that it can be earning its keep whilst better
growth & "AcceptableWastage" algorithms are explored.

David, Folks, what do you think?

@LeeTibbert
Contributor

Later and corrected information.

Further information & learnings which change the math to be more favorable.
This discusses 64 bit systems. The numbers are probably smaller on 32 bit
systems, but the concept remains.

  • The minimum obvious size of an Array.empty[Char] on 64 bit systems is 18 bytes.

  • Densh pointed out that the above allocation gets aligned by the current GC.

    • The none GC appears to align to 8 byte boundaries, giving a minimum of 24 bytes.
    • The Immix & Commix, in their "standard" configurations, align to 16 byte boundaries,
      giving a minimum allocation of 32 bytes.

I ran the numbers with both 24 and 32 byte "AvoidedAllocationBenefit" numbers and
the current "grow by times 1.5 current" growth algorithm.

The lede is: In the 32 byte case all strings < 82 can be no_copy with 0 wastage.

If the discussion gets that far, I can give a fuller story.

@david-bouyssie
Contributor Author

david-bouyssie commented Oct 20, 2022

Haha. This PR is really far from easy...

I realized something new, while investigating potential problems that could be introduced by PR #2909.

Although AbstractStringBuilder -> String is implemented in AbstractStringBuilder.toString, there are other implementations inside the String class, in its constructors to be precise.
The second problem is that those String constructors, accepting either a StringBuilder or StringBuffer argument, are inconsistent in terms of behavior, as you can see in the following code excerpt:

  def this(sb: StringBuffer) = {
    this()
    offset = 0
    value = sb.getValue()
    count = sb.length()
  }

  def this(sb: java.lang.StringBuilder) = {
    this()
    offset = 0
    count = sb.length()
    value = new Array[Char](count)
    sb.getChars(0, count, value, 0)
  }

Boom!

The StringBuffer one looks to me like a bug, since it calls getValue() instead of shareValue().
BTW, getValue()/shareValue() would benefit from being private[lang].
And not sure if getValue() is really useful, shareValue() might be enough and safer, even for internal usage.

But even more important, do we want to align the behavior of these constructors with AbstractStringBuilder.toString?
Tricky...


Update: the StringBuffer constructor is indeed buggy: #2925
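The hazard with the StringBuffer constructor can be illustrated with a toy model. This is a sketch, not the javalib code: `ToyBuilder` and `ToyText` are hypothetical stand-ins for a builder that hands out its backing Array and a String-like wrapper around it. Wrapping the Array without copying it, and without telling the builder it is now shared, lets a later mutation leak into the supposedly immutable text.

```scala
object AliasingDemo {
  /** Toy builder: exposes its backing array and mutates it in place. */
  final class ToyBuilder(initial: String) {
    private val value: Array[Char] = initial.toCharArray
    def getValue(): Array[Char] = value // no copy, no "shared" flag
    def setCharAt(i: Int, c: Char): Unit = { value(i) = c }
    def length: Int = value.length
  }

  /** Toy immutable text that wraps whatever array it is given. */
  final class ToyText(chars: Array[Char], count: Int) {
    def render: String = new String(chars, 0, count)
  }

  def wrapWithoutCopy(sb: ToyBuilder): ToyText =
    new ToyText(sb.getValue(), sb.length)

  def wrapWithCopy(sb: ToyBuilder): ToyText =
    new ToyText(sb.getValue().clone(), sb.length)

  /** Mutates the builder after wrapping and returns both renderings. */
  def demo(): (String, String) = {
    val sb = new ToyBuilder("abc")
    val aliased = wrapWithoutCopy(sb)
    val copied = wrapWithCopy(sb)
    sb.setCharAt(0, 'X')
    (aliased.render, copied.render)
  }
}
```

`AliasingDemo.demo()` returns `("Xbc", "abc")`: the aliased text changed after construction, while the copied one did not.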

@david-bouyssie
Contributor Author

david-bouyssie commented Oct 20, 2022

I think I convinced myself that shared should change from boolean to a count of the size of the array when it was last given to toString().

Could you please elaborate on the use cases you have in mind, and added value compared to current boolean?

David, I do not mean to be beating the same drum about "fixed size" wastage and annoying you.
I know that you have heard that approach before and declined it.

I was not really strict on this, and it's true that my opinion is a bit blurred and moving.
As a recap:

In the issue I said:
I don't get how this constant (256) has been chosen, I guess there is a good justification for it.
...
I don't see the point of the wasted >= 256 test condition.

Which later evolved to:
I think the difficulty would be to establish for the end-user, what a reasonable fixed "waste" could be. Do we want (or can we) make this configurable?
...
Even if it is opaque/cryptic, I would be in favor of keeping 256, because it is aligned with historical and inherited source code.

In this PR I said:
I'm not a big fan of it either, as it makes the code a bit cryptic.
However, after a deeper analysis, we may find a good reason for keeping or modifying this value.

...
The idea is that if the waste is very small (lower than 16), we thus always enable sharing to avoid additional allocations.
But you are right, it's still a magic value and it could be questioned.

I think the key step is figuring out how to determine/measure "better" some trade off between number of allocations (a proxy for execution speed) and total amount of memory used.

I would say it is one important thing, but another important aspect would be to do different things for reused and non-reused StringBuilders. This is what I mentioned in the issue:
I thought about a possible alternative to track a "reused" SB: we could flag the SB as "reused" after the first toString call. The basic rule would be: if an SB has already been converted to a String, it should then try to generate less "waste" during further conversions.

Actually, I also reached the following mathematical conclusions (that I had more or less in mind before, but that I forgot to express like this):
Since val wasted = value.length - count is equivalent to val count = value.length - wasted, the following calculations should be valid:

wasted >= (count >> 1)
wasted >= count / 2
wasted >= (value.length - wasted) / 2
2 * wasted >= (value.length - wasted)
3 * wasted >= value.length
wasted >= value.length / 3
wasted >= 0.333333 * value.length
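The derivation above can be checked by brute force. The sketch below uses hypothetical names (`WasteRatioCheck` and its members) and compares the exact integer forms `2 * wasted >= count` and `3 * wasted >= length`, where `length = count + wasted`. It also exposes the shifted form, because `count >> 1` rounds down and can therefore differ from the exact form for odd counts.

```scala
object WasteRatioCheck {
  /** Exact form of "wasted >= count / 2", without integer truncation. */
  def halfOfCount(wasted: Int, count: Int): Boolean = 2 * wasted >= count

  /** Equivalent form in terms of the backing Array length. */
  def thirdOfLength(wasted: Int, length: Int): Boolean = 3 * wasted >= length

  /** True when both forms agree for every (count, wasted) pair up to max. */
  def agreeUpTo(max: Int): Boolean =
    (0 to max).forall { count =>
      (0 to max).forall { wasted =>
        halfOfCount(wasted, count) == thirdOfLength(wasted, count + wasted)
      }
    }

  /** The shifted form used in the code; count >> 1 rounds count / 2 down. */
  def shifted(wasted: Int, count: Int): Boolean = wasted >= (count >> 1)
}
```

For count = 3 and wasted = 1, the shifted form holds (1 >= 1) while the exact form does not (2 < 3), so the 33% figure is an approximation at small sizes.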

Moreover, if the following rules apply:

  • the StringBuilder has not been created with a capacity different from the initial one
  • setLength() has not been called to manually enlarge the StringBuilder
  • no deletion operation has been performed

Then the StringBuilder growth should follow exactly the growth progression algorithm characteristics.
And this will be usually the case for the "one-and-done" pattern, which is well described in the issue #2905, and thus can be used to infer the use case nature of the StringBuilder .
Even if not documented, I have the strong feeling this was a major reason for the original authors to rely on this condition (aligned with the growth progression) for the sharing decision in the inherited Apache Harmony project (see https://github.com/apache/harmony/blob/trunk/classlib/modules/luni/src/main/java/java/lang/AbstractStringBuilder.java#L624).
If it's not the case, it's really fortunate...

However, relying only on this calculation to speculate on the "reusability" of the SB is not strict enough IMO. We should also track whether SB.toString has been previously called or not.
To summarize:

  • The 33% wastage is acceptable only if SB.toString has not been previously called.
  • If SB.toString has been previously called (reused SB), I think we must be very strict regarding the level of wastage, and we can disable sharing (Array copy)

Comment: given a reused SB, the first call to SB.toString will consequently lead to a String wastage between 0 and 33%. But all further SB.toString will lead to more compact Strings.

Now the last decision to take, in the case SB.toString has not been previously called, is if we also use a fixed size "allowed wastage", aligned with minimum size of Array allocation, as you cleverly suggested.
It is important to note that once the selected fixed size is lower than 33% of the backing Array length (large Strings), it will force an "Array copy" (below that threshold, the two conditions agree), this 33% spare/total ratio being currently imposed by the growth algorithm (if the rules presented above apply).

Conclusions:

  • I think relying on "SB.toString previously called" is a good trick.
  • Using or not a fixed size "wastage" limit has corner cases I think, and I'm not totally sure about it.

If the discussion gets that far, I can give a fuller story.

The story in this PR is already long 😅
I would prefer to have a factual decision (mainly because I don't know how to implement "In the 32 byte case all strings < 82 can be no_copy with 0 wastage"), although describing the rationale will also be needed.
Thank you for your detailed analysis BTW.

@LeeTibbert
Contributor

David,
Thank you for your detailed message above. I have only a few moments now but will reply in
detail when I can. It may have to be Fri, Oct 21.

Truth in advertising:

The Sheriff's Deputies came this morning and took away my license to count. It was
a shameful sight. To wit:

The "overhead" to allocate an Array.empty[Char] (or Int, or Long, I did not check Object) is 16 bytes, not
the previously reported wrong values. The 16 holds for both 64 and 32 bit machines.

On 64 bit machines (sizeOfPtr + 8) == 16. This satisfies both the 8 byte alignment of GC.none
and the 16 byte alignment of GC.immix & commix. No additional alignment bytes needed.

On 32 bit machines, (sizeOfPtr + 8) = 12, to which both GC.none and GC.immix & commix
add 4 bytes to align, so 16.
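The alignment arithmetic above can be written down as a small helper. This is a sketch under the stated (sizeOfPtr + 8) assumption; `AllocationSize`, `alignUp` and `emptyArrayBytes` are hypothetical names and not part of any Scala Native API.

```scala
object AllocationSize {
  /** Round size up to the next multiple of alignment (alignment > 0). */
  def alignUp(size: Int, alignment: Int): Int =
    ((size + alignment - 1) / alignment) * alignment

  /** Bytes for an empty array, assuming a header of (sizeOfPtr + 8) bytes. */
  def emptyArrayBytes(sizeOfPtr: Int, alignment: Int): Int =
    alignUp(sizeOfPtr + 8, alignment)
}
```

With 8 byte pointers the header is already 16 bytes for either alignment. With 4 byte pointers the 12 byte header is rounded up to 16 for both 8 and 16 byte alignment.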

This changes the "wastage" math a bit. The short story is that, with the current progression,
most "small to medium" strings will take up no more (overall) storage than they currently do.
There is an "island" somewhere between 55 and 60ish (inexact) where one is faced with
a "copy and have 0 wastage" or "allow some wastage (5?) for these few so they do not copy".
All strings above 120ish copy (hence have zero wastage).

I will give the details when I have a block of time.

@LeeTibbert
Contributor

Strategy, I believe the "when to waste" decision can be factored out for a short period of time.
Any one of a number would do whilst other changes required by "sharing" are implemented.
This reduces complexity and allows testing whilst considering complex algorithms in detail.

re: "shared" changing from Boolean to Int (or such).

I am saying change the type of the variable, but the same idea could be done with
two variables. I prefer fewer pieces in motion.

I think the discussion to date is that any SB method which shrinks the sb.count
below the largest size given out by a previous toString() call, if any, needs
to copy the SB.array.

As a thought experiment, imagine a StringBuffer which has never had toString() called.
Assume that 'capacity' is sufficient in all cases, say 80. With a count of, say 40, a nonsense
sequence of calls setLength(70); setLength(60) is valid.

Now imagine a call to sb.toString() when count is 40. setLength(70) should not force
a new Array; given the new 70, setLength(38) _has_ to copy; setLength(60) does not, since it
knows that no String greater than that length has been given out.

  • trimToSize() has a similar consideration. Trimming to larger than the last string given out need not copy.

  • replace() also need not copy if the item being replaced lies beyond the last string given out.

  • reverse() must always copy
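A minimal sketch of the "Int instead of Boolean" idea follows. It is a hypothetical model, not the Scala Native `AbstractStringBuilder`: `Model`, `maxShared`, `shareToString`, and `copies` are invented names, and the model never grows its array.

```scala
object SharedTracking {
  // Hypothetical model of a builder that remembers the longest String
  // handed out instead of a simple "shared" Boolean.
  final class Model(capacity: Int) {
    private var value = new Array[Char](capacity)
    private var count = 0
    // Length of the longest String that still aliases `value`; 0 = not shared.
    private var maxShared = 0
    var copies = 0 // how many times the backing array had to be replaced

    def append(s: String): Unit = {
      require(count + s.length <= value.length, "model does not grow")
      s.getChars(0, s.length, value, count)
      count += s.length
    }

    // Pretend to hand out a String that wraps `value` without copying.
    def shareToString(): Unit =
      maxShared = math.max(maxShared, count)

    def setLength(n: Int): Unit = {
      require(n >= 0 && n <= value.length)
      // Shrinking below a shared length would let later writes hit shared chars.
      if (n < maxShared) unshare()
      if (n > count) java.util.Arrays.fill(value, count, n, '\u0000')
      count = n
    }

    private def unshare(): Unit = {
      value = java.util.Arrays.copyOf(value, value.length)
      maxShared = 0
      copies += 1
    }
  }
}
```

Running the sequence from the thought experiment (share at 40, then `setLength(70)`, `setLength(38)`, `setLength(60)`) on this model triggers a copy only at `setLength(38)`.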

@david-bouyssie
Contributor Author

david-bouyssie commented Oct 20, 2022

The "overhead" to allocate an Array.empty[Char] (or Int, or Long, I did not check Object) is 16 bytes, not
the previously reported wrong values. The 16 holds for both 64 and 32 bit machines.

Thanks for calculation update, this is super useful.

As a thought experiment, imagine a StringBuffer which has never had toString() called.
Assume that 'capacity' is sufficient in all cases, say 80. With a count of, say, 40, a nonsense
sequence of calls setLength(70); setLength(60) is valid.
Now imagine a call to sb.toString() when count is 40. setLength(70) should not force
a new Array. Given the new count of 70, setLength(38) has to copy; setLength(60) does not, because
it knows that no String greater than that length has been given out.

If I run this thought experiment with the methodology I previously explained, it should do this:

  • sb.toString() when count is 40. It means a wastage of 40 / 80 = 50%. This is above the 33% limit, so the backing Array is not shared and a fresh copy is put in the created String
  • sb.setLength(70) -> Arrays.fill(value, count, 70, '\u0000') // erase content to expand count
  • sb.setLength(38) -> count = 38 // decrease count
  • sb.setLength(60) -> Arrays.fill(value, count, 60, '\u0000') // erase content to expand count

Total number of backing Array allocations = 2, one for the StringBuffer creation, one for the String creation.
Length of SB backing Array = 80
Length of String backing Array = 40
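The sharing decision used in this walk-through can be sketched as below. The one-third cut point comes from the comment; treating the boundary as inclusive and the name `wouldShare` are assumptions made for illustration.

```scala
object WastageRule {
  // Share the backing array when at most one third of it is unused.
  // Integer form of (capacity - count) / capacity <= 1/3; the inclusive
  // boundary is an assumption, the thread only says "above 33% -> copy".
  def wouldShare(capacity: Int, count: Int): Boolean =
    3 * (capacity - count) <= capacity
}
```

For a capacity of 80 and a count of 40 this returns false, so the String gets its own compact copy, as in the steps above.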

So why would sb.setLength(38) always require a copy?

@LeeTibbert
Contributor

LeeTibbert commented Oct 20, 2022

At the 8000 metre level, I think that any not-dead-wrong algorithm that lets the PR advance and gives
some net yield for the effort invested is OK. You seem confident in "yours", so run with it. It is good
to get a first generation "reasonably right", especially if some benefit can be measured or shown.
Trying to get it "almost perfect" usually either kills the PR or the author's enthusiasm for it.

Diving down to the 1 meter level. Perhaps my cultural association with that famous Parisian
Ben Franklin of "Waste not; want not!" fame makes me skittish about wastage.

Let me refine my example, if I can. Taking the current growth progression as a given,
a 40 character String candidate, starting from 0, would have caused the current SB Array to have
a size of 54. That gives a difference of 14. 14/54 is 0.26 which
is less than the 0.33 cut point, so the created string would be no_copy.
Now setLength(53), then setLength(38).

I believe that setting the length to less than the largest count at toString() time will trigger the
need for a copy of the SB Array. (The 14 bytes in the "wastage" at the first (only)
toString() time is less than the 16 bytes that would have been allocated
if the created String had to allocate its own array and copy Chars into it.)

The need to know the largest string given out leads me to believe that value needs
to be tracked just in case setLength or its cronies needs it.

Am I missing something here? Did I get the 14/54 right? (14/40 would have
caused a string copy). The numbers in the example might go better with a
starting N of 41 (13/54 == 0.24, 13/41 = 0.32).
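The question about which denominator to use can be made concrete with a few lines. This is only an arithmetic check of the figures quoted above; `overCapacity` and `overCount` are illustrative names.

```scala
object WastageDenominator {
  // Unused chars relative to the size of the backing array.
  def overCapacity(capacity: Int, count: Int): Double =
    (capacity - count).toDouble / capacity

  // Unused chars relative to the number of chars actually in use.
  def overCount(capacity: Int, count: Int): Double =
    (capacity - count).toDouble / count
}
```

For capacity 54 and count 40 the two ratios fall on opposite sides of 0.33 (14/54 versus 14/40), while for count 41 both fall below it, which is why the choice of denominator matters for the first example but not the second.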

I'm off to study the "wastage" algorithm in the file in this PR.
I will also refresh my memory of setLength() as implemented there.

Then I will see if I can write down my suggested alternative.

@LeeTibbert
Contributor

I think this method from String.scala also hits our checklist.
I had originally been concerned with the scary looking non-JVM sb.getValue()
when I saw the sb.length. I think this can be re-worked to eliminate both
and not introduce too much runtime overhead.

  def contentEquals(sb: StringBuffer): scala.Boolean = {
    val size = sb.length()
    if (count != size) {
      false
    } else {
      regionMatches(0, new _String(0, size, sb.getValue()), 0, size)
    }
  }

I report this in this context because it is a flaw in the infrastructure that this PR will need
working. After brief consideration here, I can spin it off to another Issue & corresponding
PR.

@david-bouyssie
Contributor Author

re: contentEquals

Here are the notes I have in my Sandbox about this issue:

 // FIXME: We should rely on CharSequence instead of creating a String
 def contentEquals(strbuf: StringBuffer): Boolean = {
  ...
  }

Sorry for being slow at releasing all the issues I already discovered.
I realize again it doubles work/attention and may create confusion.
At the same time, it is satisfying to see convergence without prior consultation.

If we get this done, it would mean that AbstractStringBuilder.getValue() could be switched to protected, and this will represent also a nice step forward, in the quest for robustness.
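One possible shape for a CharSequence-based comparison is sketched below. It is a standalone helper written for illustration, not the method in String.scala; it only uses `length()` and `charAt()`, so it needs neither `getValue()` nor a temporary String.

```scala
object ContentEqualsSketch {
  // Compare a String with any CharSequence char by char, without creating
  // a second String and without touching the builder's backing array.
  def contentEquals(s: String, cs: CharSequence): Boolean = {
    val size = cs.length()
    if (s.length() != size) false
    else {
      var i = 0
      while (i < size && s.charAt(i) == cs.charAt(i)) i += 1
      i == size
    }
  }
}
```

A caveat of this approach is that each `charAt` on a StringBuffer is a separate synchronized call, so the comparison is not atomic with respect to concurrent modification and may cost more per character than a bulk region match.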

@david-bouyssie
Contributor Author

At the 8000 metre level, I think that any not-dead-wrong algorithm that lets the PR advance and gives
some net yield for the effort invested is OK. You seem confident in "yours", so run with it. It is good
to get a first generation "reasonably right", especially if some benefit can be measured or shown.
Trying to get it "almost perfect" usually either kills the PR or the author's enthusiasm for it.

Yes, I also think that at some point we have to adopt one model even if not perfect and move forward.
I'm indeed pretty confident in the last proposed methodology because:

  1. it looks like a big win compared to the currently implemented solution; it should largely decrease the number of allocations because, IMO, the "one-and-done" scenario is used quite a lot (cf. the previous discussion about String interpolation).
  2. it should only have an undesired cost when the algorithm fails to speculate on the kind of StringBuilder (reused vs single use). And this failure will only impact the first String created when calling .toString. All subsequently created strings will not waste.
  3. Usually, a reused StringBuilder will be created with a capacity larger than the String it will contain. This should lead to wastage levels > 33% in some scenarios, and thus will also avoid wastage for the first created String. This last point won't be always verified but would mitigate the already very limited wastage discussed in the previous point.

Diving down to the 1 meter level. Perhaps my cultural association with that famous Parisian
Ben Franklin of "Waste not; want not!" fame makes me skittish about wastage.
Let me refine my example, if I can.

Ok let's try this new one

Taking the current growth progression as a given, a 40 character String candidate, starting from 0, would have caused the current SB Array to have a size of 54.

I have not double checked that, but I trust you.

That gives a difference of 14. 14/54 is 0.26 which is less than the 0.33 cut point,

If I mentally run this other example with the proposed methodology, it should do this:

  • sb.toString() when length is 54 and count is 40. It means a wastage of 14 / 54 = 26%. This is below the 33% limit, so the backing Array will be shared to create the String.
  • sb.setLength(53) -> Arrays.fill(value, count, 53, '\u0000') // erase content between 40 and 53 to expand count
  • sb.setLength(38) -> the backing Array is shared, so the AbstractStringBuilder's Array is replaced by a new one having the same length as the current one and a count of 38 (only chars in that range are copied).

This last operation will actually be the one and only case in which waste is generated. The next time .toString is called, the Array will never be shared again, and will thus never waste again.

This is why I'm motivated to rename shareValue() to shareValueOnce(), and to put the previously discussed sharing logic (the one currently in def toString) there.
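The "share at most once" behaviour walked through above can be modelled roughly as follows. This is a hypothetical sketch, not the code in this PR: `Model`, `toStringShares`, `everShared`, and `allocations` are invented names, the one-third rule is taken from the discussion, and any shrink while shared is assumed to trigger the copy.

```scala
object ShareOnceSketch {
  // Hypothetical model of a "share the backing array at most once" policy.
  final class Model(capacity: Int) {
    private var value = new Array[Char](capacity)
    private var count = 0
    private var shared = false     // a String currently wraps `value`
    private var everShared = false // the one permitted share has been used
    var allocations = 1            // arrays allocated by builder and Strings

    def append(s: String): Unit = {
      require(count + s.length <= value.length, "model does not grow")
      s.getChars(0, s.length, value, count)
      count += s.length
    }

    // True if the new String would wrap `value`; false if it gets a copy.
    def toStringShares(): Boolean =
      if (!everShared && 3 * (value.length - count) <= value.length) {
        shared = true
        everShared = true
        true
      } else {
        allocations += 1 // the String allocates its own compact array
        false
      }

    def setLength(n: Int): Unit = {
      require(n >= 0 && n <= value.length)
      if (shared && n < count) {
        // Assumed rule: shrinking while shared switches to a fresh array of
        // the same length, copying only the first n chars.
        val fresh = new Array[Char](value.length)
        System.arraycopy(value, 0, fresh, 0, n)
        value = fresh
        shared = false
        allocations += 1
      } else if (n > count) {
        java.util.Arrays.fill(value, count, n, '\u0000')
      }
      count = n
    }
  }
}
```

In this model the 54/40 example shares on the first toString, allocates one extra array at `setLength(38)`, and copies on every later toString.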

I'll commit my current state as a showcase.

@LeeTibbert
Contributor

Having the concrete code of the 'showcase' is helpful.
You have put a lot of thought and effort into the code and
the benefit shows.

  1. Having studied that code, and taking it as it is, I think that some tests,
    probably just in StringBuilder, should give out a moderate size string
    and then exercise, in separate tests, the various AbstractStringBuilder
    methods (insert, replace, delete, trimToSize, ???) which might
    affect shared characters.

    I believe one cannot "test-in" quality but one can at least try the obvious
    cases once.

  2. Another thought experiment, if you are patient enough.
    I have, I must confess, lost the track. More importantly,
    reviewers are going to be looking for a concise summary or cost/benefit statement.

    Given a scenario with N from 1 to, say, 80, where a StringBuilder is created
    with N characters and toString() is immediately called on it, how many
    of those strings are expected to be shared?

    If we ignore "wastage" less than the 16 bytes avoided by a "shared" String
    not having to allocate a new Array, how much total and, say, average
    wastage is expected?

    We had worked through some examples figuring out the "wastage" of
    large, say 4K, strings. Is it still, or was it ever, true that there can be
    rather large chunks of "wastage" there?
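The tests described in item 1 might look roughly like the sketch below. It uses only `java.lang.StringBuilder` methods and plain comparisons rather than any particular test framework; `SharedStringChecks`, `survives`, and `failures` are illustrative names.

```scala
object SharedStringChecks {
  type JSB = java.lang.StringBuilder

  // Builder methods that rewrite characters a handed-out String might share.
  val mutations: Seq[(String, JSB => Unit)] = Seq(
    ("insert", (sb: JSB) => { sb.insert(0, "XYZ"); () }),
    ("replace", (sb: JSB) => { sb.replace(0, 3, "XYZ"); () }),
    ("delete", (sb: JSB) => { sb.delete(0, 3); () }),
    ("reverse", (sb: JSB) => { sb.reverse(); () }),
    ("setCharAt", (sb: JSB) => sb.setCharAt(0, 'Z')),
    ("setLength", (sb: JSB) => { sb.setLength(3); sb.append("XYZ"); () }),
    ("trimToSize", (sb: JSB) => { sb.trimToSize(); sb.setCharAt(0, 'Z') })
  )

  // Hand out a String, mutate the builder, and report whether the String
  // still has its original content.
  def survives(mutate: JSB => Unit): Boolean = {
    val original = "a moderate size string of text"
    val sb = new JSB(original)
    val handedOut = sb.toString
    mutate(sb)
    handedOut == original
  }

  // Names of the mutations that changed an already handed-out String.
  def failures(): Seq[String] =
    mutations.collect { case (name, mutate) if !survives(mutate) => name }
}
```

An empty result from `failures()` means no mutation leaked into a previously created String; keeping one named entry per method makes it easy to see which operation broke immutability.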

@LeeTibbert
Contributor

David,

I think that some tests, probably just in StringBuilder

If you concur with the idea, would you like me to write the tests I described?
StringBuilderTest, if such exists, is not part of this PR, so there would be no
concurrent file modification issues. I do not want to be that person who
is always suggesting work for others. Then again, if you have a plan, let
me know what I can do or not do, including standing by or down.

There is some theoretical advantage to having a person other than
the person changing the code write the tests.

@david-bouyssie
Contributor Author

Having studied that code, and taking it as it is, I think that some tests, probably just in StringBuilder, should give out a moderate size string and then exercise, in separate tests, the various AbstractStringBuilder methods (insert, replace, delete, trimToSize, ???) which might affect shared characters.
I believe one cannot "test-in" quality but one can at least try the obvious cases once.

Sure, some additional tests might be useful. Added to my TODO list for this PR.

Another thought experiment, if you are patient enough.
I have, I must confess, lost the track. More importantly,
reviewers are going to be looking for a concise summary or cost/benefit statement.
Given a scenario with N from 1 to, say, 80, where a StringBuilder is created
with N characters and toString() is immediately called on it, how many
of those strings are expected to be shared?

All (i.e. 80) of them will be shared, in a non-compacted form.
But compared to the current situation, I anticipate we should observe a total memory consumption decrease.

If we ignore "wastage" less than the 16 bytes avoided by a "shared" String
not having to allocate a new Array, how much total and, say, average
wastage is expected?

I would analyze it a different way. If we don't share at all, as we do today, we end up, for each example, with an allocated StringBuilder plus an allocated, compacted String. If the StringBuilder is not collected quickly by the GC, which seems to happen quite often with our current GCs, we get higher memory consumption than with a non-compacted StringBuilder backing Array alone. We also use more CPU for memory allocation and copying.
I would say that the benefit of the proposed methodology is directly linked to the characteristics of the current GCs. With a totally different GC, a different tuning might be better suited.

We had worked through some examples figuring out the "wastage" of
large, say 4K, strings. Is it still, or was it ever, true that there can be
rather large chunks of "wastage" there?

Small strings will waste. Medium strings will waste. And large strings will waste. The maximum waste is set in stone as 33%.
I understand that you focus on this quantity, but what I'm trying to stress is that the probability of wasting is maybe more important than the quantity wasted.
Until now, I think we had exclusively theoretical considerations regarding the possible cost/benefit ratio of the discussed methodologies. However, I think we need real life tests, to verify how SN (and especially the GC) will deal with that.
Generating real numbers is the (only) way to verify if the current methodology improves or not the current situation.

@david-bouyssie
Contributor Author

If you concur with the idea, would you like me to write the tests I described?
StringBuilderTest, if such exists, is not part of this PR, so there would be no
concurrent file modification issues. I do not want to be that person who
is always suggesting work for others. Then again, if you have a plan, let
me know what I can do or not do, including standing by or down.
There is some theoretical advantage to having a person other than
the person changing the code write the tests.

If you can do that I would be very grateful. The main reason is that I'll be super busy next week, so having a second pair of hands is going to be super useful to get this merged quickly.

@LeeTibbert
Contributor

Thank you for patiently answering my questions.

I think the summary is that once some unit-tests for potentially
mutated strings are written, this PR should proceed to at least
CI. That would give a baseline that you, I, and interested people
could clone & run, gaining real world experience.

0.5.0 is young enough that we have time to iterate & tune.

Generating real numbers is the (only) way to verify if the current methodology improves or not the current situation.

I'll sing in that choir!

All (i.e. 80) of them will be shared, in a non-compacted form.
But compared to the current situation, I anticipate we should observe a total memory consumption decrease.

Now that is the Good News headline.
When we prove the latter part, we can scream it from the rooftops. Lee, all things in time.

@LeeTibbert
Contributor

If you can do that I would be very grateful.

OK, so once I am done with the "string immutability" tests, I will clone the code of this PR and
write some StringBuilder tests to exercise the methods which could possibly change shared strings.

OK if I do it as a separate PR? I know of no way for me to add a file and make commits to this PR.
If you prefer another way to co-ordinate, let me know.

Because my time also comes in fits & starts for the next two weeks, I would probably do a series of
commits.
