-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/fs: Add ASCII fastpath for normalization #9365
base: main
Are you sure you want to change the base?
Conversation
@greatroar WDYT? |
Test case and benchmark... how much faster is this than the old stuff, for the various cases, since it's a performance optimisation? |
Yes, benchmarks please. |
Added a benchmark. Comparison with the previous implementation:
The ASCII fastpath seems to be working as intended but there's still a severe regression for the lowercase unicode testcase. |
lib/fs/folding.go
Outdated
for _, r := range s[i:] { | ||
rs.WriteRune(unicode.ToLower(unicode.ToUpper(r))) | ||
func toLower(r rune) rune { | ||
if r <= unicode.MaxASCII { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be optimized another ~50% via this bikeshed: https://stackoverflow.com/a/77689444/1432614
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're dealing with 4 cases:
- ASCII lower -> no-op
- ASCII mixed -> replace uppercase ones
- Unicode lower -> apply NFC
- Unicode mixed -> replace uppercase ones and apply NFC
The linked logic only offers fast ASCII detection but lacks the upper/lower distinction.
Doing things in single pass:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal idea from looking at the pprof in https://forum.syncthing.net/t/380k-files-24gb-in-directory-results-in-very-slow-sync-8-days/21488/26:
Quite a bit of time is spent on memory management - re-using bytes slices with a pool might help here.
for i := 0; i < len(s); i++ { | ||
c := s[i] | ||
if c > unicode.MaxASCII { | ||
return false, isLower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given unicode_lowercase is the slow bench result with the new implementatin, maybe it would be worth testing keeping that behaviour from the old code. I.e. when encountering non-ascii, keep checking if there's any upper-case chars (and then exit). So you can then take the lower-case-unicode shortcut from before. This will make the unicode-mixed-case slower, but I wouldn't be surprised if not by a lot compared to the gains for unicode-lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the point I don't really understand. golangs strings.Map()
has that shortcut built into it:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.6:src/strings/strings.go;l=489-517
My previous (flawed?) mixedcase unicode benchmark case should have stopped at the very first byte in the ASCII detection. The rest of the code looks pretty much the same to me? Map the runes and if a case change is detected, assign a string builder with the unchanged part and write mapped runes after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely glossed over the usage of strings.Map xD
That's still one or two more loops. If you did the unicode lower detection in our initial loop that does ascii detection, and then not use strings.Map
, we get rid of part of a loop in both the unicode cases.
Now that also makes me wonder if we could wrap isASCII
and toLowerAscii
into a mapping function for strings.Map
for less code branches/complexity, and equal, maybe even slightly better performance :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we are now mixing two classes of optimisations: Improving things about the unicode path, and adding an ascii shortcut. Maybe worth doing that separately for optimal results. Just an idle, optional suggestion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ASCII path iterates over single bytes which is fast but doesn't work as soon as any unicode characters are involved. Iterating over runes is a tad slower but necessary in that case.
So you either end up with two loops or one rune based loop which is slower for ASCII inputs.
I also tried to get rid of lower(upper(r))
as far as possible as there are only 23(!) characters in unicode that satisfy lower(r) != lower(upper(r))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found it, but apparently the type isn't exported by the unicode package which renders the SpecialCase unusable for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive code generation attempt:
import (
"fmt"
"unicode"
)
func main() {
fmt.Println("unicode.SpecialCase{")
for r := rune(0); r <= unicode.MaxRune; r++ {
lower := unicode.ToLower(r)
lowerUpper := unicode.ToLower(unicode.ToUpper(r))
if lower == lowerUpper {
continue
}
hr := fmt.Sprintf("0x%04x", r)
lr := fmt.Sprintf("0x%04x", lowerUpper)
fmt.Printf("\tunicode.CaseRange{%s, %s, d{0, %s - %s, 0}}, // %s\n", hr, hr, lr, hr, string(r))
}
fmt.Println("}")
}
unicode.SpecialCase{
unicode.CaseRange{0x00b5, 0x00b5, d{0, 0x03bc - 0x00b5, 0}}, // µ
unicode.CaseRange{0x0131, 0x0131, d{0, 0x0069 - 0x0131, 0}}, // ı
unicode.CaseRange{0x017f, 0x017f, d{0, 0x0073 - 0x017f, 0}}, // ſ
unicode.CaseRange{0x0345, 0x0345, d{0, 0x03b9 - 0x0345, 0}}, // ͅ
unicode.CaseRange{0x03c2, 0x03c2, d{0, 0x03c3 - 0x03c2, 0}}, // ς
unicode.CaseRange{0x03d0, 0x03d0, d{0, 0x03b2 - 0x03d0, 0}}, // ϐ
unicode.CaseRange{0x03d1, 0x03d1, d{0, 0x03b8 - 0x03d1, 0}}, // ϑ
unicode.CaseRange{0x03d5, 0x03d5, d{0, 0x03c6 - 0x03d5, 0}}, // ϕ
unicode.CaseRange{0x03d6, 0x03d6, d{0, 0x03c0 - 0x03d6, 0}}, // ϖ
unicode.CaseRange{0x03f0, 0x03f0, d{0, 0x03ba - 0x03f0, 0}}, // ϰ
unicode.CaseRange{0x03f1, 0x03f1, d{0, 0x03c1 - 0x03f1, 0}}, // ϱ
unicode.CaseRange{0x03f5, 0x03f5, d{0, 0x03b5 - 0x03f5, 0}}, // ϵ
unicode.CaseRange{0x1c80, 0x1c80, d{0, 0x0432 - 0x1c80, 0}}, // ᲀ
unicode.CaseRange{0x1c81, 0x1c81, d{0, 0x0434 - 0x1c81, 0}}, // ᲁ
unicode.CaseRange{0x1c82, 0x1c82, d{0, 0x043e - 0x1c82, 0}}, // ᲂ
unicode.CaseRange{0x1c83, 0x1c83, d{0, 0x0441 - 0x1c83, 0}}, // ᲃ
unicode.CaseRange{0x1c84, 0x1c84, d{0, 0x0442 - 0x1c84, 0}}, // ᲄ
unicode.CaseRange{0x1c85, 0x1c85, d{0, 0x0442 - 0x1c85, 0}}, // ᲅ
unicode.CaseRange{0x1c86, 0x1c86, d{0, 0x044a - 0x1c86, 0}}, // ᲆ
unicode.CaseRange{0x1c87, 0x1c87, d{0, 0x0463 - 0x1c87, 0}}, // ᲇ
unicode.CaseRange{0x1c88, 0x1c88, d{0, 0xa64b - 0x1c88, 0}}, // ᲈ
unicode.CaseRange{0x1e9b, 0x1e9b, d{0, 0x1e61 - 0x1e9b, 0}}, // ẛ
unicode.CaseRange{0x1fbe, 0x1fbe, d{0, 0x03b9 - 0x1fbe, 0}}, // ι
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the exceptions are that few it certainly seems valid to pre-generate the table and then use that plus regular ToLower in order to save on the more complex operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the unexported d
is a bummer. Wonder what the thinking there is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played a bit with unicode.RangeTable
but couldn't improve the performance:
var complexFolds = &unicode.RangeTable{
R16: []unicode.Range16{
{181, 305, 124},
{383, 837, 454},
{962, 976, 14},
{977, 981, 4},
{982, 1008, 26},
{1009, 1013, 4},
{7296, 7304, 1},
{7835, 8126, 291},
},
R32: []unicode.Range32{},
LatinOffset: 0,
}
func toLower(r rune) rune {
if r <= unicode.MaxASCII {
if r < 'A' || 'Z' < r {
return r
}
return r + 'a' - 'A'
}
if unicode.Is(complexFolds, r) {
return unicode.To(unicode.LowerCase, unicode.To(unicode.UpperCase, r))
}
return unicode.To(unicode.LowerCase, r)
}
Latest numbers:
|
@bt90 Would you like me to check some test build on this very machine, let me know. Eager to see this implemented/released. |
Unfortunately my time is rather limited these days. We moved last week and I still have a lot of work to do around the house. But feel free to test and post your results. |
Purpose
Avoids
unicode.ToLower(unicode.ToUpper(r))
andnorm.NFC.String(s)
for simple ASCII filenames.