[WIP] Accelerate string matching #159

Andersama · 2020-12-28T15:31:56Z

Optimizes string matching by allowing memcmp like functionality (even on utf8 sequences)

comparison: https://compiler-explorer.com/z/Tz3KhG

Rough idea is that ordinarily per character we're doing two comparisons, one to check iterators and one to validate the character and then a branch. Presumably we can improve on this by reducing comparisons and jumps if we know enough about the iterator to work out the remaining characters left, then some trickery to do comparisons in as branchless manner as possible. Vaguely implements idea off of #147, just for string<> at the moment.

Appears clang and msvc pick up on this change and optimize with sse, presumably gcc might as well.

The fold expression responsible for validating the match can definitely be played around with.
We could count the characters matched, count the differences, or like here accumulate bitwise differences in a register.

	return { current + bump, (count >= sizeof...(String)) && (((current[Idx] ^ static_cast<std::remove_cvref_t<char_type>>(String)) | ... | static_cast<char_type>(0)) == 0) };

… on utf8 sequences) reference: hanickadot#147 comparison: https://compiler-explorer.com/z/Tz3KhG

Andersama · 2020-12-30T10:55:16Z

Not bench-marked things, but been playing around since the tests passed. It seems like combining == and & operators generates the same assembly that clang uses for std::memcmp.

Andersama · 2020-12-31T03:21:20Z

I'm getting pretty significant improvements to matching / searching from this. When using == and & operators for example against the pattern 0123456789ABCDEF turns into a 16 byte comparison sse instruction.
https://compiler-explorer.com/z/1bzKKd

match_str(char*, char*):                       # @match_str(char*, char*)
        mov     rax, rsi
        sub     rax, rdi
        cmp     rax, 16
        jb      .LBB0_1
        movdqu  xmm0, xmmword ptr [rdi]
        pcmpeqb xmm0, xmmword ptr [rip + .LCPI0_0]
        pmovmskb        eax, xmm0
        cmp     ax, -1
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        ret
.LBB0_1:
        xor     eax, eax
        ret

vs...

        cmp     rdi, rsi
        je      .LBB0_33
        lea     rcx, [rdi + 1]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi], 48
        jne     .LBB0_32
        lea     rcx, [rdi + 2]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 1], 49
        jne     .LBB0_32
        lea     rcx, [rdi + 3]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 2], 50
        jne     .LBB0_32
        lea     rcx, [rdi + 4]
        xor     eax, eax
        cmp     rcx, rsi
        je      .LBB0_32
        cmp     byte ptr [rdi + 3], 51
        jne     .LBB0_32
        
        #... (this for a quite a while)
        
        cmp     byte ptr [rdi + 15], 70
        sete    cl
        add     rdi, 16
        cmp     rdi, rsi
        sete    al
        and     al, cl
        jmp     .LBB0_32
.LBB0_33:
        xor     eax, eax
.LBB0_32:
        ret

Andersama force-pushed the match_utf8_sequence branch 2 times, most recently from 09e933a to d6e093b Compare December 28, 2020 17:20

Andersama mentioned this pull request Dec 28, 2020

Concatenation of character<> into string<> missed if in subexpression #147

Closed

Andersama force-pushed the match_utf8_sequence branch 10 times, most recently from f9cdaa8 to 9c9d99f Compare December 29, 2020 08:50

optimizes string matching by allowing memcmp like functionality (even…

6e6f6e1

… on utf8 sequences) reference: hanickadot#147 comparison: https://compiler-explorer.com/z/Tz3KhG

Andersama force-pushed the match_utf8_sequence branch from 9c9d99f to 6e6f6e1 Compare December 29, 2020 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Accelerate string matching #159

[WIP] Accelerate string matching #159

Andersama commented Dec 28, 2020 •

edited

Andersama commented Dec 30, 2020

Andersama commented Dec 31, 2020 •

edited

[WIP] Accelerate string matching #159

Are you sure you want to change the base?

[WIP] Accelerate string matching #159

Conversation

Andersama commented Dec 28, 2020 • edited

Andersama commented Dec 30, 2020

Andersama commented Dec 31, 2020 • edited

Andersama commented Dec 28, 2020 •

edited

Andersama commented Dec 31, 2020 •

edited