[WIP] Fix [RegExp|String].prototype methods and Regex matcher #6615

MadProbe · 2021-02-20T18:43:59Z

This PR will fix spec deviations in:

RegExp.prototype[@@search]()
RegExp.prototype[@@split]()
String.prototype.split()
Some other methods yet to be added
Matcher unicode support
Clean-up of the ES5 method versions

Also:

Unflag the ES6 RegExp methods and properties
Tests
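For reference, current ECMAScript requires this ordering in `RegExp.prototype[@@search]()`: read `lastIndex`, set it to 0 only if it is not already SameValue-equal to 0, run RegExpExec, then restore the previous value only if it changed, and return the match index or -1. Below is a minimal C++ sketch under simplifying assumptions. `RegExpModel`, `SymbolSearch` and the `exec` callback are hypothetical names, not ChakraCore types. `lastIndex` is modelled as a plain `double`, although in JS it can hold any value. The write counter exists only to make the observable `Set` calls visible.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <optional>

// SameValue restricted to Numbers: NaN equals NaN, but +0 and -0 differ.
bool SameValueNumber(double x, double y)
{
    if (std::isnan(x) && std::isnan(y)) return true;
    if (x == 0 && y == 0) return std::signbit(x) == std::signbit(y);
    return x == y;
}

// Hypothetical stand-in for a RegExp object (not a ChakraCore type).
struct RegExpModel
{
    double lastIndex = 0;
    int lastIndexWrites = 0; // counts observable Set(rx, "lastIndex") calls

    void SetLastIndex(double value)
    {
        lastIndex = value;
        ++lastIndexWrites;
    }
};

// Returns the match index, or std::nullopt when RegExpExec returns null.
using ExecFn = std::function<std::optional<double>(RegExpModel&)>;

double SymbolSearch(RegExpModel& rx, const ExecFn& exec)
{
    const double previousLastIndex = rx.lastIndex;
    if (!SameValueNumber(previousLastIndex, 0))
        rx.SetLastIndex(0);

    const std::optional<double> result = exec(rx);

    const double currentLastIndex = rx.lastIndex;
    if (!SameValueNumber(currentLastIndex, previousLastIndex))
        rx.SetLastIndex(previousLastIndex);

    // Postcondition: lastIndex is observably back to its original value.
    assert(SameValueNumber(rx.lastIndex, previousLastIndex));
    return result ? *result : -1;
}
```

With `lastIndex` already 0 and an `exec` that leaves it alone, the sketch performs no writes at all. With `lastIndex` set to -0, it writes twice (reset to 0, then restore -0), because SameValue(-0, 0) is false.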

rhuanjl · 2021-02-20T20:21:29Z

I'm not keen on killing all the fast paths. I suggested you'd need to check them, but hopefully we can update the conditions so they're only used when they won't cause a problem, rather than just deleting them.
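One way to read this suggestion is to keep each fast path but gate it behind a check that nothing observable could differ from the spec path. The sketch below is purely illustrative. `RegExpShape`, `MatchPath` and `ChoosePath` are hypothetical names. The three conditions are assumptions about what such a guard might inspect, not ChakraCore's actual checks.

```cpp
// Hypothetical summary of the state a fast-path guard might inspect.
// These particular fields are illustrative assumptions, not ChakraCore's checks.
struct RegExpShape
{
    bool execIsBuiltin = true;   // RegExp.prototype.exec has not been replaced
    bool flagsAreBuiltin = true; // flag getters (global, unicode, ...) untouched
    bool lastIndexIsInt = true;  // lastIndex holds a plain non-negative integer
};

enum class MatchPath { Fast, Spec };

// Take the fast path only when it cannot be observably different
// from the spec-compliant slow path; otherwise fall back.
constexpr MatchPath ChoosePath(const RegExpShape& rx)
{
    return (rx.execIsBuiltin && rx.flagsAreBuiltin && rx.lastIndexIsInt)
        ? MatchPath::Fast
        : MatchPath::Spec;
}
```

In this design the slow path stays the source of truth. A guard that is too conservative only costs speed, while one that is too permissive reintroduces the spec deviations.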


rhuanjl · 2021-02-25T20:41:08Z

lib/Runtime/Library/RegexHelper.cpp

@@ -2412,7 +2429,7 @@ namespace Js
 {
 if (string->GetLength() > (0 > index || (uint64_t)index + (uint64_t)1 > UINT32_MAX ? UINT32_MAX : (uint32_t)(index + 1)) &&
 NumberUtilities::IsSurrogateLowerPart(string->GetString()[index]) &&
- NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1]))
+ NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1]) && isUnicode)


nit: ideally we should do the isUnicode check first. It's much quicker than the IsSurrogateLowerPart and IsSurrogateUpperPart checks, and because && short-circuits, doing it first lets us skip the slower calls whenever it's false.
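To illustrate the nit: `&&` evaluates its operands left to right and stops at the first `false`. Putting the boolean flag first therefore means the non-unicode case never touches the string. The sketch below is a standalone version of the spec's AdvanceStringIndex over a `std::u16string_view`. `IsLeadSurrogate` and `IsTrailSurrogate` are hypothetical helpers standing in for the `NumberUtilities` checks in the diff. The sketch assumes the index stays within the spec's 2^53 - 1 bound, so `index + 1` cannot overflow.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helpers standing in for the NumberUtilities surrogate checks
// used in the diff; the ranges are the standard UTF-16 lead/trail surrogates.
constexpr bool IsLeadSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool IsTrailSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Sketch of the spec's AdvanceStringIndex(S, index, unicode). Assumes
// index <= 2^53 - 1, as the spec guarantees, so index + 1 cannot overflow.
constexpr std::uint64_t AdvanceStringIndex(std::u16string_view s,
                                           std::uint64_t index,
                                           bool isUnicode)
{
    // Cheapest test first: when isUnicode is false, && short-circuits and
    // neither the bounds check nor the surrogate checks run.
    if (isUnicode &&
        index + 1 < s.size() &&          // a following code unit exists
        IsLeadSurrogate(s[index]) &&
        IsTrailSurrogate(s[index + 1]))
    {
        return index + 2;                // step over the whole surrogate pair
    }
    return index + 1;
}
```

Consider the UTF-16 string `u"a\U0001F600"`, an `a` followed by a surrogate pair. Advancing from index 1 gives 3 when `isUnicode` is true and 2 when it is false.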

rhuanjl · 2021-03-09T20:10:12Z

lib/Runtime/Library/JavascriptString.h

+ Field(codepoint_t*) m_codePointString; // Code points of the string, may be nullptr if not initialized yet by GetCodePoints call
+ Field(charcount_t) m_codePointsLength; // Length of the codepoints, may be k_InvalidCharCount if not initialized yet by GetCodePoints call


This will add two fields (a pointer and a length) to every single JS string, which isn't ideal. Could we instead subclass JavascriptString with a new type, like JavascriptStringWithCodePoints or something?

(Note: I think this change is what's causing the compile failures, as there is a static assert on the size of a different subclass of JavascriptString.)
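The cost difference between the two layouts can be sketched with simplified stand-in types. `StringSketch` and the other struct names are hypothetical, and `charcount_t`/`codepoint_t` are assumed to be 32-bit here. Putting the cache fields on the base grows every string. A subclass only grows the strings that actually carry the cache. Because every subclass embeds its base, a `static_assert` that pins a subclass's size also breaks whenever the base grows, which fits the compile failure noted above.

```cpp
#include <cstdint>

using charcount_t = std::uint32_t; // assumption: 32-bit length type
using codepoint_t = std::uint32_t; // assumption: 32-bit code point type

// Simplified stand-in for a JS string: buffer plus length.
struct StringSketch
{
    const char16_t* buffer;
    charcount_t length;
};

// Option (a): cache fields on the base type, so *every* string pays for them.
struct StringSketchWithCacheFields
{
    const char16_t* buffer;
    charcount_t length;
    codepoint_t* codePoints;      // nullptr until computed
    charcount_t codePointsLength; // invalid until computed
};

// Option (b): a dedicated subclass; only strings that need code points pay.
struct StringSketchWithCodePoints : StringSketch
{
    codepoint_t* codePoints;
    charcount_t codePointsLength;
};
```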

MadProbe · 2021-03-12

So I would need to create a new JavascriptStringWithCodePoints instance on every SimpleMatch call whenever the pattern turns out to be a unicode one.
Honestly, I can't see how I would cache the code point strings.

rhuanjl · 2021-03-12

You're right, this is not an ideal solution. Ugh, I didn't think this through, did I? I think caching is a no-go.

Either a) we massively increase the size of all our strings, or b) we come up with a way to convert the type whenever we need to cache, which would not be simple, as there could be many existing pointers to the string.
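Without a cache, a unicode-aware match has to turn the UTF-16 subject into code points (or walk it pair by pair) on every SimpleMatch call. Below is a minimal sketch of that decoding step, assuming a hypothetical free function `DecodeCodePoints` and a 32-bit `codepoint_t`. Lone surrogates are passed through unchanged, which matches how unicode-mode regexes treat them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

using codepoint_t = std::uint32_t; // assumption: 32-bit code point type

// Hypothetical helper: decode UTF-16 code units into code points.
std::vector<codepoint_t> DecodeCodePoints(std::u16string_view s)
{
    std::vector<codepoint_t> out;
    out.reserve(s.size()); // never more code points than code units

    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size())
        {
            const char16_t d = s[i + 1];
            if (d >= 0xDC00 && d <= 0xDFFF)
            {
                // Combine a lead/trail pair into one supplementary code point.
                out.push_back(0x10000 + ((codepoint_t(c) - 0xD800) << 10)
                                      + (codepoint_t(d) - 0xDC00));
                ++i; // consume the trail unit as well
                continue;
            }
        }
        out.push_back(c); // BMP character or lone surrogate, kept as-is
    }

    assert(out.size() <= s.size());
    return out;
}
```

Reserving `s.size()` entries avoids reallocation. However, the work is still O(n) per call, and that per-call overhead is exactly what the cache was meant to remove.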

MadProbe added 3 commits February 20, 2021 21:05

kill regex props fast paths

df5334f

fix RegExp.prototype[@@search]()

ec40dc0

fix copyrights

919c6a3

MadProbe added 7 commits February 22, 2021 19:59

fix RegExp.prototype[@@split]()

ec358a8

Fix String.prototype.split()

c8614d5

fix dumb mistake

446bd06

Fix GetRegExSymbolFunction

3132cdd

Merge branch 'master' into regex-fixes

898f10b

Fix AdvanceStringIndex

fbc00fc

Merge branch 'regex-fixes' of https://github.com/MadProbe/ChakraCore into regex-fixes

1c7ac95

rhuanjl reviewed Feb 25, 2021

lib/Runtime/Library/RegexHelper.cpp

Fix AdvanceLastIndex again

e4fd4be

rhuanjl reviewed Feb 25, 2021

MadProbe added 3 commits February 25, 2021 23:44

dont leak

e5b24b7

add JavascriptString::GetCodePoints for later use

68e9fa7

Merge branch 'master' of https://github.com/chakra-core/ChakraCore into regex-fixes

d3b1f5a

rhuanjl reviewed Mar 9, 2021

MadProbe Mar 12, 2021

rhuanjl Mar 12, 2021