Bug: Globalize number formatter is incorrect for numeric digits in supplemental plane #922
Comments
Thanks for filing the issue and for your detailed debugging. I am open to accepting a fix. Thanks!
@rxaviers I'll see what I can do. Any guidance on roughly where in the code I should be looking?
Awesome. Numbering system digits are set at https://github.com/globalizejs/globalize/blob/master/src/number/numbering-system-digits-map.js, stored as formatter properties at https://github.com/globalizejs/globalize/blob/master/src/number/format-properties.js#L63, then used at https://github.com/globalizejs/globalize/blob/master/src/number/format.js#L96. Their respective unit tests can be found at https://github.com/globalizejs/globalize/blob/master/test/unit/number/format-properties.js and https://github.com/globalizejs/globalize/blob/master/test/unit/number/format.js.
OK, this issue isn't going to be my highest priority, though I will hopefully get round to it at some point. I believe the issue only affects 4 locales, all related to the base ccp locale: ccp, ccp-u-nu-native, ccp-IN and ccp-IN-u-nu-native.
Hi there
Globalize (v1.7.0) number formatting is incorrect with cldr-data (v36.0.0) when the CLDR numeric digits come from the Unicode supplementary planes (U+10000 to U+10FFFF), which UTF-16 encodes as surrogate pairs.
Short example, discussed below: 44.56 formatted in ccp locale
Based on the formatted value returned by Globalize, I initially suspected that individual characters are represented in Globalize as surrogate pairs (two 16-bit hex values each), but that only the first value of each pair is returned. There's a worked example below, although I now have some doubts about this theory: of the 4 numeric digits involved, 3 of the values returned by Globalize look like the first half of a surrogate pair, but one does not.
Example (no code)
For the "ccp" locale, digits 0-9 are "𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿", which have Unicode hex code points of ["11136", "11137", "11138", "11139", "1113a", "1113b", "1113c", "1113d", "1113e", "1113f"].
So the number 44.56 formatted in ccp should be "𑄺𑄺.𑄻𑄼" = ["1113a", "1113a", "2e", "1113b", "1113c"]
What is actually returned from Globalize is "��.��" = [ 'd804', 'd804', '2e', 'dd38', 'd804' ]
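Using the Surrogate Pair Calculator on the individual characters in "𑄺𑄺.𑄻𑄼" = ["1113a", "1113a", "2e", "1113b", "1113c"], each digit encodes as the high surrogate d804 followed by its own low surrogate (dd3a, dd3b and dd3c respectively).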
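For anyone who wants to check the surrogate pairs without the online calculator, here is a minimal sketch of the standard UTF-16 encoding arithmetic. The helper name `toSurrogatePair` is made up for this illustration and is not part of Globalize.

```javascript
// Hypothetical helper: split a supplementary-plane code point
// (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;          // 20-bit value
  const high = 0xd800 + (offset >> 10);        // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);       // bottom 10 bits
  return [high.toString(16), low.toString(16)];
}

// The three distinct digits in "44.56" for the ccp locale.
for (const cp of [0x1113a, 0x1113b, 0x1113c]) {
  console.log(cp.toString(16), "=>", toSurrogatePair(cp).join(" "));
}
// 1113a => d804 dd3a
// 1113b => d804 dd3b
// 1113c => d804 dd3c
```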
So maybe Globalize is returning only the first hex value of each surrogate pair? But for 1113b it returns dd38, not d804.
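One possible explanation for the stray dd38, offered as a guess rather than a confirmed diagnosis: if the digits string is indexed by UTF-16 code unit instead of by code point, then digit 4 lands on index 4 (a high surrogate), digit 5 on index 5 (the low surrogate of the character for 2, which is dd38), and digit 6 on index 6 (a high surrogate again). The sketch below does not use Globalize at all. `formatByCodeUnit` and `formatByCodePoint` are hypothetical stand-ins, and the assumption that the formatter looks digits up by string index has not been checked against the source.

```javascript
// Build the ccp digits U+11136..U+1113F without relying on file encoding.
const ccpDigits = String.fromCodePoint(
  ...Array.from({ length: 10 }, (_, i) => 0x11136 + i)
);

// Hex of each UTF-16 code unit, and of each full code point.
const hexUnits = (s) =>
  Array.from({ length: s.length }, (_, i) => s.charCodeAt(i).toString(16));
const hexPoints = (s) => Array.from(s, (c) => c.codePointAt(0).toString(16));

// ASSUMPTION: lookup by string index, i.e. by UTF-16 code unit.
function formatByCodeUnit(text) {
  return text.replace(/[0-9]/g, (d) => ccpDigits[+d]);
}

// Alternative: split into code points first, then look up by digit value.
const ccpDigitList = Array.from(ccpDigits); // 10 entries, one per digit
function formatByCodePoint(text) {
  return text.replace(/[0-9]/g, (d) => ccpDigitList[+d]);
}

console.log(hexUnits(formatByCodeUnit("44.56")));
// [ 'd804', 'd804', '2e', 'dd38', 'd804' ]
console.log(hexPoints(formatByCodePoint("44.56")));
// [ '1113a', '1113a', '2e', '1113b', '1113c' ]
```

The code-unit lookup reproduces the reported output exactly, and the code-point lookup gives the expected one, so the handling of the digits string looks like a good place to start.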
Example (code)