indices reports byte offsets instead of character offsets #3064
wader added a commit to wader/jq that referenced this issue on Mar 12, 2024:
Previously byte index was used. Fixes jqlang#1430 jqlang#1624 jqlang#3064
Describe the bug
jq uses characters to index strings.
To see that, we can run `"🇬🇧oo" | .[0 : 1,2,3,4]`, which yields "🇬" "🇬🇧" "🇬🇧o" "🇬🇧oo". Note that 🇬🇧 is actually two characters (codepoints) and 8 bytes, as we can see from `"🇬🇧" | length, utf8bytelength`.
However, the `indices` filter returns byte offsets to the pattern in the string.
The documentation does not specify the behaviour of `indices` for UTF-8 strings, but given that `length` and `.[x:y]` use character counts to index strings, it is likely that this is a bug and not just undocumented behaviour.
To Reproduce
$ ./jq-linux-amd64-1.7.1 -nc '"🇬🇧oo" | indices("o")'
[8,9]
$ ./jq-linux-amd64-1.7.1 -nc '"ƒoo" | indices("o")'
[2,3]
Expected behavior
$ ./jq-linux-amd64-1.7.1-fixed -nc '"🇬🇧oo" | indices("o")'
[2,3]
$ ./jq-linux-amd64-1.7.1-fixed -nc '"ƒoo" | indices("o")'
[1,2]
The problem is probably located in jv_string_indexes.
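To make the difference between the two kinds of offsets concrete, here is a minimal Python sketch (not jq's actual C code). The helper names `byte_indices` and `codepoint_indices` are hypothetical, and reporting overlapping matches is a choice made for this sketch rather than a statement about jq. It searches the UTF-8 bytes and then converts each byte offset to a codepoint offset by counting the non-continuation bytes that precede it.

```python
def byte_indices(haystack: str, needle: str) -> list[int]:
    """Offsets of needle in haystack, counted in UTF-8 bytes."""
    h = haystack.encode("utf-8")
    n = needle.encode("utf-8")
    if not n:
        return []
    offsets = []
    pos = h.find(n)
    while pos != -1:
        offsets.append(pos)
        # Advance by one byte so overlapping matches are also found
        # (a choice made for this sketch).
        pos = h.find(n, pos + 1)
    return offsets


def codepoint_indices(haystack: str, needle: str) -> list[int]:
    """Offsets of needle in haystack, counted in Unicode codepoints."""
    h = haystack.encode("utf-8")
    # A UTF-8 continuation byte has the bit pattern 10xxxxxx, so the number
    # of codepoints before a byte offset equals the number of bytes in the
    # prefix that are NOT continuation bytes.
    return [
        sum(1 for b in h[:off] if (b & 0xC0) != 0x80)
        for off in byte_indices(haystack, needle)
    ]


flag = "\U0001F1EC\U0001F1E7"  # the two regional-indicator codepoints of 🇬🇧
print(byte_indices(flag + "oo", "o"))       # [8, 9]
print(codepoint_indices(flag + "oo", "o"))  # [2, 3]
print(byte_indices("\u0192oo", "o"))        # [2, 3]  ("\u0192" is ƒ, 2 bytes)
print(codepoint_indices("\u0192oo", "o"))   # [1, 2]
```

The byte-based results match the output reported under "To Reproduce", and the codepoint-based results match the output given under "Expected behavior".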