
Crystal doesn't UTF-8-Validate first byte of input #14579

Open
BlobCodes opened this issue May 8, 2024 · 4 comments

Comments

@BlobCodes
Contributor

BlobCodes commented May 8, 2024

Bug Report

The following code compiles fine, even though the macro generates invalid UTF-8:

{{ "\xFF = 2".id }}

This only happens when the first character of the input is invalid UTF-8. If any other character is invalid, an exception is raised:

{{ "\xFF\xFE = 2".id }}
# Unexpected byte 0xfe at position 1, malformed UTF-8 (InvalidByteSequenceError)
#   from /crystal/src/compiler/crystal/syntax/lexer.cr:2759:9 in '??'
#   from /crystal/src/compiler/crystal/syntax/lexer.cr:1057:11 in 'next_token'
#   from /crystal/src/enum.cr:361:3 in 'parse_macro_source'
#   from /crystal/src/compiler/crystal/semantic/semantic_visitor.cr:359:23 in 'expand_inline_macro'
#   from /crystal/src/compiler/crystal/semantic/semantic_visitor.cr:431:3 in 'accept'
#   from /crystal/src/enumerable.cr:510:7 in '??'
#   from /crystal/src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
#   from /crystal/src/compiler/crystal/semantic.cr:70:7 in 'semantic:cleanup'
#   from /crystal/src/compiler/crystal/compiler.cr:201:14 in 'compile:combine_rpath'
#   from /crystal/src/compiler/crystal/compiler.cr:195:56 in 'compile:combine_rpath'
#   from /crystal/src/compiler/crystal/command/eval.cr:30:5 in 'eval'
#   from /crystal/src/compiler/crystal/command.cr:126:12 in 'run'
#   from /crystal/src/compiler/crystal.cr:11:1 in '__crystal_main'
#   from /crystal/src/crystal/main.cr:129:5 in 'main'
#   from src/env/__libc_start_main.c:95:2 in 'libc_start_main_stage2'
# Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

Oh, and this "you've found a bug in the Crystal compiler" message should probably also be fixed.


$ crystal -v

Crystal 1.12.1 [4cea10199] (2024-04-11)

LLVM: 15.0.7
Default target: x86_64-unknown-linux-gnu
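For illustration, the asymmetry can be reproduced with the standard library alone, assuming (as a later comment in this thread notes) that the lexer reads its input through Char::Reader. The Char::Reader constructor decodes the first character immediately but never raises on malformed input; it only records the offending byte in `error`. This is a sketch against the public Char::Reader API, not compiler code:

```crystal
# Illustrative only: shows how Char::Reader reports, but does not raise on,
# an invalid first byte.
reader = Char::Reader.new("\xFF = 2")

# The constructor has already decoded position 0. 0xFF can never appear in
# valid UTF-8, so the reader yields the replacement character and stores the
# offending byte in `error` instead of raising.
first_char = reader.current_char
first_error = reader.error

# Advancing decodes the next character (a space), which is valid, so
# `error` is reset to nil.
reader.next_char
second_error = reader.error

puts "first char: #{first_char.inspect}, error: #{first_error.inspect}"
puts "second error: #{second_error.inspect}"
```

Any code that only inspects `error` after calling `next_char` therefore never sees the problem at position 0.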

@straight-shoota
Member

Oh, and this "you've found a bug in the Crystal compiler" message should probably also be fixed.

What fixing does it need?

@BlobCodes
Contributor Author

I just meant that macros generating invalid UTF-8 shouldn't result in a "compiler bug" message, because that's a user error.

@straight-shoota
Member

Note that the same kind of error appears when the first byte of the source file is not valid UTF-8.

$ echo '\xFF' | bin/crystal eval
Using compiled compiler at .build/crystal
Regex match error: UTF-8 error: illegal byte (0xfe or 0xff) (ArgumentError)
  from src/regex/pcre2.cr:275:9 in 'match_data'
  from src/regex/pcre2.cr:207:18 in 'match_impl'
  from src/regex.cr:672:12 in 'match_at_byte_index'
  from src/regex.cr:621:12 in 'match:options'
  from src/string.cr:3227:13 in '=~'
  from src/compiler/crystal/semantic/suggestions.cr:41:25 in 'lookup_similar_def'
  from src/compiler/crystal/semantic/suggestions.cr:73:7 in 'lookup_similar_def_name'
  from src/compiler/crystal/semantic/call_error.cr:594:5 in 'raise_undefined_method'
  from src/compiler/crystal/semantic/call_error.cr:98:7 in 'raise_matches_not_found'
  from src/compiler/crystal/semantic/call.cr:291:9 in 'lookup_matches_in_type'
  from src/compiler/crystal/semantic/call.cr:254:3 in 'lookup_matches_in_type:search_in_parents:with_autocast'
  from src/compiler/crystal/semantic/call.cr:210:5 in 'lookup_matches_in'
  from src/compiler/crystal/semantic/call.cr:209:3 in 'lookup_matches_in:with_autocast'
  from src/compiler/crystal/semantic/call.cr:197:7 in 'lookup_matches_without_splat'
  from src/compiler/crystal/semantic/call.cr:124:17 in 'lookup_matches:with_autocast'
  from src/compiler/crystal/semantic/call.cr:113:5 in 'lookup_matches'
  from src/compiler/crystal/semantic/call.cr:90:15 in 'recalculate'
  from src/compiler/crystal/semantic/main_visitor.cr:1380:7 in 'recalculate_call'
  from src/compiler/crystal/semantic/main_visitor.cr:1359:7 in 'visit'
  from src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from src/compiler/crystal/semantic/main_visitor.cr:688:11 in 'visit'
  from src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from src/compiler/crystal/semantic/main_visitor.cr:6:7 in 'visit_main:process_finished_hooks:cleanup:visitor'
  from src/compiler/crystal/progress_tracker.cr:22:7 in 'semantic:cleanup'
  from src/compiler/crystal/compiler.cr:219:14 in 'compile:combine_rpath'
  from src/compiler/crystal/compiler.cr:213:56 in 'compile:combine_rpath'
  from src/compiler/crystal/command/eval.cr:29:5 in 'eval'
  from src/compiler/crystal/command.cr:101:7 in 'run'
  from src/compiler/crystal/command.cr:55:5 in 'run'
  from src/compiler/crystal/command.cr:54:3 in 'run'
  from src/compiler/crystal.cr:11:1 in '__crystal_main'
  from src/crystal/main.cr:129:5 in 'main_user_code'
  from src/crystal/main.cr:115:7 in 'main'
  from src/crystal/main.cr:141:3 in 'main'
  from /lib/x86_64-linux-gnu/libc.so.6 in '??'
  from /lib/x86_64-linux-gnu/libc.so.6 in '__libc_start_main'
  from /home/johannes/src/crystal-lang/crystal/.build/crystal in '_start'
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

It's very peculiar that the compiler gets as far as a regex match while looking up similar names before it notices something is wrong.

If the invalid encoding is in any later byte, the compiler fails gracefully:

$ echo '-\xFF' | crystal eval
Error: file 'eval' is not a valid Crystal source file: Unexpected byte 0xff at position 1, malformed UTF-8

@HertzDevil
Contributor

The expected error is only raised inside Crystal::Lexer#next_char_no_column_increment, after a call to Char::Reader#next_char; the same check needs to be done in Crystal::Lexer#initialize as well.
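A minimal sketch of the suggested change, using a hypothetical CheckedReader class and a hypothetical MalformedUTF8Error exception. Neither exists in the compiler; the real lexer raises InvalidByteSequenceError, as the stack trace above shows. The point is only that the same UTF-8 check runs once in the constructor and again after every advance:

```crystal
# Hypothetical exception type for this sketch only.
class MalformedUTF8Error < Exception
end

# Hypothetical, simplified reader; not Crystal::Lexer itself. It checks for
# malformed UTF-8 both at construction and after every advance, which is the
# change suggested for Crystal::Lexer#initialize.
class CheckedReader
  def initialize(string : String)
    @reader = Char::Reader.new(string)
    # Char::Reader has already decoded the first character at this point,
    # so check it now; otherwise an invalid first byte slips through.
    check_utf8
  end

  def current_char : Char
    @reader.current_char
  end

  def next_char : Char
    @reader.next_char
    check_utf8
    @reader.current_char
  end

  # Raises if the character at the current position was malformed.
  private def check_utf8 : Nil
    if byte = @reader.error
      raise MalformedUTF8Error.new(
        "Unexpected byte 0x#{byte.to_s(16)} at position #{@reader.pos}, malformed UTF-8")
    end
  end
end

begin
  CheckedReader.new("\xFF = 2")
rescue ex : MalformedUTF8Error
  puts ex.message
end
```

With this shape, "\xFF = 2" is rejected at position 0, while input such as "-\xFF" is still rejected at position 1 on the first call to next_char, matching the graceful error already produced for later bytes.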
