GB1030 dotnet info #102174

Open
agocke opened this issue May 13, 2024 · 3 comments · May be fixed by #102295

@agocke
Member

agocke commented May 13, 2024

  1. Copy a group of GB18030 characters (level 3 long sample data):
    𫚭﹫𪛒𫍲𫟼𬘭𮯠⾢

  2. Use the supported string to create a folder: 龯蝌灋齅ㄥ䶱

  3. Unzip dotnet-sdk-9.0.100-preview.4.24218.26-win-x64.zip into that folder.

  4. Open the folder and run dotnet --info.

Expected Result:

The contents are displayed correctly.

Actual Result:

GB18030 characters are displayed incorrectly.
image

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 13, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 13, 2024
Contributor

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@agocke agocke added this to the 9.0.0 milestone May 13, 2024
@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 13, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label May 13, 2024
@elinor-fung
Member

elinor-fung commented May 14, 2024

I had looked into this last week, thinking it would be quick (spoiler: it wasn't). Here's where I got:

The host prints using the C functions (fputws, vfwprintf), which do the conversion from wide to multi-byte as if using wcrtomb - that means they use the C locale (specifically LC_CTYPE). This does not map to what the user may have set through something like chcp 65001, or to what may be programmatically set through SetConsoleOutputCP.
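
To make the LC_CTYPE dependency concrete, here is a minimal sketch of a wcrtomb conversion that succeeds or fails depending only on the active C locale. The helper name narrow_with_current_locale is hypothetical and is not part of the host code.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper (not from the host): convert one wide character with
// wcrtomb, using whatever locale is currently set for LC_CTYPE.
// Returns the encoded bytes, or an empty string if the character cannot be
// represented in that locale.
std::string narrow_with_current_locale(wchar_t wc)
{
    char buffer[MB_LEN_MAX];
    std::mbstate_t state{};
    std::size_t length = std::wcrtomb(buffer, wc, &state);
    if (length == static_cast<std::size_t>(-1))
        return std::string(); // conversion failed for this locale
    return std::string(buffer, length);
}
```

Under the default "C" locale, an ASCII character converts to itself, while an ideograph such as 龯 (U+9FAF) is expected to be rejected. The console code page plays no part in this conversion, which is consistent with chcp 65001 alone not changing the host's output.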

This affects all tracing in the host on Windows - so basic output (like --info called out by this issue), any error messages, and any detailed logging to stderr or to a file. These are the functions of interest:

inline void file_vprintf(FILE* f, const char_t* format, va_list vl) { ::vfwprintf(f, format, vl); ::fputwc(_X('\n'), f); }
inline void err_fputs(const char_t* message) { ::fputws(message, stderr); ::fputwc(_X('\n'), stderr); }
inline void out_vprintf(const char_t* format, va_list vl) { ::vfwprintf(stdout, format, vl); ::fputwc(_X('\n'), stdout); }

Options I looked at:

  • explicitly setting locale (setlocale) to ".utf8" and reverting it after
    • setlocale only started supporting using a UTF-8 code page with Win 10 version 1803.
  • using _l variants of the functions that take a locale
    • would need to create/free the locale that is passed in
  • directly using WriteConsoleW instead of the C functions, which should respect the user's code page settings
    • we'd need to explicitly format before instead of using vfwprintf to handle the arg formatting like we do now
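
The last option (format first, then hand the finished string to WriteConsoleW) could look roughly like the sketch below. It is only an illustration under assumptions: format_wide_v, format_wide and write_line are hypothetical names, the 256-character starting size and the growth cap are arbitrary, and the fallback to fputws for redirected or non-Windows output is an assumption about desired behavior rather than anything decided in this thread.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cwchar>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: do the argument formatting up front into a wide string.
std::wstring format_wide_v(const wchar_t* format, va_list args)
{
    std::vector<wchar_t> buffer(256);
    // vswprintf only signals a too-small buffer with a negative result,
    // so retry with a larger buffer up to an arbitrary cap.
    while (buffer.size() <= (1u << 20))
    {
        va_list copy;
        va_copy(copy, args);
        int written = std::vswprintf(buffer.data(), buffer.size(), format, copy);
        va_end(copy);
        if (written >= 0)
            return std::wstring(buffer.data(), static_cast<std::size_t>(written));
        buffer.resize(buffer.size() * 2);
    }
    return std::wstring(); // give up: formatting error or output too large
}

std::wstring format_wide(const wchar_t* format, ...)
{
    va_list args;
    va_start(args, format);
    std::wstring result = format_wide_v(format, args);
    va_end(args);
    return result;
}

// Hypothetical writer: use WriteConsoleW when the stream is a real console,
// otherwise fall back to the C functions the host uses today.
void write_line(FILE* stream, const std::wstring& text)
{
#ifdef _WIN32
    HANDLE handle = ::GetStdHandle(stream == stderr ? STD_ERROR_HANDLE : STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode fails when output is redirected to a file or pipe.
    if (handle != INVALID_HANDLE_VALUE && handle != nullptr && ::GetConsoleMode(handle, &mode))
    {
        DWORD written = 0;
        ::WriteConsoleW(handle, text.c_str(), static_cast<DWORD>(text.size()), &written, nullptr);
        ::WriteConsoleW(handle, L"\n", 1, &written, nullptr);
        return;
    }
#endif
    std::fputws(text.c_str(), stream);
    std::fputwc(L'\n', stream);
}
```

A call such as write_line(stdout, format_wide(L"%ls: %d", name, value)) would then replace a direct vfwprintf. One trade-off of this approach is that redirected output still goes through the locale-based path, so logging to a file would need separate handling.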

Just trying things out in a console app, I found that for the functions based on C locale, I had to set/pass the locale with UTF-8 as the code page and set the console code page (chcp or SetConsoleOutputCP). WriteConsoleW would respect what code page is set for the console. On my Windows 11 machine, I didn't actually have to explicitly change my code page (which was not set to UTF-8) to have WriteConsoleW display the expected output - maybe some of the improved UTF-8 support that has been coming to Windows over the years.
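
For the set-and-revert approach, a scoped guard is one way to keep the locale change contained. The following is a rough sketch, not the host's implementation: scoped_ctype_locale is a hypothetical class, and pairing the locale switch with SetConsoleOutputCP(CP_UTF8) on Windows simply mirrors the observation above that both had to be set.

```cpp
#include <cassert>
#include <clocale>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII guard: switch LC_CTYPE for the lifetime of the object
// and restore the previous value on destruction.
class scoped_ctype_locale
{
public:
    explicit scoped_ctype_locale(const char* name)
    {
        const char* current = std::setlocale(LC_CTYPE, nullptr);
        if (current != nullptr)
            previous_ = current; // copy: the returned buffer may be overwritten
        switched_ = std::setlocale(LC_CTYPE, name) != nullptr;
#ifdef _WIN32
        if (switched_)
        {
            // Assumption: the caller passed a UTF-8 locale such as ".utf8".
            previous_code_page_ = ::GetConsoleOutputCP();
            ::SetConsoleOutputCP(CP_UTF8);
        }
#endif
    }

    ~scoped_ctype_locale()
    {
        if (!switched_)
            return;
#ifdef _WIN32
        if (previous_code_page_ != 0)
            ::SetConsoleOutputCP(previous_code_page_);
#endif
        std::setlocale(LC_CTYPE, previous_.c_str());
    }

    scoped_ctype_locale(const scoped_ctype_locale&) = delete;
    scoped_ctype_locale& operator=(const scoped_ctype_locale&) = delete;

    // False when the requested locale is not available on this system.
    bool switched() const { return switched_; }

private:
    std::string previous_;
    bool switched_ = false;
#ifdef _WIN32
    UINT previous_code_page_ = 0;
#endif
};
```

On Windows the guard would be constructed with ".utf8" around each print call. Note that setlocale changes process-wide state, so this is only safe if no other thread depends on the locale while the guard is alive. That is one reason the _l variants or WriteConsoleW may be preferable.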

@jtschuster
Member

@elinor-fung Thank you so much for the info, I would have been pretty lost trying to fix this issue without it.
