to_json(std::filesystem::path) can create invalid UTF-8 chars on windows #4271
Comments
I can also work around this problem by adding a manifest XML that sets my app's code page to UTF-8.
In CMake I wrapped this in a function:
which is used like this (probably want to wrap in a platform check):
with the following manifest:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
This solves the problem if the app is running on at least Windows version 1903. It's still a bug, but I wanted to share this workaround because it's useful for many libraries that have the same issue.
Proposed diff to do the conversion to UTF-8 when targeting Windows:
diff --git a/include/nlohmann/detail/conversions/to_json.hpp b/include/nlohmann/detail/conversions/to_json.hpp
index 562089c3..a8b74688 100644
--- a/include/nlohmann/detail/conversions/to_json.hpp
+++ b/include/nlohmann/detail/conversions/to_json.hpp
@@ -413,10 +413,20 @@ inline void to_json(BasicJsonType& j, const T& t)
}
#if JSON_HAS_FILESYSTEM || JSON_HAS_EXPERIMENTAL_FILESYSTEM
+#if defined(_WIN32)
+#include <windows.h>
+#endif
template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
+#if defined(_WIN32)
+ int len = ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], static_cast<int>(p.native().size()), nullptr, 0, nullptr, nullptr);
+ std::string as_utf8(static_cast<std::size_t>(len), '\0');
+ ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], static_cast<int>(p.native().size()), &as_utf8[0], len, nullptr, nullptr);
+ j = std::move(as_utf8);
+#else
j = p.string();
+#endif
}
#endif
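As a possible alternative to calling the Win32 API directly, the standard library's `path::u8string()` is specified to return the path in UTF-8 on every platform. The sketch below is not the library's implementation; `path_to_utf8` is a hypothetical helper name. It assumes C++17 or later and copies the bytes so that it compiles both where `u8string()` returns `std::string` (C++17) and where it returns `std::u8string` (C++20).

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of nlohmann/json): return the path's text as
// UTF-8 bytes in a std::string, independent of the platform's narrow encoding.
inline std::string path_to_utf8(const std::filesystem::path& p)
{
    // C++17: u8string() returns std::string.
    // C++20: u8string() returns std::u8string (char8_t elements).
    // Copying through iterators works for both return types.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

With a helper like this, `j = path_to_utf8(p);` could replace `j = p.string();` without a platform-specific `#if`, at the cost of one extra copy.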
Description
This conversion function:
https://github.com/nlohmann/json/blob/7efe875495a3ed7d805ddbb01af0c7725f50c88b/include/nlohmann/detail/conversions/to_json.hpp#L416C1-L420C2
uses p.string(), which does not give a UTF-8-encoded string on Windows (in some cases, maybe?). Trying to dump() the resultant JSON throws an "invalid UTF-8 byte" exception.
Reproduction steps
Convert a std::filesystem::path, which contains a Unicode "Right Single Quotation Mark" character (U+2019), to a json implicitly or with to_json.
Inspect the new json (string_t)'s bytes, either by dump()ing, or converting to BSON.
Expected vs. actual results
Expected: "Strings are stored in UTF-8 encoding." per https://json.nlohmann.me/api/basic_json/string_t/
Actual: The string gets converted by std::filesystem::path::string(), which appears to convert it to Windows-1252 encoding. Its bytes end up as \x92 rather than \xe2\x80\x99.
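To make the two byte sequences concrete, here is a minimal sketch of UTF-8 encoding for a single code point. The helper names `encode_utf8` and `can_start_utf8_sequence` are hypothetical and not part of nlohmann/json; the encoder assumes a valid Unicode scalar value. U+2019 falls in the three-byte range, giving E2 80 99, while 0x92 (the Windows-1252 byte for the same character) has the bit pattern of a continuation byte and so cannot begin a UTF-8 sequence.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (no surrogates, cp <= 0x10FFFF).
inline std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                     // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {             // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                             // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// True if the byte may begin a UTF-8 sequence:
// ASCII (0x00-0x7F) or a multi-byte lead byte (0xC2-0xF4).
inline bool can_start_utf8_sequence(unsigned char b)
{
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}
```

Under these definitions, `encode_utf8(0x2019)` yields the expected three bytes, and `can_start_utf8_sequence(0x92)` is false, which is consistent with the exception being raised at index 0.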
Minimal code example
Workaround I'm using is to use WideCharToMultiByte + .native() to get the string in UTF-8 before passing to nlohmann:
Error messages
"[json.exception.type_error.316] invalid UTF-8 byte at index 0: 0x92
Compiler and operating system
MSVC 2022 Professional, C++ 20
Library version
develop - a259ecc
Validation
develop branch is used.