Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode ISO 8859-1 into UTF-8 #187

Open
py-mine-bot opened this issue Feb 15, 2022 · 8 comments
Open

Decode ISO 8859-1 into UTF-8 #187

py-mine-bot opened this issue Feb 15, 2022 · 8 comments
Labels
area: API Related to core API of the project Github Import This was auto-imported from upstream repository type: feature New request or feature

Comments

@py-mine-bot
Copy link
Collaborator

kevinkjt2000 Authored by kevinkjt2000
Jan 6, 2022


Give examples for decoding ISO 8859-1 into UTF-8 (otherwise, people are surprised with garbage characters

Why are we not doing this automatically? Is there some benefit in keeping it in ISO 8859-1? I feel like this will only cause more issues to appear over time as people won't know what's going on since it's non-trivial to figure out. Even when documented, it would just clutter the docs when we could easily do this on the library side.

Originally posted by @ItsDrike in Dinnerbone/mcstatus#136 (comment)

@py-mine-bot py-mine-bot added type: feature New request or feature Github Import This was auto-imported from upstream repository labels Feb 17, 2022
@ItsDrike ItsDrike added the area: API Related to core API of the project label Feb 22, 2022
@PerchunPak
Copy link
Member

This will be a breaking change

>>> motd = "여보세요".encode("utf-8").decode("iso-8859-1")
\x97¬ë³´ì\x84¸ì\x9a\x94'
>>> motd = motd.encode("iso-8859-1").decode("utf-8")
'여보세요'
>>> motd.encode("iso-8859-1").decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

Two times encode and decode will result in an error.

@PerchunPak
Copy link
Member

Nevermind, this breaking change already made #192.

@PerchunPak
Copy link
Member

PerchunPak commented Aug 17, 2022

Can't actually find new working workaround after changes which was introduced in #192. Never mind, the old workaround still works.

@PerchunPak
Copy link
Member

PerchunPak commented Jan 28, 2023

Reproduced! Write some non ISO-8859-1 symbols in server.properties and save it with UTF-8 encoding. Then start server with enabled query, and you have random symbols in mcstatus received answer.

So it's only possible with query and all fields there (like players or map).

P.S. I tried Paper and Vanilla cores. And Bedrock Vanilla (unaffected).

@PerchunPak
Copy link
Member

We can add fix_encoding method to all strings in QueryResponse, so no breaking change would be.

class StringWithFixEncoding(str):
    def fix_encoding(self) -> str:
        return self.encode("iso-8859-1").decode("utf-8")

@ItsDrike
Copy link
Member

How to minecraft clients handle this? Do they actually perform reencoding too? If this is just the case of some servers using UTF8 when minecraft motd doesn't support this, I don't think we should be supporting it either. A note about this in FAQ section of the wiki is more than sufficient in that case. If minecraft clients do actually perform this kind of reencoding, only then I'd consider changing the behavior of mcstatus here

@PerchunPak
Copy link
Member

How to minecraft clients handle this? Do they actually perform reencoding too? If this is just the case of some servers using UTF8 when minecraft motd doesn't support this, I don't think we should be supporting it either. A note about this in FAQ section of the wiki is more than sufficient in that case. If minecraft clients do actually perform this kind of reencoding, only then I'd consider changing the behavior of mcstatus here

Minecraft clients do not use query protocol at all (ref).

But if you meant Minecraft servers - then yes, they see that server.properties is in UTF-8 encoding, and just resave it with ISO one, so UTF symbols will be transformed to UTF escape sequences.

@PerchunPak
Copy link
Member

Can it be fixed by #504? As the PR mentions this question in FAQ here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Related to core API of the project Github Import This was auto-imported from upstream repository type: feature New request or feature
Projects
None yet
Development

No branches or pull requests

3 participants