'gbk' codec can't decode byte 0xa3 in position 191: illegal multibyte sequence #953

Open
starrysky9959 opened this issue Jan 18, 2021 · 5 comments · May be fixed by #961

Comments

@starrysky9959

When I use mycli on Windows 10 to load a SQL script, for example source XXX.sql, an error occurs: 'gbk' codec can't decode byte 0xa3 in position 191: illegal multibyte sequence.
I think this is a problem with how the file is opened. Following the log, I modified the code at line 255 in main.py, and then it worked.
image
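
A minimal sketch of the failure mode described above, assuming the script file is UTF-8 and the platform default encoding is GBK. It is not the actual change to main.py (that is only shown in the screenshot); the helper `read_sql_file` and the sample script content are hypothetical.

```python
import os
import tempfile

# Hypothetical script content: the UTF-8 bytes of the CJK character
# do not form a valid GBK sequence in this position.
script_bytes = "SELECT '中';\n".encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".sql", delete=False) as handle:
    handle.write(script_bytes)
    script_path = handle.name


def read_sql_file(path, encoding=None):
    """Read a script as text. encoding=None uses the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Simulate a machine whose default encoding is GBK.
try:
    read_sql_file(script_path, encoding="gbk")
    gbk_failed = False
except UnicodeDecodeError as exc:
    gbk_failed = True
    print(exc)  # same kind of "illegal multibyte sequence" message

# Passing the encoding explicitly makes the result independent of the locale.
text = read_sql_file(script_path, encoding="utf-8")
os.remove(script_path)
```

When `encoding` is omitted, `open()` falls back to a locale-dependent default, which would explain why the error appears on some Windows machines and not on others.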

@pasenor
Member

pasenor commented Jan 19, 2021

Hi, thanks for pointing that out. I have not been able to reproduce it on Windows 10, but it is probably down to the locale or some other setting, so I don't think it is worth digging into. The better question is whether we should simply add the utf-8 default or try to detect the file encoding. For example, if the MySQL server or database encoding is latin1 and the script is in Unicode, should we silently run it, or at least warn beforehand?

@starrysky9959
Author

Thank you for your reply. My change only solved the problem on my PC, and I had not considered the issue comprehensively enough. As you said, trying to detect the file encoding is a better solution.

@rolandwalker
Contributor

@pasenor while it is possible to automatically detect the encoding of file contents, that is only true with some limitations. I'd also say it is out-of-scope for mycli, and that too much magic makes a tool hard to predict.

We default to a utf8 connection type, and hopefully soon utf8mb4, so I vote for a UTF-8 default for reading the file.

Whether we should change that default based on the database encoding is a different and interesting question. If the file is in UTF-8, and the database connection type is set to latin1, mycli+mysql should already do the right thing. We could test some scenarios.
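
One of the limitations of automatic detection can be sketched as follows: the same bytes often decode without error under several encodings, so a successful decode does not prove which encoding was intended. The helper `plausible_decodings` and the candidate list are hypothetical and only serve as illustration.

```python
def plausible_decodings(data, encodings=("utf-8", "gbk", "latin-1")):
    """Return {encoding: text} for every candidate that decodes without error."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This candidate is ruled out; the others remain ambiguous.
            pass
    return results


sample = "£".encode("utf-8")  # two bytes: 0xC2 0xA3
decoded = plausible_decodings(sample)

for name, value in decoded.items():
    print(name, repr(value))
```

All three candidates accept these two bytes, yet each yields different text, so any detector has to guess from statistics rather than from validity alone.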

@pasenor
Member

pasenor commented Jan 19, 2021

Yes, I tend to agree that we should not attempt to magically detect the encoding. But I don't know what the "right thing" should be here when the connection type is utf-8 but the database encoding isn't. Perhaps we could try to read the file with the encoding specified in the connection and warn the user if that fails. If the user insists, we then proceed with the utf-8 default.

Worth testing in any case.
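
The idea above could be sketched roughly like this. It is not mycli code: the function `read_script_with_fallback`, the `confirm` callback, and the partial charset mapping `MYSQL_TO_PYTHON` are all assumptions made for illustration.

```python
import os
import tempfile
import warnings

# Hypothetical, partial mapping from MySQL charset names to Python codecs.
MYSQL_TO_PYTHON = {
    "utf8": "utf-8",
    "utf8mb4": "utf-8",
    "latin1": "latin-1",
    "gbk": "gbk",
}


def read_script_with_fallback(path, connection_charset, confirm=None):
    """Try the connection's encoding first; warn and retry as UTF-8 on failure.

    confirm is an optional callable that returns False to abort the retry.
    Returns a (text, codec_used) tuple.
    """
    codec = MYSQL_TO_PYTHON.get(connection_charset, "utf-8")
    try:
        with open(path, encoding=codec) as f:
            return f.read(), codec
    except UnicodeDecodeError:
        if codec == "utf-8":
            raise  # nothing left to fall back to
        warnings.warn(f"{path} is not valid {codec}; retrying as utf-8")
        if confirm is not None and not confirm():
            raise
        with open(path, encoding="utf-8") as f:
            return f.read(), "utf-8"


# Usage: a UTF-8 script read over a (hypothetical) gbk connection.
with tempfile.NamedTemporaryFile(suffix=".sql", delete=False) as handle:
    handle.write("SELECT '中';\n".encode("utf-8"))
    demo_path = handle.name

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    demo_text, demo_codec = read_script_with_fallback(demo_path, "gbk")

os.remove(demo_path)
```

One caveat worth testing: Python's latin-1 codec accepts every byte value, so with a latin1 connection the first attempt never fails and a UTF-8 script would be read as mojibake without any warning.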

@aleimu

aleimu commented Feb 2, 2021

mark


4 participants