-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad UTF-8 filename encoding #30
Comments
Thanks for the detailed report, @rodarima! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I have some files with a bad encoding. I think they came from a FAT32 pendrive using the latin-1 encoding, and are now in a EXT4 filesystem. Once I try to see the directory in rover, those files appear in a empty line; not even the size is shown.
I can replicate the behaviour by creating a bogus file (I'm using spanish locale, but any UTF-8 should work):
The problem is that the \355 character (í in latin encoding) is 0xED and hence a 2 multibyte starting byte in UTF-8. As the next character does't continue the 2 multibyte encoding, is an incorrect UTF-8 string.
The functions mbstowcs() and swprintf() are failling silently, returning -1, as they cannot deal with the string. So nothing gets copied to the WBUF buffer, and the row remains empty.
If you create a bogus directory, the behavior is even more interesting. The WBUF gets reused from the last usage, and the filename seems to be named as the CWD or the previous directory.
I was thinking in how to solve the issue, perhaps some workaround like the ls(1) program does, replacing the spurious character with an ? symbol or similar. Deletion and other operations work fine.
The text was updated successfully, but these errors were encountered: