LineReader stops reading when it hits a character like "É" or "ñ" #5

Open
pkamb opened this issue Sep 14, 2011 · 11 comments

Comments

@pkamb

pkamb commented Sep 14, 2011

Suppose you have a text file such as:

diner
restaurant
lunch-spot
greasy spoon
café // "é" character
coffee shop
cafeteria

LineReader stops reading when it hits the "café" line above. It never gets to "coffee shop".

@johnjohndoe
Owner

Maybe the file is not encoded using UTF-8? I use NSUTF8StringEncoding in the FileReader. See (NSString*)readLine at line 72. Maybe you can find a way to detect the file's encoding before you start reading its content. You are welcome to fork the project.
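
To illustrate the encoding point above, here is a minimal sketch in Python (not the project's Objective-C). The sample word and the helper name `is_valid_utf8` are made up for this example. It shows that "é" saved in a single-byte encoding such as ISO-8859-1 does not form valid UTF-8, which is presumably why a reader that decodes strictly as UTF-8 gives up at that line.

```python
# Hypothetical sample: the word "café" saved in two different encodings.
utf8_bytes = "café".encode("utf-8")      # "é" becomes the two bytes C3 A9
latin1_bytes = "café".encode("latin-1")  # "é" becomes the single byte E9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

A check like this only says whether the bytes are valid UTF-8. It does not tell you which other encoding the file actually uses.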

@ZuzooVn

ZuzooVn commented Feb 13, 2013

Hi, I still have this problem.

@johnjohndoe
Owner

Have you verified which character encoding is used by the file you are trying to read?

@ZuzooVn

ZuzooVn commented Feb 13, 2013

Hi, it's Unicode (UTF-8)

@johnjohndoe
Owner

Could you upload a zipped sample somewhere? Then I will find time to take a look at it in a few days.

@ZuzooVn

ZuzooVn commented Feb 13, 2013

I think you can create a new document with some characters like í, é, ñ, and so on. Or I will upload some sample data.

@johnjohndoe
Owner

I think you should really upload an example file somewhere. I can write an ñ into either an extended-ASCII (for example ISO-8859-1) or a UTF-8 encoded file.
You can also find out the character encoding used in the file yourself with an editor. If you are using Windows, I recommend Notepad++. On Mac OS X or Linux, run the following command in a shell: $ file filename

@ZuzooVn

ZuzooVn commented Feb 14, 2013

This is the file's info: Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators.

This is the file: http://www.mediafire.com/?1cwr4if28w504md

It has an "î" character.

@johnjohndoe
Owner

Agreed. As I suspected, the file is not encoded as UTF-8.

[Image: Notepad++ screenshot]

I converted the file to UTF-8 using Notepad++ (options are visible in the menu) so you can try again with this file.

@ZuzooVn

ZuzooVn commented Feb 16, 2013

Maybe we should automatically convert every file to UTF-8 before reading its content.

@johnjohndoe
Owner

I suggest that you look for a way to recognize the character encoding up front. Feel free to add it to the LineReader.
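
One possible shape for such a step, sketched in Python rather than the project's Objective-C, is to try UTF-8 first and fall back to a legacy single-byte encoding only when that fails. The function names `decode_with_fallback` and `read_lines`, the fallback list, and the sample data are all assumptions for illustration, not part of LineReader.

```python
def decode_with_fallback(data: bytes, fallbacks=("cp1252", "latin-1")):
    """Decode bytes as UTF-8 if possible, otherwise try legacy encodings.

    Returns a (text, encoding_used) tuple.
    """
    # "utf-8-sig" is UTF-8 that also strips a leading byte order mark.
    candidates = ("utf-8-sig",) + tuple(fallbacks)
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Reached only if no candidate accepts the bytes. "latin-1" accepts
    # every byte sequence, so the default fallback list never gets here.
    raise ValueError("could not decode data with any candidate encoding")


def read_lines(data: bytes):
    """Split decoded text into lines, handling both LF and CRLF endings."""
    text, _ = decode_with_fallback(data)
    return text.splitlines()


# Hypothetical file content saved as Windows-1252 with CRLF line endings.
sample = "diner\r\ncafé\r\ncoffee shop\r\n".encode("cp1252")
print(read_lines(sample))
```

The fallback is a guess. Bytes alone cannot distinguish one single-byte encoding from another, so a file in a different legacy encoding would decode without error but could show the wrong characters.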
