Other languages #256

Open
einhugur opened this issue Jan 12, 2024 · 7 comments

Comments

@einhugur

I think there is something wrong with the language handling.

Even after setting the code page to Icelandic, printing Icelandic characters always fails:

e.CodePage(CodePage.PC861_ICELANDIC)

I think the reason is basically here:

public virtual byte[] Print(string data)
{
    // Fix OSX or Windows-style newlines
    data = data.Replace("\r\n", "\n");
    data = data.Replace("\r", "\n");

    // TODO: Sanitize...
    return data.ToCharArray().Select(x => (byte)x).ToArray();
}

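Because the library just calls ToCharArray on the input string and casts each char to a byte, you get back the Unicode (UTF-16) values, which can be much higher than what fits in the selected code page and which do not match the code page's byte assignments anyway. So the printer does not emit the correct character.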
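To make the mismatch concrete, here is a minimal sketch that compares the cast used in Print with a real code page 861 encoding. `CastDemo`, `CastToBytes`, and `EncodeCp861` are hypothetical names used only for illustration; the sketch assumes the .NET code pages provider (`CodePagesEncodingProvider`) is available.

```csharp
using System.Linq;
using System.Text;

public static class CastDemo
{
    // Same conversion as in Print: in the default unchecked context the cast
    // keeps only the low 8 bits of each UTF-16 code unit.
    public static byte[] CastToBytes(string data) =>
        data.ToCharArray().Select(x => (byte)x).ToArray();

    // Looks up the byte that code page 861 actually assigns to each character.
    public static byte[] EncodeCp861(string data)
    {
        // Legacy code pages are not registered by default on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(861).GetBytes(data);
    }
}
```

For "þ" (U+00FE) the cast yields 0xFE, which is simply the Unicode value, whereas the encoder returns the position of þ in the printer's table. For a character above U+00FF, such as "€" (U+20AC), the cast silently drops the high byte.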

@einhugur
Author

And here is the proof that it is indeed the ToCharArray call that messes it up.

With the code page set to PC861_ICELANDIC, the only way to get Print and PrintLine to work is to pre-process everything I send in so that it is no longer actual Unicode:

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var chars = System.Text.Encoding.GetEncoding(861).GetBytes(token.Text);
string s = System.Text.Encoding.Latin1.GetString(chars);

result = Combine(result, e.PrintLine(s));

And then it works and I get Icelandic letters printed out.
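The reason this workaround works can be sketched as follows: Latin1 maps each byte 0x00 to 0xFF to the char with the same numeric value, so the cast inside Print turns every char straight back into the original code page 861 byte. `Latin1RoundTrip` and its methods are hypothetical names for illustration, and `Encoding.Latin1` assumes .NET 5 or later.

```csharp
using System.Linq;
using System.Text;

public static class Latin1RoundTrip
{
    // Encodes text to code page 861, then wraps every byte in a char of the
    // same numeric value so it survives a later (byte) cast unchanged.
    public static string ToPassThroughString(string text)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] cp861 = Encoding.GetEncoding(861).GetBytes(text);
        return Encoding.Latin1.GetString(cp861);
    }

    // Same cast as in Print.
    public static byte[] CastToBytes(string data) =>
        data.Select(x => (byte)x).ToArray();
}
```

Feeding the result of `ToPassThroughString` through `CastToBytes` gives back exactly the code page 861 bytes. The intermediate string is not meaningful text, which is why passing a byte[] directly is the cleaner option.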

So to fix this, the library should probably at the very least add Print and PrintLine overloads that take a byte[], since then I could send in System.Text.Encoding.GetEncoding(861).GetBytes(token.Text) directly.

Alternatively, Print and PrintLine would need to be fixed to take the printer's selected code page into account and encode the string to bytes accordingly.

@lukevp
Owner

lukevp commented Jan 14, 2024

The overload of Print you are asking for is just Write (you can write byte arrays of whatever you want directly to the buffer).

You are correct in that conversion / compression of Unicode into printer code pages is not currently a function this library supports. I agree with you that this would be super valuable, but it requires some way to map unicode into the relevant code pages for each given language, and rejection of other unencodable unicode characters.

Another potential solution is to support these other languages in a bitmapped way, where we could render Unicode directly into an image and print it that way. It would require a clear definition of the pixel width of each printer, which could have a lookup table or could be specified manually. Then it would be a matter of properly printing and wrapping the generated image, but then we could support anything, even emojis! 😸 The whole concept of printer code pages would also no longer be relevant, and that would open the door to all sorts of cool layout options since at that point they're image-based and not directly related to the printer's supported layout and styling options. Heck you could even render html or PDFs to the printer at that point!

This library is MIT and we super appreciate any contributions. The work around various languages' printability (especially Kanji based languages) has come up a lot - they would seriously benefit from an image / rasterized print where the fonts are handled by the library, because the printers currently only support Katakana. If you're not familiar, languages like Japanese, Chinese, and Korean are symbolic and have a single symbol represent a whole word - meaning they have thousands of such characters. This doesn't fit into a limited memory of an embedded device and Unicode is not a standard of ESCPOS, so what they do is require an alternate character set called Katakana that is phonetically spelling out the words, which is much lengthier.

This isn't something I have the time to implement myself, but I think this would really help out our library, and I can definitely test this out for you on multiple printers if you want to tackle either of the 2 implementations above. The only thing I'd ask is to make it flexible enough to support more code pages than just Icelandic, even if you just stub out the other languages, so that as others need those languages, they can build and test out the mappings.

@lukevp
Owner

lukevp commented Jan 14, 2024

In case it wasn't clear from my comment about directly writing the bytes - the Print function was originally intended to throw if unicode text is entered (which is what that //TODO: sanitize block is about). All you really have to do if you use the Write directly is to do the replacement of carriage returns with newlines.

I see after re-reading your post that there's already code page conversion built into .NET, it seems? In that case, could we have an extension of the print function that takes in the desired code page and throws if characters are outside the character set?
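A minimal sketch of what such a code-page-aware conversion might look like, assuming the standard .NET `Encoding` APIs. `CodePageText.Encode` is a hypothetical helper, not part of the library; it applies the same newline normalization as Print and uses an exception fallback so that unencodable characters throw instead of being silently replaced.

```csharp
using System.Text;

public static class CodePageText
{
    // Hypothetical helper (not the library's API): converts text to bytes for
    // the given .NET code page number, throwing EncoderFallbackException when
    // a character has no mapping in that code page.
    public static byte[] Encode(string data, int codePage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Same newline normalization that Print performs.
        data = data.Replace("\r\n", "\n").Replace("\r", "\n");

        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strict.GetBytes(data);
    }
}
```

The resulting byte array could then be handed to Write as described above. The printer still has to be switched to the matching code page separately, for example with e.CodePage(CodePage.PC861_ICELANDIC).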

@einhugur
Author

Yes, I think an extension of the print method that takes the code page would be excellent.

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var chars = System.Text.Encoding.GetEncoding(861).GetBytes(token.Text);

So it would be something like:

void PrintLine(string line, int codepage)

@lukevp
Owner

lukevp commented Jan 17, 2024

You game for making a PR that adds this support?

Is there an enum that maps to the 861? Or how do you know which numbers are valid code pages?

@einhugur
Author

I will see about doing it on the weekend. There is no enum; I imagine it will just throw if the code page is invalid.
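As a sketch of how validity could be checked up front: `Encoding.GetEncoding(int)` throws for numbers it does not recognize, so a small probe can report whether a given code page number is usable. `CodePageCheck.IsSupported` is a hypothetical name, and how these .NET numbers would map onto the library's CodePage enum values is not settled in this thread.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Returns true when .NET can supply an encoding for the code page number.
    public static bool IsSupported(int codePage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        try
        {
            Encoding.GetEncoding(codePage);
            return true;
        }
        catch (ArgumentException)
        {
            // Covers ArgumentOutOfRangeException for values outside 0..65535.
            return false;
        }
        catch (NotSupportedException)
        {
            // In-range number with no encoding data available.
            return false;
        }
    }
}
```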

@igorocampos
Collaborator

igorocampos commented Jan 22, 2024

FYI @lukevp, this is the second time (see #88 (comment)) someone has suggested such a change to Print() instead of using Write(). Perhaps it's more natural for users of the library to have it both ways? I've just submitted a PR with a proposal for this. Please check #260.
