Other languages #256

einhugur · 2024-01-12T18:56:18Z

I think there is something wrong with the language handling.

Even after setting the code page to Icelandic, printing always fails:

e.CodePage(CodePage.PC861_ICELANDIC)

I think the reason is basically here:

public virtual byte[] Print(string data)
{
    // Fix OSX or Windows-style newlines
    data = data.Replace("\r\n", "\n");
    data = data.Replace("\r", "\n");

    // TODO: Sanitize...
    return data.ToCharArray().Select(x => (byte)x).ToArray();
}

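Because the library just calls ToCharArray on the input string and casts each char to a byte, you get the low byte of each UTF-16 code unit rather than the character's byte in the selected code page. Unicode values can also be much higher than what fits in a single-byte code page, so the printer does not emit the correct character.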
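To make the mismatch concrete, here is a minimal sketch that compares the cast used in Print with what .NET's own code page 861 encoding produces for the same letter. It assumes a .NET version with top-level statements and CodePagesEncodingProvider available; the variable names are only for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Needed on .NET Core / .NET 5+ so that legacy code pages such as 861 are available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string text = "ð"; // U+00F0

// What Print does today: cast each UTF-16 code unit to a byte.
// For U+00F0 this yields the single byte 0xF0.
byte[] viaCast = text.ToCharArray().Select(x => (byte)x).ToArray();

// What .NET produces when asked to encode the same text for code page 861.
byte[] viaCodePage = Encoding.GetEncoding(861).GetBytes(text);

Console.WriteLine("cast:      " + BitConverter.ToString(viaCast));
Console.WriteLine("code page: " + BitConverter.ToString(viaCodePage));
Console.WriteLine("identical: " + viaCast.SequenceEqual(viaCodePage));
```

The two byte arrays differ, which is why a printer switched to the Icelandic code page shows a different glyph than the one in the string.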

einhugur · 2024-01-12T19:24:22Z

And here is the proof that it is indeed the ToCharArray there that messes it up.

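So with the code page set to PC861_ICELANDIC, the only way to get Print and PrintLine to work is to pre-process everything I send in so that it is no longer actual Unicode. Encoding to code page 861 and then decoding the bytes as Latin-1 gives a string in which each char carries a code page 861 byte value, so the cast in Print passes it through unchanged: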

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var chars = System.Text.Encoding.GetEncoding(861).GetBytes(token.Text);
string s = System.Text.Encoding.Latin1.GetString(chars);

result = Combine(result, e.PrintLine(s));

And then it works and I get Icelandic letters printed out.

So to fix this, the library should probably at the very least add Print and PrintLine overloads that just take byte[], since then I could send in the result of System.Text.Encoding.GetEncoding(861).GetBytes(token.Text) directly.

Alternatively, Print and PrintLine would need to be fixed to take the printer's selected code page into account and derive the bytes from the string accordingly.
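As a rough sketch of that second option, the helper below keeps the newline handling from the existing Print method but gets the bytes from a .NET encoding instead of casting chars. EncodeForPrinter and its codePage parameter are hypothetical names, not part of the library, and the code page number is assumed to be supplied by the caller.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ so that legacy code pages such as 861 are available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper (not library API): encode text for a given .NET code page number.
static byte[] EncodeForPrinter(string data, int codePage)
{
    // Same newline normalisation as the existing Print method.
    data = data.Replace("\r\n", "\n").Replace("\r", "\n");

    // Let .NET map each character to its byte in the requested code page.
    return Encoding.GetEncoding(codePage).GetBytes(data);
}

byte[] bytes = EncodeForPrinter("Þetta er prófun\r\n", 861);
Console.WriteLine(BitConverter.ToString(bytes));
```

With the default fallback, characters that the code page cannot represent are substituted rather than rejected, so this variant never throws on unencodable input.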

lukevp · 2024-01-14T22:55:26Z

The override for Print you are asking for is just Write (you can write byte arrays of whatever you want directly to the buffer).

You are correct that conversion / compression of Unicode into printer code pages is not currently a function this library supports. I agree with you that this would be super valuable, but it requires some way to map Unicode into the relevant code pages for each given language, and to reject any other Unicode characters that cannot be encoded.

Another potential solution is to support these other languages in a bitmapped way, where we could render Unicode directly into an image and print it that way. It would require a clear definition of the pixel width of each printer, which could have a lookup table or could be specified manually. Then it would be a matter of properly printing and wrapping the generated image, but then we could support anything, even emojis! 😸 The whole concept of printer code pages would also no longer be relevant, and that would open the door to all sorts of cool layout options since at that point they're image-based and not directly related to the printer's supported layout and styling options. Heck you could even render html or PDFs to the printer at that point!

This library is MIT and we super appreciate any contributions. The work around various languages' printability (especially Kanji-based languages) has come up a lot - they would seriously benefit from an image / rasterized print where the fonts are handled by the library, because the printers currently only support Katakana. If you're not familiar, languages like Japanese, Chinese, and Korean are symbolic and have a single symbol represent a whole word - meaning they have thousands of such characters. This doesn't fit into the limited memory of an embedded device and Unicode is not part of the ESCPOS standard, so what they do is require an alternate character set called Katakana that spells out the words phonetically, which is much lengthier.

This isn't something I have the time to implement myself, but I think this would really help out our library, and I can definitely test this out for you on multiple printers if you want to tackle either of the 2 implementations above. The only thing I'd ask is to make it flexible enough to support more code pages than just Icelandic, even if you just stub out the other languages, so that as others need those languages, they can build and test out the mappings.

lukevp · 2024-01-14T23:00:26Z

In case it wasn't clear from my comment about directly writing the bytes - the Print function was originally intended to throw if Unicode text is entered (which is what that // TODO: Sanitize comment is about). All you really have to do if you use Write directly is replace the carriage returns with newlines.

I see after re-reading your post that there's already code page conversion built into .NET, it seems? In that case, could we have an extension of the print function that takes in the desired code page and throws if characters are outside the character set?
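A minimal sketch of such a throwing variant, assuming the code page is passed as a .NET code page number. EncodeStrict is a hypothetical name, not library API; the only .NET pieces relied on are Encoding.GetEncoding with explicit fallbacks and EncoderFallbackException.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ so that legacy code pages such as 861 are available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper (not library API): encode or throw, never substitute.
static byte[] EncodeStrict(string data, int codePage)
{
    // Request an encoding that throws on unmappable characters
    // instead of silently replacing them.
    Encoding strict = Encoding.GetEncoding(
        codePage,
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    return strict.GetBytes(data);
}

try
{
    EncodeStrict("₹ 100", 861);
    Console.WriteLine("Encoded without error");
}
catch (EncoderFallbackException ex)
{
    // CharUnknown is the character that could not be mapped.
    Console.WriteLine($"Cannot encode U+{(int)ex.CharUnknown:X4} in code page 861");
}
```

The design choice here is to fail loudly: a receipt with a silently substituted currency sign is arguably worse than an exception the caller can handle.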

einhugur · 2024-01-15T09:25:10Z

Yes, I think an extension of the Print method would be excellent.

One that then takes the code page.

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var chars = System.Text.Encoding.GetEncoding(861).GetBytes(token.Text);

So it would be something like

void PrintLine(string line, int codepage)

lukevp · 2024-01-17T12:47:53Z

You game for making a PR that adds this support?

Is there an enum that maps to the 861? Or how do you know which numbers are valid code pages?

einhugur · 2024-01-17T18:07:19Z

I will see about doing it on the weekend. There is no enum; I imagine it will just throw if the code page is invalid.
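For reference, a small sketch of how a caller could probe whether a number is a usable code page before printing. IsSupportedCodePage is a hypothetical helper; it relies only on the documented behaviour that Encoding.GetEncoding(int) throws ArgumentOutOfRangeException, ArgumentException, or NotSupportedException for values it cannot resolve.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ so that legacy code pages such as 861 are available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper (not library API): true if .NET can resolve the code page.
static bool IsSupportedCodePage(int codePage)
{
    try
    {
        Encoding.GetEncoding(codePage);
        return true;
    }
    catch (ArgumentException)
    {
        // Also covers ArgumentOutOfRangeException (value below 0 or above 65535).
        return false;
    }
    catch (NotSupportedException)
    {
        return false;
    }
}

Console.WriteLine(IsSupportedCodePage(861));
Console.WriteLine(IsSupportedCodePage(-1));
```

Note that this only checks what .NET can encode; whether a given printer actually has that code page installed is a separate question.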

igorocampos · 2024-01-22T22:14:33Z

FYI @lukevp this is the second time (See #88 (comment)) someone has suggested such a change to Print() instead of using Write(), perhaps it's more natural for users of the library to have it both ways? I've just submitted a PR with a proposal for this. Please check #260

lukevp mentioned this issue Jan 14, 2024: is there Indian rupee sign ⟨₹⟩ in codepage? #252 (Closed)

This was referenced Jan 23, 2024:

Adding Encoding property and examples in the README.md #260 (Merged)

Added overide to have printed string and printed line converted to given codepage #258 (Closed)
