This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

Highlighting using listfocus is wrong for Tamil text #584

Open
j605 opened this issue Jul 25, 2017 · 7 comments

@j605

j605 commented Jul 25, 2017

Newsbeuter version (copy from newsbeuter -v):

newsbeuter 2.10-d20cf - http://www.newsbeuter.org/
Copyright (C) 2006-2015 Andreas Krennmair
Copyright (C) 2015-2017 Alexander Batischev
Copyright (C) 2006-2017 Newsbeuter contributors

newsbeuter is free software and licensed under the MIT/X Consortium License.
Type `newsbeuter -vv' for more information.

newsbeuter 2.10-d20cf
System: Linux 4.11.8-1-zen (x86_64)
Compiler: g++ 7.1.1 20170630
ncurses: ncurses 6.0.20170527 (compiled with 6.0)
libcurl: libcurl/7.54.1 OpenSSL/1.1.0f zlib/1.2.11 libpsl/0.17.0 (+libicu/59.1) libssh2/1.8.0 nghttp2/1.23.1 (compiled with 7.54.1)
SQLite: 3.19.3 (compiled with 3.19.3)
libxml2: compiled with 2.9.4

Steps to reproduce the issue, discussed with @Minoru

  1. Use a Tamil news feed, http://feeds.feedburner.com/dinamalar/Front_page_news

  2. Use a proper terminal emulator (Konsole works best; st works but mangles text) with a Tamil font, for example Noto Sans.

  3. Modify the colorscheme to https://raw.githubusercontent.com/j605/dotfiles/99e2f0f5a25e7f7c1772111efa06109a1e018f1e/newsbeuter/color in order to get a more pronounced change when the cursor is focused on an unread list item.

Expected outcome
The item is highlighted properly for all languages.

Actual Outcome
Only part of the title is coloured and, due to the colour scheme used, the rest of the text is blocked out for Tamil text:
[screenshot: dyx]

The behaviour also depends on the terminal used. In st, the text is not rendered properly (adjacent characters seem to be merged, keeping the length of the rendered text constant across different languages), so the highlighting is correct but the text itself is wrong. In Konsole, the text is rendered properly but the highlighting is wrong.

@Minoru Minoru added the bug label Jul 26, 2017
@Minoru
Collaborator

Minoru commented Jul 26, 2017

Confirmed with Konsole 17.04.1, Newsbeuter d20cf. STFL 0.22 and 0.24 behave the same.

I also ran Newsbeuter with debug logging enabled and it seems like Newsbeuter itself is not confused about the highlighting:

[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   1 N  Jul 26   945   'ரூ.1,500 கோடியை செலுத்துங்க!': சுப்ரதா ராய்க்கு கோர்ட் உத்தரவு </>'
[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   2 N  Jul 26   948   எல்லை தாண்டியதை இந்தியா ஒப்புக் கொண்டதா?</>'
[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   3 N  Jul 25   926   ஒய்வு பெறுகிறார் கேஹர்: அடுத்த சுப்ரீம் கோர்ட் தலைமை நீதிபதி தீபக் மிஸ்ரா பெயர் பரிந்துரை</>'
[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   4 N  Jul 25   886   இன்றைய(ஜூலை- 26) பெட்ரோல், டீசல் விலை?  </>'
[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   5 N  Jul 25   953   காஷ்மீர் பிரிவினைவாத தலைவர் ஷபீர் ஷா கைது</>'
[2017-07-26 21:24:46] DEBUG: listformatter::add_line: `<unread>   6 N  Jul 25   929   'நீட்' தேர்வில் இருந்து விலக்கு நிரந்தர தீர்வு தேவை: ஸ்டாலின்</>'

The opening and closing tags are in their proper place.

Current hypothesis: STFL is the one to blame.

TODO:

  • find out how STFL does highlighting
  • recreate their approach in a little Ncurses program

If the Ncurses version has no problems displaying this, we'll know for sure that STFL is at fault. Otherwise we'll have to dig deeper, into Ncurses and beyond.

@Minoru
Collaborator

Minoru commented Jul 27, 2017

While looking through STFL code, the following line from wt_list_prepare (widgets/wt_list.c) caught my attention:

int len = wcswidth(text,wcslen(text));

wcslen returns the number of wide characters in the string, and wcswidth returns how many terminal columns are needed to display that string.

Diving into this, I wrote a short program that I'm going to present in full here:

#define _XOPEN_SOURCE
#include <wchar.h>
#include <locale.h>

int main(void)
{
    const wchar_t * text = L"மாத சம்பளம் நன்கொடை: அதிபர் டிரம்ப் அசத்தல்";

    /* wcswidth consults LC_CTYPE, so select a UTF-8 locale first. */
    setlocale(LC_ALL, "en_US.utf8");

    /* Improvised ruler: marks columns 1, 5, 10, ... of the string below. */
    wprintf(L"        |   .    |    .    |    .    |    .    |    .    |\n");
    wprintf(L"String: %ls\n", text);
    /* wcslen returns size_t, so print it with %zu. */
    wprintf(L"Its length is %zu characters.\n", wcslen(text));
    wprintf(L"It needs %i columns to be printed out.\n", wcswidth(text, wcslen(text)));

    return 0;
}

The output in Konsole 17.04.1 looks like this:
[screenshot: 2017-07-28-000611_1366x740_scrot]

wcswidth thinks we need 35 columns for this string, but my improvised ruler clearly shows that we need 39.

The text wasn't chosen on a whim; it's copied from Newsbeuter. Notice that the cutoff in Newsbeuter happens at the 35th character:
[screenshot: 2017-07-28-000958_1366x740_scrot]

The manpage for wcswidth says it uses LC_CTYPE to determine character widths. Guess we found the culprit, then.

TODO:

  • Find out who maintains locales
  • Report this upstream

@Minoru
Collaborator

Minoru commented Jul 28, 2017

To clarify: I'm not saying the problem is definitely with this particular line; I'm saying the problem is, probably, with wcwidth overall.

@Minoru
Collaborator

Minoru commented Jul 28, 2017

Tried the same program on FreeBSD 11.0-p9; the result is the same. I had to change "en_US.utf8" to "en_US.UTF-8", but otherwise the code is identical.

@Minoru
Collaborator

Minoru commented Jul 28, 2017

Interesting: if I copy the text from the console into the browser, the string takes up 37 columns, not 39 like it does in the terminal:

        |   .    |    .    |    .    |    .,,,,|    .    |
String: மாத சம்பளம் நன்கொடை: அதிபர் டிரம்ப் அசத்தல்
Its length is 43 characters.
It needs 35 columns to be printed out.

@j605, are you seeing the same thing in your browser? Who's right, the browser or the terminal? They both use a monospaced font, so I don't think they can both be right simultaneously.

I found out that the locale database is part of libc and is maintained by the GNU folks. I'm doing a bit more due diligence before filing a bug with them.

@j605
Author

j605 commented Jul 28, 2017

As an aside, we don't have a monospace Tamil font, so that is still a problem. The text takes up 57 columns in my browser.

@Minoru
Collaborator

Minoru commented Jul 29, 2017

Filed upstream.
