Subject with non-printable characters or new line is shown with ?
#4276
Sure :-) Would you like to just dive in, or would you like a few hints? NeoMutt's big and complicated. Feel free to ask lots of questions, here, or on IRC: |
Hints would be very welcome. I'll join IRC later, thanks!! |
The next bit is a quick tour from the Index to the string formatting. The Index displays a list of emails. The Index is a wrapper around a A parsed format string is stored as a tree of Expando Nodes. More detail in #3937 For the Index Subject Expando
Illegal characters, e.g. bad utf-8, are replaced with The function takes one flag, We might want to consider a flag that controls whether whitespace formatting is allowed. |
This comment was marked as resolved.
I'm seeing some alignment issue with an interesting unicode character (
causes broken alignment:
With the following
Commenting this here after having a chat with @flatcap on IRC |
I've had a deeper dive into the code and we can do a better job (than my quick hack). There are a lot of format strings (Expandos), First lots of data (you can skim over this). |
This comment was marked as outdated.
I've finished a lot of setup in my dev environment so I can code in C. Now I'm working through a C language tutorial while taking notes (this is a website I use to record stuff I learn). Sorry for being slow. Thank you so much for giving me so much information! I see you already have a lot of things in mind. About the issue in question here, I was thinking of creating a Am I thinking in the correct direction? |
ooh! very organised
There's no rush
Yes. I hope it makes some sense. There are two parts: render and display. Render: turns an Expando + Data into a string. Display: Put the string on screen.
Not a bad idea, but I think we want to preserve as much as possible.
The current way NeoMutt does this is to use a standard "replacement character".
Cool |
I think this is personal taste. Using the e-mails I receive as a reference, I'd prefer these characters stripped out instead of showing All emails in my inbox that have an I even have emails with
I have never received an email like that. Maybe if one sets their locale to something like What do you think about making the handling of non-printable characters configurable? The default would be to preserve as much as possible. Changing the default would replace all non-printable characters with spaces, and then we strip them out. Personally I would love it. |
That sounds reasonable, but what would you apply it to? and where? We probably only want to apply this to an email's subject. How common are these bad subjects? |
🤣 🤣 This comment made me laugh, actually. Probably because it's half true. What is spam is subjective, though. Google payment receipts are not spam for me, but it may be spam for others.
Google and Amazon send me the bad subjects that I would love to strip. Both are payment receipt emails.
So far we have reports of issues in subject and names (from/to), so we could start here. Ultimately I just want NeoMutt to behave as any other modern email client. I have never had issues like this back when I was using Thunderbird, for example. I also want to keep things simple, that's why I came to NeoMutt after all, but there is no need to neglect aesthetics of the inbox screen. Also, AFAIK if a system is using an |
We're talking about properly RFC 2047 encoded headers, right? Otherwise those emails are just plain broken and shouldn't be catered for, IMO. At the risk of sounding nasty / dismissive: neomutt should always stay an email client, not a gmail client.
The encoding in the locale is not relevant, IMO. Only the encoding specified by the RFC 2047 "word". And whether neomutt can render "prettily" or not depends just on the availability of the needed glyphs in the terminal font. -- |
Ok, I've finished re-learning C and I'm ready to start working on this. Thank you for your opinion @nobrowser , I'll quit about the
The issue
So I want to organize and confirm what I should actually do here. We have two issues reported:
About (1), I downloaded the e-mail locally in a Maildir, and by checking the e-mail's raw file with a text editor we have this:
Not sure if it is useful. I don't know how to interpret this. Maybe @flatcap or @nobrowser can help me?
The fix
After we understand why the subject is becoming like that, I'd like to know what to do.
We also have a plan for the fix given by @flatcap
Just to understand, is this plan meant to allow newlines in places like the subject? If newlines are allowed, would they be replaced with spaces? |
Yes, that is the RFC 2047 encoding I mentioned. You see, email messages in transit are supposed to be encoded (for transport) entirely as 7-bit ASCII, so an encoding must be applied on top of whatever the native encoding is for the character set of the message. For the message body, the MIME standards specify how this transport encoding is indicated to a recipient / client, but that mechanism doesn't apply to headers. For headers, there is the special mechanism defined in RFC 2047. Here, the "?B?" substrings indicate that the transport encoding is Base64. So, if you apply a Base64 decoder to the text between each "?B?" and the closing "?=", you'll get the original UTF-8 encoded subject. I take this as good news, meaning even Google & Amazon know how to RFC 2047 encode their headers.
I believe neomutt already handles newlines in Subjects correctly provided they're encoded as above. When there's a raw newline in the Subject header of a message, I believe it should be treated like newlines in other headers, which are indeed just generic separators without meaning to the recipient, so replacing it with a space is the right thing to do. I think neomutt does this too, but I'm not 100% sure ATM. -- |
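To make the decoding step described above concrete, here is a minimal sketch in C of decoding a single Base64 ("B") encoded word of the form `=?charset?B?payload?=`. The names `b64_value` and `decode_b_word` are hypothetical helpers written for this illustration; this is not NeoMutt's implementation (the thread points to email/rfc2047.c for that). It ignores the charset, does not handle the "Q" encoding, and does not join multiple encoded words.

```c
#include <stddef.h>
#include <string.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int b64_value(char c)
{
  if ((c >= 'A') && (c <= 'Z'))
    return c - 'A';
  if ((c >= 'a') && (c <= 'z'))
    return c - 'a' + 26;
  if ((c >= '0') && (c <= '9'))
    return c - '0' + 52;
  if (c == '+')
    return 62;
  if (c == '/')
    return 63;
  return -1;
}

/* Hypothetical helper: decode one "=?charset?B?payload?=" encoded word.
 * Writes the decoded bytes (still in `charset`) to `out`, NUL-terminated.
 * Returns the number of bytes written, or -1 if the input is malformed
 * or `out` is too small. */
int decode_b_word(const char *word, char *out, size_t outlen)
{
  if (!word || !out || (outlen == 0) || (strncmp(word, "=?", 2) != 0))
    return -1;

  /* Skip the charset; the next field must be "B" (case-insensitive) */
  const char *q1 = strchr(word + 2, '?');
  if (!q1 || ((q1[1] != 'B') && (q1[1] != 'b')) || (q1[2] != '?'))
    return -1;

  const char *payload = q1 + 3;
  const char *end = strstr(payload, "?=");
  if (!end)
    return -1;

  size_t n = 0;
  unsigned int acc = 0; /* bit accumulator; only the low bits are used */
  int bits = 0;         /* number of not-yet-emitted bits in acc */

  for (const char *p = payload; p < end; p++)
  {
    if (*p == '=')
      break; /* padding marks the end of the data */

    int v = b64_value(*p);
    if (v < 0)
      return -1;

    acc = (acc << 6) | (unsigned int) v;
    bits += 6;
    if (bits >= 8)
    {
      bits -= 8;
      if ((n + 1) >= outlen)
        return -1; /* keep room for the terminating NUL */
      out[n++] = (char) ((acc >> bits) & 0xFF);
    }
  }

  out[n] = '\0';
  return (int) n;
}
```

For example, calling `decode_b_word("=?UTF-8?B?SGVsbG8K?=", buf, sizeof(buf))` yields the six bytes `Hello\n`: the trailing line feed is part of the decoded subject, which is the kind of character this issue is about.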
Oooh now things are starting to make sense.
Every time What Edit: According to this, it is a LINE FEED (LF) (U+000A) |
NeoMutt is already doing the base64 decoding for you (other encodings exist). The plain subject is stored in The Expando code needs to first measure this string, counting all whitespace as Hmm... I didn't consider /me goes exploring in the code |
OK, change of plan. If the subject contains any bad whitespace or other bad characters, replace them and store the result in To check the string, it'll mean converting the chars from utf-8 into wide chars. |
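As a rough illustration of the check described above, the following C sketch walks a multibyte string one wide character at a time and replaces bad characters. It is only a sketch under stated assumptions: `sanitize_subject` is a hypothetical name, not a NeoMutt function; the choice of a space for whitespace and `?` for non-printable or invalid characters is illustrative; and the caller is assumed to have selected a UTF-8 locale (for example with `setlocale(LC_CTYPE, "")`) so that `mbrtowc()` decodes UTF-8.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: copy `src` into `dst`, replacing every whitespace
 * character other than a plain space with ' ', and every non-printable
 * character or invalid byte with '?'.
 * Returns the number of characters that were replaced. */
size_t sanitize_subject(const char *src, char *dst, size_t dstlen)
{
  if (!src || !dst || (dstlen == 0))
    return 0;

  mbstate_t st;
  memset(&st, 0, sizeof(st));

  const size_t srclen = strlen(src);
  size_t in = 0;       /* read offset into src */
  size_t out = 0;      /* write offset into dst */
  size_t replaced = 0;

  while (in < srclen)
  {
    wchar_t wc = 0;
    char repl = '\0';
    size_t len = mbrtowc(&wc, src + in, srclen - in, &st);

    if ((len == (size_t) -1) || (len == (size_t) -2))
    {
      /* Invalid or truncated sequence: skip one byte and reset the state */
      memset(&st, 0, sizeof(st));
      len = 1;
      repl = '?';
    }
    else if (len == 0)
    {
      break; /* embedded NUL; cannot happen within strlen(), kept as a guard */
    }
    else if (iswspace((wint_t) wc) && (wc != L' '))
    {
      repl = ' '; /* newline, tab, etc. */
    }
    else if (!iswprint((wint_t) wc))
    {
      repl = '?'; /* control characters and other non-printables */
    }

    const size_t need = repl ? 1 : len;
    if ((out + need) >= dstlen)
      break; /* keep room for the terminating NUL */

    if (repl)
    {
      dst[out] = repl;
      replaced++;
    }
    else
    {
      memcpy(dst + out, src + in, len); /* copy the original bytes unchanged */
    }

    out += need;
    in += len;
  }

  dst[out] = '\0';
  return replaced;
}
```

A non-zero return value tells the caller that the string was changed, so the sanitized copy only needs to be kept when something was actually replaced. Replacing control characters this way also keeps raw escape sequences from an untrusted header away from the terminal.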
The best template is Copy Have a look at the functions and see if that makes sense. |
I'm still looking into the code, but I'll write some notes here.

    bool subjrx_apply_mods(struct Envelope *env)
    {
      // No envelope or no subject: we can ignore this case for this issue
      if (!env || !env->subject || (*env->subject == '\0'))
        return false;

      // This seems to be used when we merge envelopes. Not sure when it happens, but it probably doesn't matter
      if (env->disp_subj)
        // Sanitize `env->disp_subj` here?
        return true;

      // Sanitize `env->subject` here
      if (STAILQ_EMPTY(&SubjectRegexList))
        // If the user has NOT configured the `subjectrx` option, keep the subject as is
        return false;

      // If the user has configured the `subjectrx` option, apply it here
      env->disp_subj = mutt_replacelist_apply(&SubjectRegexList, env->subject);
      return true;
    }

The return values

    --- a/expando/format.c
    +++ b/expando/format.c
    @@ -155,7 +155,7 @@ int format_string(struct Buffer *buf, int min_cols, int max_cols, enum FormatJus
           else
     #endif
           if (!IsWPrint(wc))
    -        wc = '?';
    +        wc = ReplacementChar;
           w = wcwidth(wc);
         }

Unrelated to this issue, but isn't it better to use Copying the whole @flatcap did you mean using Also wouldn't |
Yep :-)
Yes
Yes. If
Yeah, that's
Yes, we don't need those bits. |
Ok, the logic seems to be correct... But it doesn't work 😢 If I use this in my new function:

    if (iswspace(wc))
    {
      wc = '-';
    }

I get this subject:

Now, if I try this:

    if (iswspace(wc) || wc == '?')
    {
      wc = '-';
    }

I get the expected subject:

The That's it for today. I'll search more about it later... 😅 Edit: Tell me if it is better to push my changes to a branch already to help debugging stuff like this. |
Yes, you're welcome to create a |
Pushed to devel/subject-bad-characters |
I found the issue, it is in email/rfc2047.c L349. The original subject is being modified by
So as soon as I'll remove the |
Be careful -- this has security implications because you don't want to send untrusted data (which email headers clearly are) to the terminal. |
Continuation of the discussion in #4272
Expected Behaviour
Subject text without ?, like when we check the emails in the browser.
Actual Behaviour
Subject is shown as ??? ? ? ? ? ? ? ? ??Google Cloud Platform & APIs: 001112-001112-001112 の請求書の用意ができました.
Really? I'm confused by this answer in the PR.
Steps to Reproduce
Not sure how to send an email like that, maybe only programmatically. I'll look into it.
How often does this happen?
If the email isn't private, please attach it to this issue.
When did it start to happen?
NeoMutt Version
Extra Info
I also have something to ask @flatcap
I'd like to try fixing this issue myself. My main programming language is Python, and I only used C at university, about 6 years ago. Do you think I would be able to do it?