[rqd] Fix non ASCII chars #1335

ramonfigueiredo · 2024-01-17T21:12:58Z

Convert to ASCII while discarding characters that can not be encoded

ramonfigueiredo · 2024-01-17T21:19:12Z

Problem

Previously, when RQD logged non-ASCII text, it crashed and CueGUI showed the process as running for a long time.

Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u221e' in position 64: ordinal not in range(128)

The problem is in the pipe_to_file function, which fails when it tries to write non-ASCII characters to the file.

The error happens in the loop below (shown with the fix already applied):

def pipe_to_file(stdout, stderr, outfile):
    ...

    def print_and_flush_ln(fd, last_timestamp):
        ...
        for line in lines[0:-1]:
            # Convert to ASCII while discarding characters that can not be encoded
            line = line.encode('ascii', 'ignore')
            print("[%s] %s" % (curr_line_timestamp, line), file=outfile)
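The failure can be reproduced outside RQD. This is a minimal sketch, assuming Python 3 and an output stream that uses the ASCII codec; the in-memory stream and the fixed timestamp are stand-ins for illustration, not how RQD opens its log file.

```python
import io

# In-memory stand-in for a log file whose text layer encodes as ASCII.
outfile = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

line = 'progress: \u221e'  # contains U+221E, the character from the traceback

try:
    print("[%s] %s" % ("12:00:00", line), file=outfile)
    outfile.flush()
    raised = False
except UnicodeEncodeError as err:
    # The ASCII codec has no representation for U+221E.
    raised = True
    print("write failed:", err.reason)
```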

Solution

BEFORE

for line in lines[0:-1]:
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)

NEW SOLUTION

for line in lines[0:-1]:
    # Convert to ASCII while discarding characters that can not be encoded
    line = line.encode('ascii', 'ignore')
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)

About the bug fix

Now RQD drops non-ASCII characters from the logs and keeps running correctly.

For example:

If the log is:
'text here 영화, café'

It will be:
'text here , caf'

It drops the Korean characters '영화' and the 'é'.
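One caveat that may be worth checking, assuming the code runs under Python 3: `str.encode` returns `bytes`, and formatting `bytes` with `%s` writes its `b'...'` repr into the line. The sketch below uses a hypothetical helper, `to_ascii`, that decodes back to `str` so the log line stays clean; it is an illustration, not the code in this PR.

```python
def to_ascii(line):
    """Drop characters ASCII cannot represent and return a str (hypothetical helper)."""
    return line.encode('ascii', 'ignore').decode('ascii')

text = 'text here \uc601\ud654, caf\xe9'  # same string as 'text here 영화, café'

# Formatting the bytes object directly embeds its repr in the output.
raw = "[%s] %s" % ("12:00:00", text.encode('ascii', 'ignore'))

# Decoding first keeps the line as plain text.
clean = "[%s] %s" % ("12:00:00", to_ascii(text))

print(raw)    # [12:00:00] b'text here , caf'
print(clean)  # [12:00:00] text here , caf
```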

Tests

python

>>> text = 'text here 영화, café'

>>> text.encode('ascii')
Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)

>>> text.encode('ascii', 'ignore')
b'text here , caf'

ramonfigueiredo · 2024-01-17T21:37:42Z

Other solutions

Replace non-ASCII characters with '?'

python

>>> text = 'text here 영화, café'

Current solution

>>> text.encode('ascii', 'ignore')
b'text here , caf'

Solution using .encode('ascii', 'replace')

>>> text.encode('ascii', 'replace')
b'text here ??, caf?'
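The practical difference between the two handlers is whether the reader can tell that something was removed. A small sketch, assuming Python 3; `encode_ascii` is a hypothetical helper used only for this comparison.

```python
def encode_ascii(value, errors):
    """Encode to ASCII with the given error handler and return a str."""
    return value.encode('ascii', errors).decode('ascii')

text = 'text here \uc601\ud654, caf\xe9'  # same string as 'text here 영화, café'

dropped = encode_ascii(text, 'ignore')   # unencodable characters vanish
marked = encode_ascii(text, 'replace')   # each one becomes a '?'

print(dropped)  # text here , caf
print(marked)   # text here ??, caf?

# 'replace' keeps one output character per input character,
# so column positions in the log line are preserved.
same_length = len(marked) == len(text)
```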

Encode with UTF-8, instead of ASCII

It is also possible to keep the text as UTF-8 by round-tripping it through encode() and decode():

UTF-8 example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('utf-8', 'ignore')
>>> text_encoded.decode()
'text here 영화, café'

ASCII example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('ascii', 'ignore')
>>> text_encoded.decode()
'text here , caf'
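If the log file itself were opened with a UTF-8 encoding, no characters would need to be dropped. The sketch below assumes Python 3 and a log file the caller opens with an explicit `encoding` argument; the file name and timestamp are placeholders, and this is not necessarily how RQD creates its log files.

```python
import os
import tempfile

text = 'text here \uc601\ud654, caf\xe9'  # same string as 'text here 영화, café'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.log')  # placeholder file name

    # An explicit UTF-8 encoding lets print() write every character.
    with open(path, 'w', encoding='utf-8') as outfile:
        print("[%s] %s" % ("12:00:00", text), file=outfile)

    # Read the file back to confirm nothing was lost.
    with open(path, encoding='utf-8') as infile:
        logged = infile.read()

print(logged, end='')
```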


bcipriano · 2024-01-19T18:07:39Z

rqd/rqd/rqcore.py

@@ -1219,6 +1219,8 @@ def print_and_flush_ln(fd, last_timestamp):

 remainder = lines[-1]
 for line in lines[0:-1]:
+ # Convert to ASCII while discarding characters that can not be encoded
+ line = line.encode('ascii', 'ignore')


@ramonfigueiredo Thank you for the PR and the detailed write-up. The detail makes it nice and easy to review with the proper context.

So, this error occurs when we intercept output in order to prepend a timestamp. What is logged to the file when RQD_PREPEND_TIMESTAMP is False?

My sense is that the output which is logged when RQD_PREPEND_TIMESTAMP is True vs False should be as similar as possible, aside from the timestamp of course.


[rqd] Fix non ASCII chars

fe618e3

Convert to ASCII while discarding characters that can not be encoded

ramonfigueiredo requested review from bcipriano, gregdenton, jrray, smith1511, larsbijl, DiegoTavares, IdrisMiles and splhack as code owners January 17, 2024 21:12

ramonfigueiredo added 2 commits January 17, 2024 15:47

[rqd] Fix non ASCII chars

8fdfc93

- Convert to ASCII while discarding characters that can not be encoded - Update sphinx version to 5.0.0 on docs/requirements.txt

[rqd] Fix non ASCII chars

20d5864

- Convert to ASCII while discarding characters that can not be encoded - Update sphinx version to 5.0.0 on docs/requirements.txt - Change docs/conf.py to use language = 'en'

bcipriano requested changes Jan 19, 2024


[rqd] Fix non ASCII chars

31e1402

Removing changes to update Sphinx version
