[rqd] Fix non ASCII chars #1335

Open
wants to merge 4 commits into
base: master

ramonfigueiredo
Collaborator

Convert to ASCII while discarding characters that can not be encoded

@ramonfigueiredo
Collaborator Author

ramonfigueiredo commented Jan 17, 2024

Problem

Before this change, when RQD logged output containing non-ASCII text, RQD crashed and CueGUI kept showing the process as running for a long time.

Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u221e' in position 64: ordinal not in range(128)

The problem is in the pipe_to_file function, which fails when it tries to write non-ASCII characters to the file.

The error happens in the code below:

def pipe_to_file(stdout, stderr, outfile):
    ...

    def print_and_flush_ln(fd, last_timestamp):
        ...
        for line in lines[0:-1]:
            # Convert to ASCII while discarding characters that can not be encoded
            line = line.encode('ascii', 'ignore')
            print("[%s] %s" % (curr_line_timestamp, line), file=outfile)
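
The same class of failure can be reproduced outside RQD. This is a minimal sketch, assuming the output file is a text stream whose encoding is ASCII; the `outfile` built here with `io.TextIOWrapper` is a stand-in, not the actual RQD log file object, and the timestamp is a placeholder.

```python
import io

# Stand-in for the log file: a text stream that can only encode ASCII.
outfile = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

line = "progress: \u221e"  # the infinity sign from the error message above

raised = False
error_text = ""
try:
    print("[%s] %s" % ("12:00:00", line), file=outfile)
    outfile.flush()
except UnicodeEncodeError as exc:
    # The stream cannot represent U+221E, so the write fails.
    raised = True
    error_text = str(exc)

print(raised, error_text)
```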

Solution

BEFORE

for line in lines[0:-1]:
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)

NEW SOLUTION

for line in lines[0:-1]:
    # Convert to ASCII while discarding characters that can not be encoded
    line = line.encode('ascii', 'ignore')
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)
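
One thing that may be worth double-checking on Python 3: `str.encode()` returns `bytes`, and formatting `bytes` with `%s` writes its repr (`b'...'`) into the log line. Below is a minimal sketch of a variant that decodes back to `str` before formatting. `format_log_line` is a hypothetical helper used only for illustration; it is not a function in RQD.

```python
def format_log_line(timestamp, line):
    """Return '[timestamp] line' with non-ASCII characters dropped."""
    # encode() drops characters ASCII cannot represent; decode() turns the
    # resulting bytes back into str so '%s' does not render it as b'...'.
    ascii_line = line.encode("ascii", "ignore").decode("ascii")
    return "[%s] %s" % (timestamp, ascii_line)


# Formatting the bytes object directly embeds its repr in the output.
bytes_version = "[%s] %s" % ("12:00:00", "café".encode("ascii", "ignore"))
str_version = format_log_line("12:00:00", "café")

print(bytes_version)
print(str_version)
```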

About the bug fix

Now RQD ignores non-ASCII characters in the logs and keeps running correctly.

For example:

If the log is:
'text here 영화, café'

It will be:
'text here , caf'

It will drop the Korean characters '영화' and the 'é'.

Tests

python

>>> text = 'text here 영화, café'

>>> text.encode('ascii')
Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)

>>> text.encode('ascii', 'ignore')
b'text here , caf'

@ramonfigueiredo
Collaborator Author

Other solutions

Replace non-ASCII character with '?'

python

>>> text = 'text here 영화, café'

Current solution

>>> text.encode('ascii', 'ignore')
b'text here , caf'

Solution using .encode('ascii', 'replace')

>>> text.encode('ascii', 'replace')
b'text here ??, caf?'
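
For comparison, here is a short sketch that runs the same text through several of the standard `errors` handlers accepted by `str.encode()`. `backslashreplace` is not one of the options proposed above; it is included only as another handler to weigh, since it keeps a readable trace of what was removed.

```python
text = "text here 영화, café"

# Each handler decides what happens to characters ASCII cannot represent.
results = {
    handler: text.encode("ascii", handler).decode("ascii")
    for handler in ("ignore", "replace", "backslashreplace")
}

for handler, value in results.items():
    print("%-16s %s" % (handler, value))
```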

Encode with UTF-8, instead of ASCII

It is also possible to keep UTF-8, but use encode() and decode():

UTF-8 example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('utf-8', 'ignore')
>>> text_encoded.decode()
'text here 영화, café'

ASCII example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('ascii', 'ignore')
>>> text_encoded.decode()
'text here , caf'
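
If keeping the original characters is preferred, another option is to make sure the log file itself is opened with a UTF-8 encoding, so no per-line conversion is needed. This is a minimal sketch under that assumption: the temporary file stands in for the real frame log, and how RQD actually opens its log file is not shown in this PR.

```python
import os
import tempfile

text = "text here 영화, café"

fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)
try:
    # An explicit encoding avoids depending on the locale default (e.g. ASCII).
    with open(path, "w", encoding="utf-8") as outfile:
        print("[%s] %s" % ("12:00:00", text), file=outfile)

    # Read the line back to confirm nothing was dropped or replaced.
    with open(path, encoding="utf-8") as infile:
        written = infile.read()
finally:
    os.remove(path)

print(written, end="")
```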

- Convert to ASCII while discarding characters that can not be encoded
- Update sphinx version to 5.0.0 on docs/requirements.txt
- Change docs/conf.py to use language = 'en'
@@ -1219,6 +1219,8 @@ def print_and_flush_ln(fd, last_timestamp):

         remainder = lines[-1]
         for line in lines[0:-1]:
+            # Convert to ASCII while discarding characters that can not be encoded
+            line = line.encode('ascii', 'ignore')
Collaborator

@ramonfigueiredo Thank you for the PR and the detailed write-up. The detail makes it nice and easy to review with the proper context.

So, this error occurs when we intercept output in order to prepend a timestamp. What is logged to the file when RQD_PREPEND_TIMESTAMP is False?

My sense is that the output which is logged when RQD_PREPEND_TIMESTAMP is True vs False should be as similar as possible, aside from the timestamp of course.

Removing changes to update Sphinx version
2 participants