stream: use Buffer.byteLength() to get the string length #52828
Conversation
lib/internal/streams/writable.js
```js
encoding = 'buffer';
} else {
  length = Buffer.byteLength(chunk);
```
Should we use the `encoding` argument instead of forcing UTF-8?
Use the encoding argument
What happens if the chunk is split in the middle of a multi-byte character?
I think the same that happens when the chunk is converted to a
Benchmark CI https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1558/
We need the same fix for Readable
```diff
@@ -463,12 +464,17 @@ function _write(stream, chunk, encoding, cb) {
   if (typeof chunk === 'string') {
     if ((state[kState] & kDecodeStrings) !== 0) {
       chunk = Buffer.from(chunk, encoding);
       length = chunk.length;
```
Since `Buffer.byteLength` already handles non-strings, it may be simpler to always call it, as long as that doesn't cause a performance regression.
Use the byte length of the string when the `decodeStrings` option is set to `false`.

Fixes: nodejs#52818
Reapproving to indicate the 2% regression is fine for correctness IMO. This can also be optimized.
I looked into it briefly and it does not seem trivial, as it breaks cases like this:

```js
const assert = require('assert');
const { Readable } = require('stream');

const readable = new Readable({
  read() {}
});

readable.setEncoding('utf8');
readable.push('€');

const data = readable.read(1);
assert.strictEqual(data, '€');
assert.strictEqual(readable.readableLength, 0);
```

I'm fine with closing this PR and documenting the current behavior if we want to keep consistency with
I think consistency is better, and having a fix for `Readable` would be preferable.
That's actually pretty significant breakage. I tend to think it's better to document that the buffered length for strings is based on string length, not byte size.
I agree. Fixing it on
Per the discussion, we should document this (on both `Readable` and `Writable`) rather than fix it, because it's a breaking change for `.read()`.