stream: use Buffer.byteLength() to get the string length #52828

lpinca · 2024-05-04T05:49:02Z

Use the byte length of the string when the decodeStrings option is set to false.
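To make the description concrete, here is a minimal sketch of the difference between a string's length in UTF-16 code units and its UTF-8 byte length, and of how that shows up in `writableLength`. The helper `pendingLength` is hypothetical and exists only for this illustration; its `write()` implementation deliberately never calls its callback, so the chunk stays counted.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper for illustration: write one chunk to a Writable whose
// write() never completes, then report how much the stream says is pending.
function pendingLength(decodeStrings, chunk) {
  const writable = new Writable({
    decodeStrings,
    write(_chunk, _encoding, _callback) {
      // Intentionally never call _callback, so the chunk stays pending.
    },
  });
  writable.write(chunk);
  return writable.writableLength;
}

const chunk = '€'; // U+20AC: one UTF-16 code unit, three UTF-8 bytes

console.log(chunk.length);               // 1
console.log(Buffer.byteLength(chunk));   // 3
console.log(pendingLength(true, chunk)); // 3, the string became a Buffer

// Assumption: releases that count string chunks in code units report 1 here;
// the change proposed in this PR would make it 3.
console.log(pendingLength(false, chunk));
```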

nodejs-github-bot · 2024-05-04T05:49:07Z

Review requested:

@nodejs/streams

lpinca · 2024-05-04T05:51:02Z

lib/internal/streams/writable.js

 encoding = 'buffer';
+ } else {
+ length = Buffer.byteLength(chunk);


Should we use the encoding argument instead of forcing UTF-8?

Use the encoding argument
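As a rough illustration of why the encoding argument matters, the sketch below calls `Buffer.byteLength()` on the same kind of input with different encodings. The `samples` list and the `encodedLength` name are made up for this example and are not part of the patch.

```javascript
// Hypothetical wrapper, only to give the call a descriptive name.
const encodedLength = (chunk, encoding) => Buffer.byteLength(chunk, encoding);

// [string, encoding] pairs chosen for illustration.
const samples = [
  ['€', 'utf8'],      // 3 bytes
  ['€', 'utf16le'],   // 2 bytes
  ['€', 'latin1'],    // 1 byte
  ['aGk=', 'base64'], // 2 bytes (decodes to "hi")
  ['cafe', 'hex'],    // 2 bytes
];

for (const [chunk, encoding] of samples) {
  console.log(encoding, encodedLength(chunk, encoding));
}
```

Assuming UTF-8 for every string would give 3 for each of the '€' rows and 4 for the last two.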

targos · 2024-05-04T06:14:35Z

What happens if the chunk ends in the middle of a multi-byte character?

lpinca · 2024-05-04T06:26:11Z

I think the same thing happens as when the chunk is converted to a Buffer.
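One way to explore this question is to split a string between the two halves of a surrogate pair and encode each half separately. This is only a sketch. The variable names are illustrative, and the exact bytes produced for a lone surrogate are printed rather than asserted. My understanding is that Node.js substitutes the replacement character U+FFFD, but that is worth checking on the version in use.

```javascript
const whole = '😀';             // U+1F600: two UTF-16 code units, four UTF-8 bytes
const high = whole.slice(0, 1); // lone high surrogate
const low = whole.slice(1);     // lone low surrogate

console.log(whole.length);             // 2
console.log(Buffer.byteLength(whole)); // 4

// Encode each half on its own, as a stream would if the chunk boundary
// fell inside the pair, then compare with encoding the whole string.
const rejoined = Buffer.concat([Buffer.from(high), Buffer.from(low)]);
const sameBytes = rejoined.equals(Buffer.from(whole));

console.log(Buffer.from(high), Buffer.from(low));
console.log(sameBytes); // false
```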

nodejs-github-bot · 2024-05-04T12:48:32Z

CI: https://ci.nodejs.org/job/node-test-pull-request/58923/

benjamingr · 2024-05-04T13:28:42Z

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1558/

                                                                                         confidence improvement accuracy (*)   (**)  (***)
streams/creation.js kind='duplex' n=50000000                                                             0.38 %       ±0.75% ±1.00% ±1.31%
streams/creation.js kind='readable' n=50000000                                                           0.29 %       ±0.63% ±0.83% ±1.09%
streams/creation.js kind='transform' n=50000000                                                          0.42 %       ±1.88% ±2.50% ±3.25%
streams/creation.js kind='writable' n=50000000                                                           0.05 %       ±0.64% ±0.85% ±1.11%
streams/destroy.js kind='duplex' n=1000000                                                              -0.07 %       ±0.50% ±0.67% ±0.87%
streams/destroy.js kind='readable' n=1000000                                                            -0.97 %       ±3.47% ±4.61% ±6.00%
streams/destroy.js kind='transform' n=1000000                                                            0.28 %       ±0.61% ±0.81% ±1.06%
streams/destroy.js kind='writable' n=1000000                                                            -0.44 %       ±0.66% ±0.87% ±1.14%
streams/pipe-object-mode.js n=5000000                                                                    0.31 %       ±0.33% ±0.44% ±0.58%
streams/pipe.js n=5000000                                                                                0.33 %       ±0.40% ±0.53% ±0.70%
streams/readable-async-iterator.js sync='no' n=100000                                                   -0.85 %       ±1.11% ±1.48% ±1.93%
streams/readable-async-iterator.js sync='yes' n=100000                                                   0.09 %       ±0.91% ±1.21% ±1.58%
streams/readable-bigread.js n=1000                                                                       0.21 %       ±0.75% ±1.00% ±1.31%
streams/readable-bigunevenread.js n=1000                                                                -0.30 %       ±0.62% ±0.83% ±1.08%
streams/readable-boundaryread.js type='buffer' n=2000                                                   -0.06 %       ±0.67% ±0.89% ±1.16%
streams/readable-boundaryread.js type='string' n=2000                                                    0.18 %       ±1.00% ±1.33% ±1.74%
streams/readable-from.js type='array' n=10000000                                                         0.03 %       ±1.12% ±1.49% ±1.95%
streams/readable-from.js type='async-generator' n=10000000                                               0.10 %       ±0.58% ±0.77% ±1.00%
streams/readable-from.js type='sync-generator-with-async-values' n=10000000                             -0.25 %       ±0.34% ±0.46% ±0.60%
streams/readable-from.js type='sync-generator-with-sync-values' n=10000000                               0.05 %       ±0.15% ±0.20% ±0.27%
streams/readable-readall.js n=5000                                                                       0.39 %       ±2.00% ±2.66% ±3.46%
streams/readable-uint8array.js kind='encoding' n=1000000                                                -0.39 %       ±0.79% ±1.05% ±1.37%
streams/readable-uint8array.js kind='read' n=1000000                                                     0.89 %       ±1.28% ±1.70% ±2.22%
streams/readable-unevenread.js n=1000                                                             *     -1.33 %       ±1.08% ±1.44% ±1.88%
streams/writable-manywrites.js len=1024 callback='no' writev='no' sync='no' n=100000              *      2.98 %       ±2.44% ±3.26% ±4.27%
streams/writable-manywrites.js len=1024 callback='no' writev='no' sync='yes' n=100000           ***     -2.78 %       ±0.79% ±1.05% ±1.37%
streams/writable-manywrites.js len=1024 callback='no' writev='yes' sync='no' n=100000                    0.10 %       ±1.21% ±1.62% ±2.14%
streams/writable-manywrites.js len=1024 callback='no' writev='yes' sync='yes' n=100000          ***     -3.05 %       ±1.45% ±1.95% ±2.57%
streams/writable-manywrites.js len=1024 callback='yes' writev='no' sync='no' n=100000                    0.61 %       ±1.84% ±2.45% ±3.19%
streams/writable-manywrites.js len=1024 callback='yes' writev='no' sync='yes' n=100000          ***     -2.07 %       ±0.53% ±0.71% ±0.93%
streams/writable-manywrites.js len=1024 callback='yes' writev='yes' sync='no' n=100000                  -0.53 %       ±0.83% ±1.12% ±1.48%
streams/writable-manywrites.js len=1024 callback='yes' writev='yes' sync='yes' n=100000          **     -1.83 %       ±1.19% ±1.58% ±2.06%
streams/writable-manywrites.js len=32768 callback='no' writev='no' sync='no' n=100000                   -1.59 %       ±2.53% ±3.37% ±4.38%
streams/writable-manywrites.js len=32768 callback='no' writev='no' sync='yes' n=100000          ***     -2.08 %       ±0.98% ±1.31% ±1.70%
streams/writable-manywrites.js len=32768 callback='no' writev='yes' sync='no' n=100000                  -0.17 %       ±1.51% ±2.01% ±2.62%
streams/writable-manywrites.js len=32768 callback='no' writev='yes' sync='yes' n=100000           *     -1.14 %       ±0.94% ±1.26% ±1.64%
streams/writable-manywrites.js len=32768 callback='yes' writev='no' sync='no' n=100000                  -1.19 %       ±2.11% ±2.82% ±3.70%
streams/writable-manywrites.js len=32768 callback='yes' writev='no' sync='yes' n=100000         ***     -1.80 %       ±0.50% ±0.66% ±0.86%
streams/writable-manywrites.js len=32768 callback='yes' writev='yes' sync='no' n=100000                  1.03 %       ±2.10% ±2.80% ±3.66%
streams/writable-manywrites.js len=32768 callback='yes' writev='yes' sync='yes' n=100000         **     -1.21 %       ±0.82% ±1.10% ±1.44%
streams/writable-uint8array.js kind='object-mode' n=50000000                                    ***     -1.84 %       ±0.72% ±0.97% ±1.29%
streams/writable-uint8array.js kind='write' n=50000000                                                   0.23 %       ±0.79% ±1.06% ±1.40%
streams/writable-uint8array.js kind='writev' n=50000000                                           *     -0.63 %       ±0.49% ±0.65% ±0.84%

benjamingr

We need the same fix for Readable

benjamingr · 2024-05-04T13:31:09Z

lib/internal/streams/writable.js

@@ -463,12 +464,17 @@ function _write(stream, chunk, encoding, cb) {
 if (typeof chunk === 'string') {
 if ((state[kState] & kDecodeStrings) !== 0) {
 chunk = Buffer.from(chunk, encoding);
+ length = chunk.length;


Since Buffer.byteLength already handles non-strings, it may be simpler to always call it, provided that doesn't cause a performance regression.
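The suggestion relies on `Buffer.byteLength()` accepting more than strings. A minimal sketch of what "always call it" could look like follows. The `chunkByteLength` helper is a hypothetical name, not code from the patch, and it says nothing about the performance concern raised here.

```javascript
// Hypothetical helper: one code path for strings, Buffers, typed arrays
// and ArrayBuffers. The encoding only matters for strings.
function chunkByteLength(chunk, encoding) {
  return Buffer.byteLength(chunk, encoding);
}

console.log(chunkByteLength('€', 'utf8'));        // 3
console.log(chunkByteLength(Buffer.from('€')));   // 3
console.log(chunkByteLength(new Uint8Array(5)));  // 5
console.log(chunkByteLength(new Uint16Array(4))); // 8, bytes rather than elements
console.log(chunkByteLength(new ArrayBuffer(8))); // 8
```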

Use the byte length of the string when the `decodeStrings` option is set to `false`. Fixes: nodejs#52818

benjamingr

Reapproving to indicate the 2% regression is fine for correctness IMO. This can also be optimized.

nodejs-github-bot · 2024-05-04T18:10:16Z

CI: https://ci.nodejs.org/job/node-test-pull-request/58929/

lpinca · 2024-05-04T19:22:34Z

> We need the same fix for Readable.

I looked into it briefly and it does not seem trivial, as it breaks cases like this:

const assert = require('assert');
const { Readable } = require('stream');

const readable = new Readable({
  read() {}
});

readable.setEncoding('utf8');
readable.push('€');

const data = readable.read(1);

assert.strictEqual(data, '€');
assert.strictEqual(readable.readableLength, 0);

I'm fine with closing this PR and documenting the current behavior if we want to keep consistency with Readable.

mcollina · 2024-05-04T22:43:22Z

I think consistency is better, and having a fix for Readable would be preferable.

nodejs-github-bot · 2024-05-05T04:51:38Z

CI: https://ci.nodejs.org/job/node-test-pull-request/58937/

benjamingr · 2024-05-05T10:08:46Z

> I looked into it briefly and it does not seem trivial as it breaks cases like this

That's actually pretty significant breakage. I tend to think it's better to document that buffering length in strings is based on the string length and not byte size.

lpinca · 2024-05-05T10:28:06Z

> That's actually pretty significant breakage. I tend to think it's better to document that buffering length in strings is based on the string length and not byte size.

I agree. Fixing it on Readable is not easy, as readable.read(n) works on code units and not bytes when the chunk is a string.
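The point about `readable.read(n)` can be seen in a small sketch along the lines of the earlier test case. The string `'héllo'` and the variable names are illustrative. The values in the comments reflect the code-unit accounting described in this thread.

```javascript
const { Readable } = require('node:stream');

const readable = new Readable({ read() {} });
readable.setEncoding('utf8');
readable.push('héllo'); // 5 UTF-16 code units, 6 UTF-8 bytes

const before = readable.readableLength; // 5, counted in code units
const head = readable.read(2);          // 'hé', two code units
const after = readable.readableLength;  // 3

console.log(before, head, after);
console.log(Buffer.byteLength(head));   // 3, so read(2) returned three bytes of text
```

If `readableLength` were counted in bytes while `read(n)` kept slicing by code units, the two numbers would no longer describe the same quantity, which is the breakage discussed above.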

benjamingr

Per discussion, we should document this (on both Readable and Writable) rather than fix it, because it's a breaking change for .read
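For anyone documenting this, a short sketch of how the accounting interacts with `highWaterMark` may help. The helper name `firstWriteResult`, the threshold of 4 and the `'€€'` chunk are assumptions chosen for the demonstration.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper: report what the first write() returns for a stream
// with a small highWaterMark and a write() that never completes.
function firstWriteResult(decodeStrings) {
  const writable = new Writable({
    decodeStrings,
    highWaterMark: 4,
    write(_chunk, _encoding, _callback) {
      // Never call _callback, so the chunk stays pending.
    },
  });
  return writable.write('€€'); // 2 code units, 6 UTF-8 bytes
}

// With decodeStrings left on, the chunk is a 6-byte Buffer, which is over the
// threshold, so write() signals backpressure.
console.log(firstWriteResult(true)); // false

// Assumption: with decodeStrings off and code-unit accounting, the chunk
// counts as 2, which is under the threshold, so this is expected to be true.
console.log(firstWriteResult(false));
```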

nodejs-github-bot added the needs-ci label (PRs that need a full CI run.) on May 4, 2024

lpinca added the stream label (Issues and PRs related to the stream subsystem.) on May 4, 2024

lpinca force-pushed the fix/issue-52818 branch from fb06325 to 38abab3 on May 4, 2024 05:55

lpinca changed the title from "stream: use Buffer.ByteLength() to get the string length" to "stream: use Buffer.byteLength() to get the string length" on May 4, 2024

ronag approved these changes on May 4, 2024

lpinca added the request-ci label (Add this label to start a Jenkins CI on a PR.) on May 4, 2024

github-actions bot removed the request-ci label on May 4, 2024

benjamingr added the needs-benchmark-ci label (PR that need a benchmark CI run.) on May 4, 2024

benjamingr approved these changes on May 4, 2024

benjamingr reviewed on May 4, 2024

Commit 213539e: stream: use Buffer.byteLength() to get the string length

Use the byte length of the string when the `decodeStrings` option is set to `false`. Fixes: nodejs#52818

lpinca force-pushed the fix/issue-52818 branch from e153605 to 213539e on May 4, 2024 15:04

jasnell approved these changes on May 4, 2024

benjamingr approved these changes on May 4, 2024

lpinca added the request-ci label (Add this label to start a Jenkins CI on a PR.) on May 4, 2024

github-actions bot removed the request-ci label on May 4, 2024

benjamingr requested changes on May 5, 2024

lpinca closed this on May 5, 2024

lpinca deleted the fix/issue-52818 branch on May 5, 2024 10:47
