Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writable doesn't correctly count size of strings #52818

Closed
ronag opened this issue May 3, 2024 · 12 comments · Fixed by #52842
Closed

Writable doesn't correctly count size of strings #52818

ronag opened this issue May 3, 2024 · 12 comments · Fixed by #52842
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@ronag
Copy link
Member

ronag commented May 3, 2024

Minor bug. Not sure if it's worth the performance impact of fixing.

When calculating the currently buffered length we are not correctly calculating the byte length of strings, we just use the string length.

function writeOrBuffer(stream, state, chunk, encoding, callback) {
  const len = (state[kState] & kObjectMode) !== 0 ? 1 : chunk.length;

  state.length += len;
@ronag
Copy link
Member Author

ronag commented May 3, 2024

@nodejs/streams @mcollina

@VoltrexKeyva VoltrexKeyva added the stream Issues and PRs related to the stream subsystem. label May 3, 2024
@lpinca
Copy link
Member

lpinca commented May 3, 2024

I think we should fix it. Is this a recent regression?

@benjamingr
Copy link
Member

So the fix would be to call Buffer.byteLength? Might be worth checking what the perf regression is and if it's not significant to fix it.

@benjamingr
Copy link
Member

@lpinca
Copy link
Member

lpinca commented May 3, 2024

@benjamingr it seems to be correctly calculated in v10 and v0.10. See https://github.com/nodejs/node/blob/v10.x/lib/_stream_writable.js#L374 and https://github.com/nodejs/node/blob/v0.10/lib/_stream_writable.js#L204 (note decodeChunk()).

@lpinca
Copy link
Member

lpinca commented May 3, 2024

It actually seems to be correctly calculated also on main. See

chunk = Buffer.from(chunk, encoding);
.

@lpinca
Copy link
Member

lpinca commented May 3, 2024

Confirmed.

const { Writable } = require('stream');

const w = new Writable({
  write() {}
});

w.write('€');
w.write('€');

console.log(w.writableLength); // 6

I think we can close this.

@mcollina mcollina closed this as completed May 3, 2024
@ronag ronag reopened this May 3, 2024
@benjamingr
Copy link
Member

@lpinca that's just one case though?

@benjamingr
Copy link
Member

To be clear:

When calculating the currently buffered length we are not correctly calculating the byte length of strings, we just use the string length.

This is across streams in several places - the fact writable works with decodeStrings set to true doesn't mean it's not a bug elsewhere?

@lpinca
Copy link
Member

lpinca commented May 3, 2024

The issue description did not mention the decodeStrings option. Yes, in that case the string size is incorrectly calculated.

@lpinca
Copy link
Member

lpinca commented May 3, 2024

I think using Buffer.byteLength() when the decodeStrings is set to false is acceptable. Performance should be no worse than the default case (decodeStrings set to true) when the string is converted to a Buffer.

@benjamingr
Copy link
Member

benjamingr commented May 3, 2024

I think whenever we have a stream of strings and not buffers we should use byteLength (or just use Buffer.byteLength for everything) and if there is no performance impact land it.

(though byteLength would use its length as utf-8 and not utf-16 "as if it was a buffer encoded as utf-8" which may be desired?)

lpinca added a commit to lpinca/node that referenced this issue May 4, 2024
Use the UTF-8 byte length of the string when the `decodeStrings` option
is set to `false`.

Fixes: nodejs#52818
lpinca added a commit to lpinca/node that referenced this issue May 4, 2024
Use the UTF-8 byte length of the string when the `decodeStrings` option
is set to `false`.

Fixes: nodejs#52818
lpinca added a commit to lpinca/node that referenced this issue May 4, 2024
Use the byte length of the string when the `decodeStrings` option is set
to `false`.

Fixes: nodejs#52818
benjamingr added a commit that referenced this issue May 5, 2024
Documents that we calculate the highWaterMark value of streams operating on strings using the number of UTF-16 code units.

Fixes: #52818
benjamingr added a commit that referenced this issue May 6, 2024
Documents that we calculate the highWaterMark value
of streams operating on strings using the number of
UTF-16 code units.

Fixes: #52818
aduh95 pushed a commit that referenced this issue May 10, 2024
Documents that we calculate the highWaterMark value
of streams operating on strings using the number of
UTF-16 code units.

Fixes: #52818
PR-URL: #52842
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
targos pushed a commit that referenced this issue May 11, 2024
Documents that we calculate the highWaterMark value
of streams operating on strings using the number of
UTF-16 code units.

Fixes: #52818
PR-URL: #52842
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
5 participants