Memory leak in createReadStream when destroy() is called on resulting stream and validation is set to true #2010

Closed
jpambrun opened this issue Jul 29, 2022 · 6 comments
Labels: api: storage, needs more info, priority: p2, type: bug

@jpambrun

Memory leak in createReadStream when destroy() is called on the resulting stream and validation is set to true. This can occur when 1) the data required is located before the end of the file and reading to completion is unnecessary, or 2) a downstream client disconnects or navigates away.
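
Condensed, the pattern that triggers the leak looks like this (a sketch; enoughData() is a hypothetical placeholder, and the full repro follows below):

const rs = file.createReadStream({ validation: true });
rs.on("data", (chunk) => {
  if (enoughData(chunk)) {
    // With validation: true, destroying here does not free the memory
    // held by the underlying streams.
    rs.destroy();
  }
});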

Environment details

  • OS: Linux
  • Node.js version: v18.6.0
  • npm version: 8.14.0
  • @google-cloud/storage version: 6.2.3

Steps to reproduce

  1. Start the GCS simulator using the provided Docker Compose setup (the leak also occurs against real GCS; the simulator is just for convenience).
  2. Run supplied script with validation: true
  3. Notice increasing memory usage, up to 700 MB+.
  4. Re-run supplied script with validation: false
  5. Notice constant low memory usage.

docker compose setup

version: "3.9"  # optional since v1.27.0
services:
  google-cloud-storage-http2:
    image: caddy:2.4.5
    ports:
      - "4443:443"
    entrypoint: ["sh", "-c", 'echo "$$CADDY_CONFIG" > Caddyfile && cat Caddyfile && caddy run']
    environment:
      CADDY_CONFIG: >
          {
            auto_https disable_redirects
            local_certs
          }
          localhost {
            log
            reverse_proxy {
                    to http://google-cloud-storage:80
                    header_down Location (http://\[::\]:80)(.*) https://localhost:4443$$2
            }
          }
  google-cloud-storage:
    image: fsouza/fake-gcs-server:latest
    entrypoint: ["sh", "-c", "mkdir -p /data && mkdir -p /preload/vision-poc && touch /preload/vision-poc/.keep && /bin/fake-gcs-server -scheme http -port 80 -data /preload -filesystem-root /data"]
    ports:
      - "4080:80"

script

import { Storage } from "@google-cloud/storage";
import stream from "stream";
import crypto from "crypto";
import { promisify } from "util";
import pump from "pump";
import process from "process";

const pPump = promisify(pump);
import pLimit from "p-limit";

process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = "0"; // allow the simulator's self-signed cert

const buf = crypto.randomBytes(1024 * 1024 * 2);
const bucketName = "vision-poc";
const fileName = "file.txt";
const storage = new Storage({ apiEndpoint: "https://localhost:4443", projectId: "fake-project-id" });

const myBucket = storage.bucket(bucketName);
const file = myBucket.file(fileName);

async function upload() {
  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(buf);
  passthroughStream.end();
  await pPump(passthroughStream, file.createWriteStream());
}

function download() {
  return new Promise((resolve, reject) => {
    // createReadStream() is synchronous and returns the stream directly.
    const res = file.createReadStream({ validation: true });
    const chunks = [];
    res.on("data", (chunk) => {
      chunks.push(chunk);
      const bytesSoFar = chunks.reduce((acc, cur) => acc + cur.length, 0);

      if (bytesSoFar > 1024 * 1024 * 1) {
        chunks.length = 0;
        res.destroy(); // <--- HERE I don't need more data, or downstream client is gone
        console.log(process.memoryUsage().rss / 1024 / 1024);
        resolve();
      }
    });
    res.on("error", (err) => {
      console.log(err);
      reject(err);
    });
    res.on("end", () => {
      console.log("end"); // not reached
    });
  });
}

// main
(async () => {
  await upload().catch(console.error);

  const limit = pLimit(100);

  const promises = [];
  for (let i = 0; i < 2000; i++) {
    const promise = limit(download);
    promises.push(promise);
  }

  await Promise.all(promises).catch(console.error);

  setInterval(() => {
    console.log(process.memoryUsage().rss / 1024 / 1024);
  }, 1000);
})();
jpambrun added the priority: p2 and type: bug labels on Jul 29, 2022
product-auto-label bot added the api: storage label on Jul 29, 2022
ddelgrosso1 self-assigned this on Aug 1, 2022
@ddelgrosso1 (Contributor)

Thank you for reporting this @jpambrun. I will investigate and update the issue accordingly.

@ddelgrosso1 (Contributor)

Hi @jpambrun. Thank you for the code above. I have attempted to reproduce this in my local environment using calls to GCS rather than the emulator. How much of a difference are you seeing in your environment with validation = true vs. validation = false? I ran a few profiling/tracing runs and the results did not immediately point to any glaring issues in the validation code. I am curious whether you saw anything else that might be useful in digging deeper.
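
One way to quantify that difference (a sketch building on the repro script above; the iteration count is arbitrary, and download() in the repro hard-codes the validation flag, so it would need to accept it as a parameter):

async function measure(validation) {
  const before = process.memoryUsage().rss;
  for (let i = 0; i < 500; i++) {
    await download(validation); // hypothetical parameter, see note above
  }
  global.gc?.(); // only available when run with --expose-gc
  const after = process.memoryUsage().rss;
  return ((after - before) / 1024 / 1024).toFixed(1) + " MiB RSS growth";
}

console.log("validation: true ->", await measure(true));
console.log("validation: false ->", await measure(false));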

@jpambrun (Author) commented on Aug 3, 2022

The area of interest is here:
https://github.com/googleapis/nodejs-storage/blob/main/src/file.ts#L1473-L1595

I did spend several hours digging and could not really pinpoint the issue. It may be related to that pipe. Elsewhere the library uses pump, whose main goal is to solve: "When using standard source.pipe(dest), source will not be destroyed if dest emits close or an error."

This GCS client is extremely complicated: going from the bare Node.js HTTP response to the returned throughStream involves a lot of PassThrough streams, any one of which could be wrongly connected.

All I can really say is that calling destroy() on the returned throughStream (the dest) should bubble up all the way to the HTTP response (the source) so it can be properly closed and its resources freed.
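
To illustrate the distinction (a minimal sketch with stand-in PassThrough streams, not the library's actual internals): plain .pipe leaves the source alive when the destination is destroyed, while stream.pipeline, like pump, propagates destruction back to it.

import { PassThrough, pipeline } from "stream";

// Plain .pipe: destroying the destination does not tear down the source.
const source = new PassThrough(); // stands in for the HTTP response
const dest = new PassThrough();   // stands in for the returned throughStream
source.pipe(dest);
dest.destroy();
console.log(source.destroyed); // false -- the source keeps its resources

// stream.pipeline: destruction propagates back through the chain.
const src2 = new PassThrough();
const dst2 = new PassThrough();
pipeline(src2, dst2, (err) => console.log("pipeline ended:", err?.code));
dst2.destroy();
setImmediate(() => console.log(src2.destroyed)); // true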

@danielbankhead (Member)

Hey @jpambrun, we've refactored our streams in a recent release (v6.4.1):

In short, we've removed pumpify (and the associated .pipe) & started using the native Node.js pipeline. Are you still experiencing leaks in recent releases?
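
For anyone re-testing, a partial-read consumer written against the native pipeline could look like this (a sketch reusing the file object from the repro above; the 1 MiB cutoff mirrors the original script):

import { pipeline, Writable } from "stream";

function downloadPartial(file) {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    const sink = new Writable({
      write(chunk, _enc, cb) {
        bytes += chunk.length;
        cb();
        // Stop after 1 MiB; pipeline tears down the source as well.
        if (bytes > 1024 * 1024) this.destroy();
      },
    });
    pipeline(file.createReadStream({ validation: true }), sink, (err) => {
      // Premature close is the expected outcome of the early destroy.
      if (err && err.code !== "ERR_STREAM_PREMATURE_CLOSE") reject(err);
      else resolve();
    });
  });
}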

shaffeeullah added the needs more info label on Sep 8, 2022
@shaffeeullah (Contributor)

Hi @jpambrun,

Please reopen with the requested information if you're still seeing this issue.

@ddelgrosso1 (Contributor)

@jpambrun I know this issue is old and has been closed for over a year, but please see #2313. I think this may have been the same issue.
