High Initial RAM Usage Leads to Crashes #338
Comments
Interesting. I think that's due to how the parquet file is processed (reader file). That's probably easy enough to fix.
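To illustrate the kind of fix this hints at, here is a minimal sketch. It assumes the startup spike comes from materializing the whole URL list in memory before it is split into shards; that is a guess, not a confirmed diagnosis. The names `iter_shards` and `fake_url_source` are hypothetical and are not part of img2dataset's reader. For parquet input, pyarrow's `ParquetFile.iter_batches` offers a similar batch-at-a-time pattern.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_shards(urls: Iterable[str], shard_size: int) -> Iterator[List[str]]:
    """Yield lists of at most shard_size URLs without loading the whole input."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    it = iter(urls)
    while True:
        # Pull only the next shard_size items from the underlying iterator.
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


def fake_url_source(n: int) -> Iterator[str]:
    # Hypothetical stand-in for a reader that streams rows from a URL list.
    for i in range(n):
        yield f"https://example.com/{i}.jpg"


if __name__ == "__main__":
    for shard in iter_shards(fake_url_source(10), shard_size=4):
        print(len(shard), shard[0])
```

With this shape, peak memory is bounded by one shard rather than by the full URL list, provided the source itself streams rows instead of loading them all up front.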
On Wed, Aug 9, 2023, 16:49 Sypherd ***@***.***> wrote:
Here's another sample from a crashed c6i.4xlarge instance where we can
see available process memory approach 0 before crashing:
[image: image]
<https://user-images.githubusercontent.com/50557586/259449237-e6467005-19a0-4748-a96d-2b00bac37eef.png>
Maybe the cause of the crashes is something else but I have not been able
to run img2dataset on a c6i.4xlarge instance yet.
I've been downloading select URLs from LAION-400M, LAION-5B, and SBU and have noticed a significant spike in RAM usage on startup that causes instances with <=32GB RAM, such as AWS' c6i.4xlarge, to crash. While img2dataset is running, however, RAM usage remains very low. I'd love it if we could somehow mitigate that initial spike so that lower-RAM instances can be used throughout. Here's a screenshot from wandb.ai showing the initial spike on a 64GB instance: