[Feature]: Deleting steps up to a given point #7357
Comments
Hi @BramVanroy! Thank you for writing in! I will go ahead and submit the feature request for you to our engineering team.
Hi @BramVanroy, please find more discussion on the feature backlog here: #7078 (comment). In short, this is on our radar as a high-value feature. We already have one part complete, but have some remaining work to implement.
@sephmard That's looking great! Any chance that this will also be controllable by environment variables? That would make life easier in HPC environments, where passing around env vars can be easier than setting variables in scripts. Perhaps WANDB_RESUME can be updated to align with this behavior, or a new WANDB_RESUME_RUN and WANDB_RESUME_STEP can be added?
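To make the env var idea concrete, here is a minimal sketch of how a training script could read such a pair of variables. `WANDB_RESUME_RUN` and `WANDB_RESUME_STEP` are only the names proposed above, not settings that wandb is known to support, and `resume_target_from_env` is a hypothetical helper.

```python
import os

def resume_target_from_env(environ=os.environ):
    """Return (run_id, step) if both proposed variables are set, else None.

    WANDB_RESUME_RUN and WANDB_RESUME_STEP are hypothetical names taken from
    the proposal in this thread; they are not confirmed wandb settings.
    """
    run_id = environ.get("WANDB_RESUME_RUN")
    step_text = environ.get("WANDB_RESUME_STEP")
    if not run_id or step_text is None:
        return None
    step = int(step_text)  # raises ValueError for non-numeric input
    if step < 0:
        raise ValueError("WANDB_RESUME_STEP must be non-negative")
    return run_id, step

# A plain dict stands in for the job's environment in this example.
example_env = {"WANDB_RESUME_RUN": "abc123", "WANDB_RESUME_STEP": "100"}
print(resume_target_from_env(example_env))
```

In a job script the same values would simply be exported before launching training, so nothing in the Python code needs to change between allocations.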
Description
In HPC environments we often have fixed-length job allocations (e.g. 10 days). A checkpoint save may be triggered at 9 days and 20 hours (e.g. at step 100). After saving, training and logging continue until the 10 days are over (e.g. until step 105). We then start a new job from the latest saved checkpoint at step 100. That leads to a discrepancy: wandb has already logged up to step 105, but we restart from step 100, so the graph gets messed up. It would therefore be incredibly useful to have tooling that removes data points in a given range of steps.
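The scenario above can be sketched in a few lines of plain Python. This is only an illustration of the desired effect on the logged data, not how wandb stores or edits history; the row layout with a `step` key and the helper `drop_steps_after` are assumptions made for the example.

```python
def drop_steps_after(history, last_kept_step):
    """Return only the rows logged at or before last_kept_step."""
    return [row for row in history if row["step"] <= last_kept_step]

# Steps 98..105 were logged, but the last checkpoint was saved at step 100.
history = [{"step": s, "loss": 1.0 / s} for s in range(98, 106)]

# Dropping steps 101..105 leaves a history that matches the checkpoint,
# so the resumed job can log step 101 onward without overlap.
trimmed = drop_steps_after(history, last_kept_step=100)
print([row["step"] for row in trimmed])  # [98, 99, 100]
```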
Suggested Solution
A CLI that, given a project name, a run, and a step range, removes all data points within that range. From the outside looking in, this seems straightforward, but I'm sure there are technical reasons why it is more difficult to do.
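A rough sketch of what the command-line surface could look like, using Python's standard `argparse`. The program name `delete-steps`, the flag names, and the `START:END` range syntax are all hypothetical; no such wandb command is known to exist, and the actual deletion is left out because it would depend on backend support.

```python
import argparse

def parse_step_range(text):
    """Parse 'START:END' into an inclusive (start, end) tuple of ints."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise argparse.ArgumentTypeError("expected START:END, e.g. 101:105")
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        raise argparse.ArgumentTypeError("START and END must be integers")
    if start < 0 or end < start:
        raise argparse.ArgumentTypeError("need 0 <= START <= END")
    return start, end

def build_parser():
    # Hypothetical interface: project, run, and an inclusive step range.
    parser = argparse.ArgumentParser(
        prog="delete-steps",
        description="Remove logged data points within a range of steps.",
    )
    parser.add_argument("--project", required=True, help="project name")
    parser.add_argument("--run", required=True, help="run ID")
    parser.add_argument("--steps", required=True, type=parse_step_range,
                        help="inclusive step range, written START:END")
    return parser

# Example invocation for the scenario in the description.
args = build_parser().parse_args(
    ["--project", "my-project", "--run", "abc123", "--steps", "101:105"]
)
print(args.project, args.run, args.steps)
```

Validating the range up front means a typo such as `105:101` is rejected before anything is deleted.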