[optim] Noisy benchmarks #1988

Open
janeyx99 opened this issue Oct 15, 2023 · 0 comments

janeyx99 commented Oct 15, 2023

In the past 2-3 weeks, the benchmark results for these configs have been bouncing up and down:

DALLE2_pytorch, Adam, cuda, amsgrad, maximize
DALLE2_pytorch, Adam, cuda, default
DALLE2_pytorch, Adam, cuda, foreach
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad
DALLE2_pytorch, AdamW, cuda, amsgrad, maximize
DALLE2_pytorch, AdamW, cuda, default
DALLE2_pytorch, AdamW, cuda, foreach
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad
DALLE2_pytorch, RAdam, cuda, default
DALLE2_pytorch, RAdam, cuda, foreach
detectron2_maskrcnn, Adam, cuda, differentiable
doctr_det_predictor, Adam, cuda, no_foreach
hf_BigBird, Rprop, cuda, default
hf_Reformer, Adadelta, cuda, differentiable
hf_T5_large, Rprop, cuda, differentiable
mobilenet_v2, RAdam, cuda, no_foreach
mobilenet_v3_large, Rprop, cuda, default
phlippe_densenet, Adamax, cuda, no_foreach
phlippe_densenet, NAdam, cuda, no_foreach
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach
shufflenet_v2_x1_0, NAdam, cuda, no_foreach
speech_transformer, Adadelta, cuda, differentiable
stable_diffusion_unet, RAdam, cuda, differentiable
stable_diffusion_unet, RAdam, cuda, no_foreach
Super_SloMo, RMSprop, cuda, differentiable
timm_efficientnet, Adadelta, cuda, no_foreach
timm_vision_transformer_large, RAdam, cuda, differentiable
timm_vision_transformer_large, RAdam, cuda, no_foreach
tts_angular, Rprop, cuda, maximize

Raw copy-paste below:

timm_vision_transformer_large, RAdam, cuda, differentiable: +79.81755%
DALLE2_pytorch, Adam, cuda, foreach: -31.41948%
DALLE2_pytorch, AdamW, cuda, default: -34.05249%
DALLE2_pytorch, AdamW, cuda, default: +34.76710%
DALLE2_pytorch, AdamW, cuda, foreach: -32.74721%
Super_SloMo, RMSprop, cuda, differentiable: -33.06181%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +45.06670%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +36.11691%
DALLE2_pytorch, AdamW, cuda, foreach: +49.40952%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +38.32376%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -44.00085%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.24940%
hf_T5_large, Rprop, cuda, differentiable: +30.94747%
Super_SloMo, RMSprop, cuda, differentiable: +52.56252%
hf_Reformer, Adadelta, cuda, differentiable: +32.41173%
timm_efficientnet, Adadelta, cuda, no_foreach: +36.42987%
phlippe_densenet, Adamax, cuda, no_foreach: +37.26340%
phlippe_densenet, NAdam, cuda, no_foreach: +45.02834%
DALLE2_pytorch, Adam, cuda, default: -30.66293%
shufflenet_v2_x1_0, NAdam, cuda, no_foreach: -34.01592%
tts_angular, Rprop, cuda, maximize: -31.97153%
doctr_det_predictor, Adam, cuda, no_foreach: +40.54418%
timm_vision_transformer_large, RAdam, cuda, no_foreach: +76.79410%
timm_vision_transformer_large, RAdam, cuda, differentiable: +79.21604%
tts_angular, Rprop, cuda, maximize: +46.56646%
shufflenet_v2_x1_0, NAdam, cuda, no_foreach: +48.30327%
DALLE2_pytorch, Adam, cuda, default: +35.45820%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -43.40771%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.11391%
stable_diffusion_unet, RAdam, cuda, no_foreach: -47.72116%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +37.85595%
DALLE2_pytorch, AdamW, cuda, foreach: +37.25520%
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad: +36.33095%
DALLE2_pytorch, RAdam, cuda, foreach: +40.75257%
stable_diffusion_unet, RAdam, cuda, no_foreach: +92.67768%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +31.76727%
timm_vision_transformer_large, RAdam, cuda, no_foreach: +78.50101%
speech_transformer, Adadelta, cuda, differentiable: -32.16910%
mobilenet_v2, RAdam, cuda, no_foreach: -31.10531%
hf_BigBird, Rprop, cuda, default: -30.36015%
speech_transformer, Adadelta, cuda, differentiable: +47.25762%
mobilenet_v2, RAdam, cuda, no_foreach: +42.18837%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -44.11201%
DALLE2_pytorch, AdamW, cuda, foreach: +32.51113%
mobilenet_v3_large, Rprop, cuda, default: +58.95230%
detectron2_maskrcnn, Adam, cuda, differentiable: +44.64110%
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach: -34.88800%
DALLE2_pytorch, Adam, cuda, foreach: +33.90456%
DALLE2_pytorch, AdamW, cuda, foreach: +38.89779%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +33.92788%
DALLE2_pytorch, RAdam, cuda, foreach: +44.08831%
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach: +56.42979%
stable_diffusion_unet, RAdam, cuda, no_foreach: -47.85513%
stable_diffusion_unet, RAdam, cuda, differentiable: -48.20885%
DALLE2_pytorch, Adam, cuda, amsgrad, maximize: +32.97666%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +39.28866%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +44.78136%
DALLE2_pytorch, AdamW, cuda, default: +37.12394%
DALLE2_pytorch, AdamW, cuda, amsgrad, maximize: +37.40143%
DALLE2_pytorch, AdamW, cuda, foreach: +37.27161%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +43.04382%
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad: +40.58533%
DALLE2_pytorch, RAdam, cuda, default: +35.76401%
DALLE2_pytorch, RAdam, cuda, foreach: +35.42852%
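
For triage, the raw list above can be grouped per config to see which ones swing in both directions. The following is only a minimal sketch, not part of the benchmark tooling: the function names (`parse_line`, `group_changes`, `bouncing`, `net_change`) are hypothetical, the sample lines are copied from the paste above, and `net_change` assumes each percentage is relative to the immediately preceding run, which the issue does not state.

```python
from collections import defaultdict


def parse_line(line):
    """Split 'model, Optimizer, cuda, flags: +12.3%' into (config, percent)."""
    config, _, change = line.rpartition(": ")
    return config.strip(), float(change.strip().rstrip("%"))


def group_changes(lines):
    """Collect every reported percentage under its config, in input order."""
    changes = defaultdict(list)
    for line in lines:
        if line.strip():
            config, pct = parse_line(line)
            changes[config].append(pct)
    return dict(changes)


def bouncing(changes):
    """Configs reported with both a positive and a negative change."""
    return sorted(c for c, pcts in changes.items() if min(pcts) < 0 < max(pcts))


def net_change(pcts):
    """Compound the percentages.

    ASSUMPTION: each percentage is relative to the previous run, so
    consecutive changes multiply rather than add.
    """
    factor = 1.0
    for pct in pcts:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0


# A few lines taken verbatim from the raw paste in this issue.
SAMPLE = """\
timm_vision_transformer_large, RAdam, cuda, differentiable: +79.81755%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.24940%
DALLE2_pytorch, RAdam, cuda, foreach: +40.75257%
DALLE2_pytorch, RAdam, cuda, foreach: +44.08831%
"""

changes = group_changes(SAMPLE.splitlines())
for config in bouncing(changes):
    pcts = changes[config]
    print(f"{config}: {pcts} -> net {net_change(pcts):+.2f}%")
```

Under that compounding assumption, a jump of about +80% followed by a drop of about -44% nets out close to zero, which would be consistent with a config flipping between two states rather than steadily drifting.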