
Performance is better in 1.6.1 release compared to 1.7.4 release in many models #419

Open
2 of 4 tasks
vineethanandh opened this issue Sep 22, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@vineethanandh
Contributor

System Info

Optimum-habana - 1.7.4
Synapse AI - 1.12.0
Docker - 1.12.0-463
Gaudi2 (HLS 225) - 1x and 8x.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce (running SwinT on 1x):

  1. Download and install optimum-habana:
  2. git clone https://github.com/huggingface/optimum-habana.git
  3. cd optimum-habana
  4. git checkout v1.6-release
  5. pip install -r examples/image-classification/requirements.txt
  6. pip install optimum-habana==1.6.1
  7. python3 /root/optimum-habana/examples/image-classification/run_image_classification.py --model_name_or_path microsoft/swin-base-patch4-window7-224 --dataset_name cifar10 --output_dir /tmp/swint_hf/results/ --remove_unused_columns False --do_train --learning_rate 2e-05 --per_device_train_batch_size 64 --evaluation_strategy no --save_strategy no --load_best_model_at_end True --save_total_limit 3 --seed 1337 --use_habana --use_lazy_mode --gaudi_config_name Habana/swin --throughput_warmup_steps 3 --ignore_mismatched_sizes --bf16 --num_train_epochs 1 --logging_steps 20 --dataloader_num_workers 8

Expected behavior

The expected behaviour is that, on SynapseAI 1.12.0-463, optimum-habana 1.6.1 and optimum-habana 1.7.4 deliver similar performance.

What is observed instead is that performance is better with optimum-habana 1.6.1 and comparatively lower with 1.7.4.

This applies to SwinT, ViT, and BERT-Large, on both 1x and 8x.
E.g. the throughput values for SwinT are given below:

OH 1.7.4 values: 362.524, 362.566, 360.719, 358.089

OH 1.6.1 values: 389.045, 390.971, 389.587

Almost a 7.5% drop.
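The size of the regression can be sanity-checked from the reported numbers. A minimal sketch, using only the throughput values quoted above (units assumed to be samples/s, as printed by the example script):

```python
# Compare mean throughput between the two optimum-habana releases
# and compute the relative drop. Values copied from the issue.
from statistics import mean

oh_174 = [362.524, 362.566, 360.719, 358.089]  # OH 1.7.4 runs
oh_161 = [389.045, 390.971, 389.587]           # OH 1.6.1 runs

drop = (mean(oh_161) - mean(oh_174)) / mean(oh_161)
print(f"1.7.4 mean: {mean(oh_174):.2f}")
print(f"1.6.1 mean: {mean(oh_161):.2f}")
print(f"relative drop: {drop:.1%}")  # ~7.4%, consistent with "almost 7.5%"
```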

@vineethanandh vineethanandh added the bug Something isn't working label Sep 22, 2023
@vineethanandh vineethanandh changed the title Performance is better in 1.6.1 release compared to 1.7.4 release in may models Performance is better in 1.6.1 release compared to 1.7.4 release in many models Sep 22, 2023
@regisss
Collaborator

regisss commented Sep 25, 2023

I'm going to look into it

@vineethanandh
Contributor Author

@regisss - Did you get some time to check this behaviour?

@regisss
Collaborator

regisss commented Oct 9, 2023

> @regisss - Did you get some time to check this behaviour?

Not yet. I don't think I'll have time to do it this week and before releasing Optimum Habana v1.8. I'll investigate this by next week and will release a patch if needed.
