Data loader bottlenecking training #51

Open
JakobLindscheid opened this issue Apr 18, 2024 · 3 comments

@JakobLindscheid

Hi,
Thank you for publishing the pretraining and finetuning scripts! They are really helpful.
For a university project, we are trying to reproduce the results from the paper. However, when running the pretraining script, we observe very slow training (~1 minute per epoch) on our hardware.
Running the PyTorch profiler for 16 training batches, we see the following:

FIT Profiler Report (relevant lines)

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
|---|---|---|---|---|
| Total | - | 1397 | 99.734 | 100 % |
| run_training_epoch | 91.657 | 1 | 91.657 | 91.901 |
| [_TrainingEpochLoop].train_dataloader_next | 5.2931 | 16 | 84.689 | 84.915 |
| [_EvaluationLoop].val_next | 0.246 | 19 | 4.674 | 4.6865 |
| [LightningModule]LagLlamaLightningModule.optimizer_step | 0.10731 | 16 | 1.717 | 1.7216 |
| run_training_batch | 0.10731 | 16 | 1.717 | 1.7216 |
| [Strategy]SingleDeviceStrategy.training_step | 0.091875 | 16 | 1.47 | 1.4739 |
| [Strategy]SingleDeviceStrategy.validation_step | 0.044368 | 19 | 0.843 | 0.84525 |
| [Strategy]SingleDeviceStrategy.backward | 0.0135 | 16 | 0.216 | 0.21658 |
| [Callback]ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_train_epoch_end | 0.141 | 1 | 0.141 | 0.14138 |
| [Callback]ModelCheckpoint{'monitor': 'val_loss', 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_train_epoch_end | 0.093 | 1 | 0.093 | 0.093248 |
| [LightningModule]LagLlamaLightningModule.transfer_batch_to_device | 0.0022286 | 35 | 0.078 | 0.078208 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0022286 | 35 | 0.078 | 0.078208 |
| [LightningModule]LagLlamaLightningModule.on_validation_model_train | 0.008 | 2 | 0.016 | 0.016043 |
| [Callback]ModelSummary.on_fit_start | 0.015 | 1 | 0.015 | 0.01504 |
| [Callback]TQDMProgressBar.on_validation_batch_end | 0.00078947 | 19 | 0.015 | 0.01504 |
| [LightningModule]LagLlamaLightningModule.optimizer_zero_grad | 0.0009375 | 16 | 0.015 | 0.01504 |

Apparently the data loader needs about 5.3 seconds for each batch, which accounts for roughly 85% of the total profiled time.
After some further investigation, we found that the train data loader does the following:

  1. Apply the transformation to a full time series.
  2. Sample a window from the transformed data (inside the InstanceSplitter).
  3. Extract the window from the transformed data (InstanceSplitter).
  4. Create the batches of data according to the batch size.

This means a full time series gets transformed and then most of the transformed data is not used. This is repeated for each item in a batch. We observed ~10 ms for transforming a full time series; with a batch size of 512, that is 512 × 10 ms ≈ 5.1 s per batch, which matches the >5 seconds reported by the profiler.
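To make the cost model above concrete, here is a toy sketch in plain Python. It does not use gluonts; `CountingTransform`, `sample_window`, and `lazy_batch` are hypothetical stand-ins for the transformation chain and the InstanceSplitter, and the series length and window size are made-up values. It only illustrates that transforming a whole series per sampled window scales with the batch size.

```python
import random


class CountingTransform:
    """Hypothetical stand-in for an expensive per-series transformation."""

    def __init__(self):
        self.calls = 0

    def __call__(self, series):
        self.calls += 1
        return [x * 2.0 for x in series]


def sample_window(series, length, rng):
    """Pick one random window, loosely mimicking what an instance splitter does."""
    start = rng.randrange(0, len(series) - length + 1)
    return series[start:start + length]


def lazy_batch(dataset, transform, batch_size, window, rng):
    """Build one batch: every item transforms a full series, then keeps one window."""
    batch = []
    for _ in range(batch_size):
        series = rng.choice(dataset)
        transformed = transform(series)  # whole series is transformed
        batch.append(sample_window(transformed, window, rng))  # most of it is dropped
    return batch


rng = random.Random(0)
series_length = 1000  # assumed length, for illustration only
dataset = [[float(i) for i in range(series_length)] for _ in range(8)]
transform = CountingTransform()

batch = lazy_batch(dataset, transform, batch_size=512, window=32, rng=rng)

used_values = sum(len(w) for w in batch)
computed_values = transform.calls * series_length
print("transform calls per batch:", transform.calls)
print("fraction of transformed values kept:", used_values / computed_values)

# With ~10 ms per full-series transform (the figure measured above):
estimated_seconds = 512 * 0.010
print("estimated seconds per batch:", estimated_seconds)
```

In this toy setup the transform runs once per batch item, so a batch of 512 triggers 512 full-series transforms, which is where the roughly 5 seconds per batch comes from.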

The order of execution is partly dictated by the gluonts package, so I am not aware of an obvious solution that does not involve changes there.

Now my questions: did you face the same issue during your experiments, and how can we solve the problem we observe?

@ashok-arjun
Contributor

Hi @JakobLindscheid !

Thanks for the detailed issue!

I was not aware of this issue as I never checked the data loading speed in my experiments.

Can I check this on my end and get back to you soon?

@JakobLindscheid
Author

Sure, thank you for having a look!
For now, I added `data = list(data)` before the instance splitter is applied. This forces the transformation to run once, before training starts. It's obviously not the nicest solution, since it takes a few minutes before training starts, but the total training time improves a lot.
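A minimal sketch of why materializing the data helps, again in plain Python rather than gluonts. `LazyTransformed` and `CountingTransform` are hypothetical stand-ins for a lazily transformed dataset; the only assumption is that the real transformed dataset re-runs its transformation each time it is iterated.

```python
class CountingTransform:
    """Hypothetical expensive transformation that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, series):
        self.calls += 1
        return [x * 2.0 for x in series]


class LazyTransformed:
    """Hypothetical lazy dataset: the transform is re-applied on every iteration."""

    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __iter__(self):
        for series in self.dataset:
            yield self.transform(series)


def iterate(data, passes):
    """Consume the dataset several times, as repeated sampling would."""
    for _ in range(passes):
        for _series in data:
            pass


raw = [[float(i) for i in range(100)] for _ in range(4)]
transform = CountingTransform()
data = LazyTransformed(raw, transform)

# Lazy: every pass over the data re-runs the transform for every series.
iterate(data, passes=3)
lazy_calls = transform.calls

# Workaround: materialize once, then iterate over the cached results.
transform.calls = 0
data = list(data)
iterate(data, passes=3)
cached_calls = transform.calls

print("transform calls, lazy:", lazy_calls)
print("transform calls, after list(data):", cached_calls)
```

The trade-off in this sketch is that all transformed series are held in memory at once, and any randomness inside the transformation would be fixed at materialization time instead of being redrawn on each pass.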

@ashok-arjun
Contributor

That's useful to know, thanks for sharing.
