Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Training Fails on Validation Sanity Check #5

Open
madhawav opened this issue Jul 10, 2020 · 1 comment
Open

Training Fails on Validation Sanity Check #5

madhawav opened this issue Jul 10, 2020 · 1 comment

Comments

@madhawav
Copy link
Contributor

Hi,
When I try to train on a new dataset, it fails with the following error.

[PYTHON_ENV_PATH]/neuraltexture/bin/python -u [PROJECT_ROOT]/code/train_neural_texture.py
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Use pytorch 1.4.0
Load config: configs/neural_texture/config_default.yaml
INFO:lightning:GPU available: True, used: True
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `test_dataloader()` and have defined a `test_step()`, you may also want to define `test_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "[PROJECT_ROOT]/neuraltexture/code/train_neural_texture.py", line 47, in <module>
    trainer.fit(system)
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 765, in fit
    self.single_gpu_train(model)
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 492, in single_gpu_train
    self.run_pretrain_routine(model)
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 896, in run_pretrain_routine
    eval_results = self._evaluate(model,
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 322, in _evaluate
    eval_results = model.validation_end(outputs)
  File "[PROJECT_ROOT]/neuraltexture/code/systems/s_core.py", line 33, in validation_end
    for key in outputs[0].keys():
IndexError: list index out of range

Process finished with exit code 1

Additional Information

  • My dataset: As a sanity check, I use all the test images provided by you as my dataset. Thus, I have a folder called "all" in the "datasets" directory which has two sub-directories "train" and "test". I have copied all the test images provided by you into both of these directories.
  • The working directory is "[PROJECT_ROOT]/code".
  • My Operating System is Ubuntu 16.04.
  • PyTorch Lightning 0.7.5 is installed.

My "config_default.yml" Is shown below:

version_name: neuraltexture_all_2d_single
device: cuda
n_workers: 8
n_gpus: 1
dim: 2
noise:
  octaves: 8
logger:
  log_files_every_n_iter: 1000
  log_scalars_every_n_iter: 100
  log_validation_every_n_epochs: 1
image:
  image_res: &image_res 128 # (height, width)
texture:
  e: &texture_e 64 # encoding size
dataset:
  name: datasets.images
  path: '../datasets/all'
  use_single: -1 # -1 = all, 0,1,2 for single
system:
  block_main:
    model_texture_encoder:
      model_params:
        name: models.neural_texture.encoder
        type: 'ResNet'
        shape_in:  [[3, *image_res, *image_res]]
        bottleneck_size: 8
    model_texture_mlp:
      model_params:
        name: models.neural_texture.mlp
        type: 'MLP'
        n_max_features: 128
        n_blocks: 4
        dropout_ratio: 0.0
        non_linearity: 'relu'
        bias: True
        encoding: *texture_e
    optimizer_params:
      name: 'adam'
      lr: 0.0001
      weight_decay: 0.0001
    scheduler_params:
      name: 'none'
    loss_params:
      style_weight: 1.0
      style_type: 'mse'
train:
  epochs: 3
  bs: 16
  accumulate_grad_batches: 1
  seed: 41127

Your help is much appreciated.

@PierrickCh
Copy link

Had the same issue, tweaked the code a bit to:

if len(outputs)>0:
            for key in outputs[0].keys():
                logs[key] = torch.stack([x[key] for x in outputs]).mean()
        else: 
            logs['val_loss']=torch.tensor(0.)

This is very ad hoc, i think the code needs a 'val' folder as well as a train and test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants