This workflow containerises Facebook Research's PyTorch GAN Zoo. That code is a toolbox for training a selection of generative adversarial networks on popular image datasets.
Once a model has been trained, it can be used to generate new images. For example, after training a model on pictures of celebrities, a set of 'fake' celebrity pictures can be created.
The container supports CUDA version 11.1 on the host.
To build the singularity container use the build script in this directory.
./build.sh
This script will try to use singularity's fakeroot support if you run as a non-root user. If this is not supported on your system you can run the script as root.
When the script is finished you will find the container (pytorch_GAN_zoo.sif) in your current working directory.
The scripts from PyTorch GAN Zoo can be called with
singularity exec pytorch_GAN_zoo.sif <script_name>
For example,
singularity exec pytorch_GAN_zoo.sif train.py
Any flags or command line arguments can be declared after the script name.
For many scripts, you will need to supply the --nv flag to singularity so that the host GPU may be used.
PyTorch GAN Zoo natively supports parallelisation across multiple GPUs. The devices to use can be selected using the CUDA_VISIBLE_DEVICES environment variable. CUDA compatible GPUs are numbered from zero. For example, to use the first and third CUDA accelerators you would set CUDA_VISIBLE_DEVICES=0,2.
To pass this environment variable to singularity the --env-file flag must be used, as passing environment variables containing commas is not supported by the --env flag.
echo 'CUDA_VISIBLE_DEVICES=0,1' > env.txt
singularity exec --env-file env.txt pytorch_GAN_zoo.sif ...
The container includes a convenience script for fetching datasets.
Each dataset can be fetched using,
singularity exec pytorch_GAN_zoo.sif get_data <dataset>
<dataset> | description
---|---
dtd | 5,640 texture images in 47 categories
cifar10 | 60,000 images of objects in 10 classes
Both datasets can be fetched with the following commands,
singularity exec pytorch_GAN_zoo.sif get_data dtd
singularity exec pytorch_GAN_zoo.sif get_data cifar10
CelebA is a dataset of more than 200,000 images of celebrities. Downloading this dataset is more difficult to automate. The dataset can be downloaded using a browser here.
Here are examples showing how to use this container to train a PGAN model using the CelebA, DTD and CIFAR-10 datasets and visualise the results.
You can also use the DCGAN model by passing -m DCGAN to datasets.py, train.py and eval.py. If you do not specify the model with -m, PGAN will be used.
Some of the datasets require preprocessing before they can be used for training.
The commands in this section assume the datasets are located in directories named as the get_data commands above create them, with the exception of CelebA.
The CelebA dataset requires some preprocessing to crop and orientate the images.
Extract the dataset,
unzip img_align_celeba.zip
Use the datasets.py script to preprocess the images,
singularity exec pytorch_GAN_zoo.sif datasets.py celeba_cropped <path_to_celeba>/img_align_celeba/ -o celeba_cropped
This command will save the modified dataset in a directory called celeba_cropped and create a training configuration file, config_celeba_cropped.json.
The DTD dataset requires no preprocessing, so the datasets script simply creates a configuration file, config_dtd.json,
singularity exec pytorch_GAN_zoo.sif datasets.py dtd dtd/images
When training a model with the CIFAR-10 dataset some preprocessing is required.
singularity exec pytorch_GAN_zoo.sif datasets.py cifar10 cifar-10-batches-py -o cifar10
A processed dataset will be written to a directory called cifar10 and a configuration file named config_cifar10.json will be created.
Here are examples of training PGAN models using the three datasets as processed and configured above.
Note that training these models takes approximately six days on a single Nvidia V100.
In each example the --restart flag is used so that checkpoints are periodically written during the training. The --no_vis flag stops the training script from trying to send information to a visdom server.
These examples assume that the configuration files are named as those created above.
singularity exec --nv pytorch_GAN_zoo.sif train.py PGAN -c config_celeba_cropped.json --restart --no_vis -n celeba_cropped
singularity exec --nv pytorch_GAN_zoo.sif train.py PGAN -c config_dtd.json --restart --no_vis -n dtd
singularity exec --nv pytorch_GAN_zoo.sif train.py PGAN -c config_cifar10.json --restart --no_vis -n cifar10
Each of these examples will write checkpoint and final weights to output_networks/<model_name>, where <model_name> is the name you declare using the -n flag.
Using a trained model, a set of sample images can be generated using the eval.py script.
The syntax for this is,
singularity exec --nv pytorch_GAN_zoo.sif eval.py visualization --np_vis -d output_networks -n <model_name> -m PGAN --save_dataset ./<output_directory> --size_dataset <data_set_size>
<model_name> is the same value as you used when training. <data_set_size> specifies the number of images to generate. The images will be saved in the <output_directory> directory.
For example, to generate 1000 images of fake celebrities using a model trained as above,
singularity exec --nv pytorch_GAN_zoo.sif eval.py visualization --np_vis -d output_networks -n celeba_cropped -m PGAN --save_dataset ./fake_celebs --size_dataset 1000
For data sets with categories, such as DTD and CIFAR-10, images can be generated
for a particular category. To see the available categories use the --showLabels flag. For example with CIFAR-10,
$ singularity exec --nv pytorch_GAN_zoo.sif eval.py visualization --np_vis -d output_networks -n cifar10 -m PGAN --showLabels
...
--Main MAIN ['automobile', 'bird', 'truck', 'airplane', 'cat',
'horse', 'ship', 'frog', 'deer', 'dog']
...
A set of generated 'frog' images can then be saved by using the category flag --Main and the label frog,
singularity exec --nv pytorch_GAN_zoo.sif eval.py visualization --np_vis -d output_networks -n cifar10 -m PGAN --Main frog --save_dataset ./frogs --size_dataset 100
The batch_scripts directory contains template Slurm batch scripts for training models on the CelebA, CIFAR-10 and DTD datasets.
These templates assume that data directories and configuration files are named as those created above. They demonstrate the advice for running on HPC explained here. This includes using scratch space, parametrising output file names and supporting job arrays.
To submit a job, complete a template by filling in the placeholders (beginning with %) with values appropriate for the platform you are using. Use sbatch to submit the job. For example,
sbatch train_celeba.sh
Or as a job array
sbatch --array=1-5%2 train_celeba.sh
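The placeholder-filling step can be scripted with sed rather than done by hand. The sketch below uses a toy one-line template with an illustrative %partition% placeholder; check the real templates in batch_scripts for the placeholder names they actually use.

```shell
# Toy template with a single %-prefixed placeholder (illustrative only;
# the real templates in batch_scripts define their own placeholders).
printf '#SBATCH --partition=%%partition%%\n' > template.sh

# Fill the placeholder and write a submittable copy of the script.
sed 's/%partition%/gpu/' template.sh > train_job.sh
cat train_job.sh
```

The resulting train_job.sh contains the line #SBATCH --partition=gpu and could then be passed to sbatch.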