PGD adversarial training implementation is incorrect #4

Open
carlini opened this issue Feb 26, 2019 · 3 comments

@carlini

carlini commented Feb 26, 2019

While the idea of adversarial training is straightforward (generate adversarial examples during training and train on those examples until the model learns to classify them correctly), in practice it is difficult to get right. The basic idea has been independently developed at least twice and was the focus of several papers before all of the right ideas were combined by Madry et al. to form the strongest defense to date. After a cursory analysis, there are at least three flaws in the re-implementation of this defense:

  • Incorrect loss function. The loss function used in the original paper is a loss on the adversarial examples only, whereas this paper mixes adversarial examples and original examples to form the loss function (see the sketch after this list).

  • Incorrect model architectures. In the original paper, the authors make three claims for the novelty of their method. One of these claims states “To reliably withstand strong adversarial attacks, networks require a significantly larger capacity than for correctly classifying benign examples only.” The code that re-implements this defense does not follow this advice and instead uses a substantially smaller model than recommended.

  • Incorrect hyperparameter settings. The original paper trains its MNIST model for 83 epochs; in contrast, the paper here trains for only 20 epochs (roughly 4x fewer training iterations).
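
For concreteness, here is a minimal sketch of the loss-function difference (this is not the repository's code), assuming a PyTorch classifier `model`, a batch of inputs `x` scaled to [0, 1] with labels `y`, and an L-infinity ball of radius epsilon; the function names are hypothetical and the default values are only meant to be illustrative of the paper's MNIST setting.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.3, step_size=0.01, num_steps=40):
    # Inner maximization: projected gradient ascent on the cross-entropy loss,
    # starting from a random point inside the L-infinity ball around x.
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            # Project back onto the epsilon-ball and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_loss(model, x, y):
    x_adv = pgd_attack(model, x, y)
    # Madry et al.: the training loss is computed on the adversarial examples only.
    return F.cross_entropy(model(x_adv), y)
    # The mixed loss criticized above would instead look roughly like:
    #   0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
```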

Possibly because of these implementation differences, the DeepSec report finds (incorrectly) that a more basic form of adversarial training performs better than PGD adversarial training.

I didn't re-implement any of the other defenses; the fact that I'm not raising other issues is not because there are none, but because I didn't look for any others.

@ryderling
Owner

First of all, we did indeed carelessly mix adversarial examples and natural examples when calculating the loss function. We will quickly update this repo with a new version that fixes this bug. On the other hand, this should not weaken the PGD model, because the natural examples are within the perturbation ball anyway and can in fact be viewed as useless data augmentation.

As for the model architecture and the training hyper-parameters, the reason we do not modify them is that we make the defenses and the raw model share the same architecture and training hyper-parameters (epochs, batch size, optimizer). We thought that comparing the effectiveness of different defenses across different model architectures might be unfair. However, to take the capacity of the defense-enhanced models into account, it would be better to design and evaluate another set of experiments in which all adversarial training methods (indeed, all re-training defenses) share the same larger-capacity architecture. Our main goal was to show how the different models perform relative to each other under the same conditions (model architecture/parameters), rather than tuning each model to its best, since we felt that kind of comparison could be unfair.

@carlini
Author

carlini commented Mar 16, 2019

I am glad you will fix the one error.

However: calling this PGD Adversarial Training is disingenuous when you don't actually follow what the paper proposes. It's literally one of their three listed contributions: "We explore the impact of network architecture on adversarial robustness and find that model capacity plays an important role here. To reliably withstand strong adversarial attacks, networks require a significantly larger capacity than for correctly classifying benign examples only".

Evaluating a weakened version of this defense is fundamentally the wrong thing to do. Reproduction research can only exist if people trust that it is done correctly and can't respond with the counterargument "well, they may have made a mistake and done it differently, so I'm not going to listen to those results." And this gets to the heart of why I'm so adamant about these issues being fixed. A large fraction of what I do is reproduce other research and re-evaluate it, and when papers like this do it incorrectly, it tarnishes the reputation of the field.

@aleksmadry

I largely support Nicholas here.

In particular, we found that mixing adversarial and natural examples does lead to models that are overall less robust (see https://arxiv.org/abs/1805.12152).

While training a fixed architecture for a fixed number of steps could be a valid way to compare defenses on fair ground, some care is needed. One would need to tune the other hyper-parameters (e.g., the PGD parameters) in order to properly implement the defense. Using parameters tuned in a very different setting (high capacity, longer training) can lead to suboptimal defense performance.

In any case, reaching broad conclusions about the comparative performance of different defenses should not be done in a setting that could cripple one of them.
