Some question about computing the adversarial saliency map in JSMA attack #2306

HIT1180300227 opened this issue Oct 9, 2023 · 3 comments


@HIT1180300227

Hi,

When using the JSMA method, I found that the implementation of the adversarial saliency map in this toolbox is slightly different from the original paper:

In this toolbox, the corresponding implementation in saliency_map.py looks like this:

  def _saliency_map(self, x: np.ndarray, target: Union[np.ndarray, int], search_space: np.ndarray) -> np.ndarray:
  
    grads = self.estimator.class_gradient(x, label=target)
    grads = np.reshape(grads, (-1, self._nb_features))

    # Remove gradients for already used features
    used_features = 1 - search_space
    coeff = 2 * int(self.theta > 0) - 1
    grads[used_features == 1] = -np.inf * coeff

    if self.theta > 0:
        ind = np.argpartition(grads, -2, axis=1)[:, -2:]
    else:  # pragma: no cover
        ind = np.argpartition(-grads, -2, axis=1)[:, -2:]

    return ind

I notice that ind is selected directly from grads, i.e. only from the gradients of the target class.

But in the original paper, the adversarial saliency map is computed like this:

[Screenshot: the adversarial saliency map equation S(X, t) from the paper]

or with the heuristic equation like this:

[Screenshot: the paper's heuristic version of the saliency map equation]

I'm confused about this difference.
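
For concreteness, this is how I read the paper's per-feature saliency map (for the theta > 0 case, i.e. increasing features), as a minimal NumPy sketch. This is not the toolbox's code; the function name paper_saliency_map and the (nb_classes, nb_features) Jacobian layout are just my assumptions:

import numpy as np

def paper_saliency_map(jacobian, target):
    # jacobian: (nb_classes, nb_features) array holding dF_j(X) / dX_i
    # returns one saliency value per feature; larger means more salient
    target_grad = jacobian[target]                   # dF_t / dX_i
    other_grad = jacobian.sum(axis=0) - target_grad  # sum over j != t of dF_j / dX_i

    saliency = target_grad * np.abs(other_grad)
    # the paper sets S(X, t)[i] = 0 when the feature would decrease the target
    # output or increase the combined output of the other classes
    saliency[(target_grad < 0) | (other_grad > 0)] = 0.0
    return saliency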

@beat-buesser
Collaborator

Hi @HIT1180300227 I think this implementation of JSMA neglects the additional terms involving the gradients towards classes other than the target class. Have you been able to use the attack successfully?

@HIT1180300227
Author

Hi @beat-buesser ,

I use the JSMA method in the IDS (intrusion detection system) field. Specifically, I apply the targeted JSMA method to the statistical feature vectors as follows:

from art.estimators.classification import KerasClassifier
from art.attacks.evasion import SaliencyMapMethod

art_classifier = KerasClassifier(model=model, use_logits=False)
attack = SaliencyMapMethod(classifier=art_classifier, theta=theta, gamma=gamma, batch_size=1, verbose=True)

# x_test are the original statistical feature vectors
targeted_x_test_jsma = attack.generate(x=x_test, y=numpy_targets)

Before the JSMA attack, I get 90% classification accuracy. After applying this attack, the classification accuracy drops to 20%.
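
For reference, this is roughly how I measure that drop (just a sketch; y_test holding the true labels as class indices is assumed and not shown above):

import numpy as np

clean_pred = np.argmax(model.predict(x_test), axis=1)
adv_pred = np.argmax(model.predict(targeted_x_test_jsma), axis=1)

print("clean accuracy:", np.mean(clean_pred == y_test))        # about 0.90 in my case
print("adversarial accuracy:", np.mean(adv_pred == y_test))    # about 0.20 in my case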

It seems that although the implementation of this attack is not consistent with the original paper, it can still successfully fool the classification model.

Why does the JSMA attack still work?

@beat-buesser
Collaborator

beat-buesser commented Nov 1, 2023

Hi @HIT1180300227 I think it still works because the main component of the gradients is the same, i.e. the direction in which the target class's logit value increases. The paper is more precise in that it adds terms to this update direction to make sure the logits of the other classes are not increasing. It looks like for many applications these additional terms are small or negligible, but they would be more complicated to implement.
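
A quick toy check of this intuition (just a sketch with a hand-made Jacobian, not ART code): when the non-target gradients are small, ranking features by the target-class gradient alone selects the same features as the paper's full saliency map.

import numpy as np

jacobian = np.array([
    [0.90, 0.10, 0.50, -0.30, 0.70],      # dF_target / dX_i
    [-0.02, -0.01, -0.03, 0.01, -0.02],   # small gradients for the other classes
    [-0.01, -0.02, -0.01, 0.02, -0.03],
])
target = 0

target_grad = jacobian[target]
other_grad = jacobian[1:].sum(axis=0)    # sum over the non-target classes

# ART-style selection: top-2 features by the target-class gradient only
art_pick = set(np.argpartition(target_grad, -2)[-2:])

# Paper-style: S(X, t)[i] = dF_t/dX_i * |sum_{j != t} dF_j/dX_i|, zeroed where
# dF_t/dX_i < 0 or sum_{j != t} dF_j/dX_i > 0
saliency = target_grad * np.abs(other_grad)
saliency[(target_grad < 0) | (other_grad > 0)] = 0.0
paper_pick = set(np.argpartition(saliency, -2)[-2:])

print(art_pick, paper_pick)  # both {0, 4} for this Jacobian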
