LayerConductance() attribution values are all equal to zero #1225

Open
BaiMeiyingxue opened this issue Dec 18, 2023 · 1 comment

BaiMeiyingxue commented Dec 18, 2023

❓ Questions and Help

We have a set of listed resources available on the website and FAQ: https://captum.ai/ and https://captum.ai/docs/faq . Feel free to open an issue here on the github or in our discussion forums:

lc = LayerConductance(custom_forward2, layer)
layer_attributions = lc.attribute(inputs=input_embeddings, baselines=ref_input_embeddings, additional_forward_args=(input_ids, all_test_audio, attention_mask, token_type_ids))

and it returns layer_attributions like below:

the shape of layer attribution: tensor([[[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]]], device='cuda:0', grad_fn=<SumBackward1>)

What does this mean?
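
One way to narrow this down is to check, outside of Captum, whether the score returned by the forward wrapper has a nonzero gradient with respect to the output of layer at all. A rough sketch of such a check, reusing the names from the snippet above (custom_forward2 is defined in the comment below) and assuming a forward hook can be attached to layer:

# Sketch of a sanity check: does the score depend on the output of `layer` at all?
captured = {}

def save_activation(module, inputs, output):
    captured["act"] = output
    output.retain_grad()  # keep the gradient on this non-leaf tensor

handle = layer.register_forward_hook(save_activation)
score = custom_forward2(input_embeddings, input_ids, all_test_audio,
                        attention_mask, token_type_ids)
score.sum().backward()
handle.remove()
print(captured["act"].grad)  # all zeros here would mean all-zero attributions are expected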

BaiMeiyingxue commented Dec 20, 2023

Hi, I checked the code, and found that the problem occurs in these functions:

def predict2(inputs, input_ids, all_test_audio, input_mask, segment_ids):
    # Run the full multimodal model; the attribution inputs are passed in via inputs_embeds
    logits, text_att, fusion_att, pooled_output_a, hidden_states, audio_attn, audio_weight, text_weight = model(
        input_ids=input_ids, all_audio_data=all_test_audio, attention_mask=input_mask,
        token_type_ids=segment_ids, inputs_embeds=inputs)
    return logits, text_att, fusion_att, pooled_output_a, hidden_states, audio_attn, audio_weight, text_weight


def custom_forward2(inputs, input_ids, all_test_audio, input_mask=None, segment_ids=None):
    # Forward wrapper passed to LayerConductance: keep only the logits
    logits, _, _, _, _, _, _, _ = predict2(inputs, input_ids, all_test_audio, input_mask, segment_ids)
    # logits = logits.detach().cpu().numpy()
    # logits = torch.argmax(logits, axis=-1)
    # Return the class-0 softmax probability of the first example only
    return torch.softmax(logits, dim=1)[0][0].unsqueeze(-1)
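
As a side note (an assumption on my part, not a confirmed diagnosis): Captum's layer attribution methods expand the batch internally across the integration steps, and the usual pattern in the Captum tutorials is for the forward wrapper to return one scalar per example in whatever batch it receives, rather than indexing a single element. A sketch of that variant, under the hypothetical name custom_forward2_per_example:

def custom_forward2_per_example(inputs, input_ids, all_test_audio, input_mask=None, segment_ids=None):
    logits, _, _, _, _, _, _, _ = predict2(inputs, input_ids, all_test_audio, input_mask, segment_ids)
    # One class-0 probability per batch element, so every example Captum feeds
    # through the model (including the internally expanded ones) gets a gradient.
    return torch.softmax(logits, dim=1)[:, 0]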
Tracing into Captum's compute_layer_gradients_and_eval, the layer outputs are captured here:

        saved_layer, output = _forward_layer_distributed_eval(
            forward_fn,
            inputs,
            layer,
            target_ind=target_ind,
            additional_forward_args=additional_forward_args,
            attribute_to_layer_input=attribute_to_layer_input,
            forward_hook_with_return=True,
            require_layer_grads=True,
        )

The intermediate results in saved_layer are not zero, as follows:

the saved_layer in compute_layer_gradients_and_eval defaultdict(<class 'dict'>, {MAG(
  (W_ha): Linear(in_features=256, out_features=128, bias=True)
  (W_a): Linear(in_features=128, out_features=128, bias=True)
  (LayerNorm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.5, inplace=False)
): {device(type='cuda', index=0): (tensor([[[ 78.7412,  73.8596,  73.8594,  ..., 110.7228,  73.8589,  65.9760],
         [ 79.4688,  73.8545,  73.8542,  ..., 110.2769,  73.8539,  67.1542],
         [ 82.7490,  73.8542,  73.8540,  ..., 132.7583,  73.8536,  62.8921],
         ...,
         [ 73.4833,  73.8537,  73.8534,  ...,  76.7181,  73.8530,  73.1434],
         [ 73.4835,  73.8539,  73.8536,  ...,  76.7183,  73.8532,  73.1436],
         [ 73.4835,  73.8539,  73.8537,  ...,  76.7184,  73.8533,  73.1436]],

        [[ 78.7414,  73.8593,  73.8592,  ..., 110.7227,  73.8585,  65.9763],
         [ 79.4685,  73.8537,  73.8536,  ..., 110.2761,  73.8529,  67.1541],
         [ 82.7489,  73.8536,  73.8536,  ..., 132.7577,  73.8529,  62.8921],
         ...,
         [ 73.4833,  73.8531,  73.8531,  ...,  76.7178,  73.8523,  73.1435],
         [ 73.4830,  73.8529,  73.8529,  ...,  76.7176,  73.8521,  73.1432],
         [ 73.4839,  73.8537,  73.8537,  ...,  76.7185,  73.8529,  73.1440]],

        [[ 78.7395,  73.8583,  73.8580,  ..., 110.7207,  73.8575,  65.9752],
         [ 79.4674,  73.8534,  73.8532,  ..., 110.2752,  73.8526,  67.1537],
         [ 82.7476,  73.8532,  73.8530,  ..., 132.7564,  73.8524,  62.8916],
         ...,
         [ 73.4819,  73.8526,  73.8523,  ...,  76.7168,  73.8517,  73.1427],
         [ 73.4820,  73.8526,  73.8524,  ...,  76.7169,  73.8518,  73.1427],
         [ 73.4826,  73.8533,  73.8531,  ...,  76.7176,  73.8525,  73.1434]],

        ...,

        [[ 81.5572, 128.5072, 112.7790,  ..., 134.9548, 109.1424,  93.4030],
         [138.1441, 272.8504, 229.1123,  ..., 266.4277, 214.6286, 188.7362],
         [126.5526, 229.1123, 226.2585,  ..., 289.5139, 188.7424, 158.7619],
         ...,
         [ 87.0023, 174.4372, 157.2018,  ..., 146.8118, 160.7992, 158.1951],
         [108.5435, 214.6286, 188.7424,  ..., 167.3614, 192.2544, 182.7976],
         [102.6403, 205.5971, 184.3762,  ..., 166.0714, 184.3864, 182.7823]],

        [[ 81.5615, 128.5496, 112.7817,  ..., 135.0740, 109.1179,  93.4688],
         [138.1872, 272.9772, 229.1659,  ..., 266.7053, 214.6296, 188.9022],
         [126.5548, 229.1659, 226.2448,  ..., 289.7447, 188.6952, 158.8691],
         ...,
         [ 87.0793, 174.6189, 157.3272,  ..., 146.9802, 160.8787, 158.3711],
         [108.5192, 214.6296, 188.6952,  ..., 167.4441, 192.1724, 182.8588],
         [102.7136, 205.7789, 184.5013,  ..., 166.2566, 184.4481, 182.9819]],

        [[ 81.5608, 128.5636, 112.7730,  ..., 135.1378, 109.1116,  93.5008],
         [138.1961, 273.0161, 229.1708,  ..., 266.8467, 214.6322, 188.9795],
         [126.5371, 229.1708, 226.2046,  ..., 289.8467, 188.6646, 158.9085],
         ...,
         [ 87.1190, 174.7090, 157.3798,  ..., 147.0668, 160.9248, 158.4611],
         [108.5132, 214.6322, 188.6646,  ..., 167.4924, 192.1476, 182.8960],
         [102.7509, 205.8658, 184.5499,  ..., 166.3515, 184.4858, 183.0822]]],
       device='cuda:0', grad_fn=<ReluBackward0>),)}})

and the output looks like:

output in compute_layer_gradients_and_eval: tensor([0.0024], device='cuda:0', grad_fn=<UnsqueezeBackward0>)

The values are then passed into this call:

saved_grads = torch.autograd.grad(torch.unbind(output), grad_inputs)
The values of saved_grads are all zero. I have been stuck on this problem for about a week; could you give me some advice? @NarineK
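
For reference, one way to reproduce this symptom outside the model is when the scalar output is connected to the autograd graph but does not actually depend on (most of) the tensor being differentiated. A self-contained toy example, with x and y as stand-ins unrelated to the model above:

import torch

x = torch.randn(3, 4, requires_grad=True)
y = x.relu()                         # stand-in for the saved layer activation

out_all = y.sum()                    # depends on every element of y
print(torch.autograd.grad(out_all, y, retain_graph=True)[0])         # all ones

out_first_only = y[0].sum()          # depends only on the first "example"
print(torch.autograd.grad(out_first_only, y, retain_graph=True)[0])  # rows 1 and 2 are zero

out_cut = x.sum() + y.detach().sum() # y only enters through detach(), so the graph never uses it
print(torch.autograd.grad(out_cut, y, allow_unused=True)[0])         # None rather than zeros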
