
multus cni ignores errors raised on CNI DEL #1239

Open
nayihz opened this issue Mar 11, 2024 · 5 comments

@nayihz

nayihz commented Mar 11, 2024

What happened:
PR #1084 makes multus ignore common errors raised by CNI plugins. I don't think this is what we expect.
For example, if a custom CNI plugin's cmdDel returns an error, multus-shim will only log that error and then ignore it. containerd therefore believes the deletion succeeded, so we see TearDown network for sandbox xxx successfully even though cmdDel actually failed.

Here are some containerd logs:

Jan 12 11:07:26 node containerd[584]: 2024-01-12T11:07:26+08:00 [error] CmdCheck (shim): CNI request failed with status 400: '&{ContainerID:b8633013d903140cb60bc179d7869326595628df9ff261b5733ad4bff019307f Netns:/var/run/netns/cni-600651ab-c6f4-a3f2-232d-0f5396dd31d2 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-6749dd5b99-779g8;K8S_POD_INFRA_CONTAINER_ID=b8633013d903140cb60bc179d7869326595628df9ff261b5733ad4bff019307f;K8S_POD_UID=5bae17b3-ddbb-4ec6-8adb-30fca2bcfefa Path: StdinData:[123 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 48 53 45 99 105 108 105 117 109 46 99 111 110 102 108 105 115 116 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"b8633013d903140cb60bc179d7869326595628df9ff261b5733ad4bff019307f" Netns:"/var/run/netns/cni-600651ab-c6f4-a3f2-232d-0f5396dd31d2" IfName:"eth0" 
Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-6749dd5b99-779g8;K8S_POD_INFRA_CONTAINER_ID=b8633013d903140cb60bc179d7869326595628df9ff261b5733ad4bff019307f;K8S_POD_UID=5bae17b3-ddbb-4ec6-8adb-30fca2bcfefa" Path:"" ERRORED: DelegateDel: error invoking DelegateDel - "xxxxxx": error in getting result from DelNetwork: netplugin failed but error parsing its diagnostic message "test failed##############\n": invalid character 'e' in literal true (expecting 'r')
Jan 12 11:07:26 node containerd[584]: '
Jan 12 11:07:26 node containerd[584]: time="2024-01-12T11:07:26.819150976+08:00" level=info msg="TearDown network for sandbox \"b8633013d903140cb60bc179d7869326595628df9ff261b5733ad4bff019307f\" successfully"

What you expected to happen:
multus should wrap and propagate the error raised by the CNI plugin, so the kubelet knows about the failure and can prevent the pod from being deleted.
How to reproduce it (as minimally and precisely as possible):
Write a fake CNI plugin whose cmdDel always returns an error.
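A minimal fake plugin for this reproduction step could look like the sketch below (the plugin name, error code 999, and message are invented for illustration; per the CNI spec, a plugin signals failure by printing an error object to stdout and exiting non-zero):

```shell
#!/bin/sh
# fake-cni: a minimal mock CNI plugin whose DEL always fails.
# Place it in the CNI bin dir and reference it from a network
# attachment to reproduce the issue.
fake_cni() {
  case "$CNI_COMMAND" in
    ADD)
      # Return an empty-but-valid result so the sandbox comes up.
      echo '{"cniVersion":"0.3.1","interfaces":[],"ips":[],"dns":{}}'
      ;;
    DEL)
      # Emit a CNI error object on stdout and exit non-zero.
      echo '{"cniVersion":"0.3.1","code":999,"msg":"fake-cni: cmdDel always fails"}'
      return 1
      ;;
    VERSION)
      echo '{"cniVersion":"0.3.1","supportedVersions":["0.3.0","0.3.1","0.4.0"]}'
      ;;
  esac
}

fake_cni
```

With this plugin as a delegate, every sandbox teardown hits a failing cmdDel, so you can watch whether multus-shim surfaces the error or logs it and reports success.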
Anything else we need to know?:

Environment:

  • Multus version
    image path and image ID (from 'docker images')
  • Kubernetes version (use kubectl version):
  • Primary CNI for Kubernetes cluster:
  • OS (e.g. from /etc/os-release):
  • File of '/etc/cni/net.d/'
  • File of '/etc/cni/multus/net.d'
  • NetworkAttachment info (use kubectl get net-attach-def -o yaml)
  • Target pod yaml info (with annotation, use kubectl get pod <podname> -o yaml)
  • Other log outputs (if you use multus logging)
@dougbtv
Member

dougbtv commented Mar 12, 2024

Mostly -- I think during cmdDel we want to ignore errors so that the pod can be deleted by the API. Otherwise, we retry on delete, and the pod can hang around and crashloop. So, with the CNI mentality in mind, we're very sensitive about success on ADD, but on DEL we're very lenient about letting CNI DELs fail so that pods aren't stuck in a crashloop.

@nayihz
Author

nayihz commented Mar 12, 2024

we're very lenient with letting CNI DELs fail so that pods aren't stuck in a crashloop

But it will lead to resource leaks (such as IP leaks) if cmdDel fails. The kubelet could retry deleting the sandbox if it knew cmdDel had failed.

@dougbtv dougbtv changed the title multus cni ignores errors raised by CNI multus cni ignores errors raised on CNI DEL Mar 14, 2024
@dougbtv
Member

dougbtv commented Mar 14, 2024

The CNI spec says: https://github.com/containernetworking/cni/blob/main/SPEC.md#del-remove-container-from-network-or-un-apply-modifications

Plugins should generally complete a DEL action without error even if some resources are missing.

So while you have a point that resources will sometimes be left behind, this is the suggested behavior for CNI plugins on a CNI DEL.

@nayihz
Author

nayihz commented Mar 15, 2024

If we don't use multus, the pod gets stuck after cmdDel fails. But we get the opposite result when using multus. That inconsistency is confusing for users.

@fzu-huang

I think there should be a field in multus' CNI config to describe whether to tolerate errors returned by CNI DEL. Giving the user control of this behavior would improve the usability of multus.
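Such a knob might look like the sketch below (the `tolerateDelError` field name is invented for illustration; no such option exists in multus today):

```json
{
  "cniVersion": "0.3.1",
  "name": "multus-cni-network",
  "type": "multus-shim",
  "logLevel": "verbose",
  "tolerateDelError": false
}
```

With `"tolerateDelError": false`, multus would propagate a delegate's cmdDel error to the runtime so the kubelet can retry the teardown; with `true`, it would keep today's log-and-continue behavior.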
