This issue was moved to a discussion.

Virtual machine on different hosts but in same isolated network not able to communicate (can't ping each other). #9027

Closed
akshaybachhav opened this issue May 2, 2024 · 18 comments

Comments

@akshaybachhav

akshaybachhav commented May 2, 2024

ISSUE TYPE
  • Bug Report
  • Other
COMPONENT NAME
VR and instances
CLOUDSTACK VERSION
4.19.0.1
CONFIGURATION
   management server ip address : 10.0.1.10
   gateway : 10.0.1.1
   host machine 1 ip : 10.0.1.18
   host machine 2 ip : 10.0.1.12
   guest network ip range : 10.1.1.0/24
   virtual router public ip address : 10.0.1.32

   virtual machine on host machine 1(10.0.1.18) : 10.1.1.96
   virtual machine on host machine 2(10.0.1.12) : 10.1.1.158
  hypervisor type : KVM

image
image

OS / ENVIRONMENT
kvm setup on ubuntu 22.04
SUMMARY
We set up a new CloudStack installation on Ubuntu 22.04 with advanced networking. We added two hosts to the same zone, pod, and cluster successfully. We then created an isolated network and, using an Ubuntu ISO template, created two virtual machines, one on each host.
When we try to ping from one VM to the other, the destination is reported as unreachable.

image

EXPECTED RESULTS
ubuntu-dev-vm-1 should be able to ping the virtual machine ubuntu-dev-vm-2.
ACTUAL RESULTS
PING 10.1.1.158 (10.1.1.158) 56(84) bytes of data.
From 10.1.1.96 icmp_seq=1 Destination Host Unreachable

Screenshots

The virtual router is running on host machine 1.
When we run diagnostics to ping from the virtual router to the VM (10.1.1.158), it says network unreachable.
image

The other VM (10.1.1.96) is on the same host machine as the virtual router; in this case the router is able to ping the VM.
image

boring-cyborg bot commented May 2, 2024

Thanks for opening your first issue here! Be sure to follow the issue template!

@rohityadavcloud
Member

@akshaybachhav can you check iptables/firewall rules in the VM? Can you ping the VR from the VMs?
You can try tcpdump on icmp across VMs, and hosts to see what's failing/dropping the packets.

@weizhouapache
Member

weizhouapache commented May 2, 2024

+1
please also check the firewall rules on the kvm hosts.

@akshaybachhav
Author

Thanks for your help. We checked the iptables/firewall rules and found the status below. All traffic is allowed by default.
image

Running tcpdump on the VM that is on the same host as the VR gives the results below:

04:54:50.697249 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7368836:7369064, ack 4249, win 1002, options [nop,nop,TS val 718803065 ecr 1773071291], length 228
04:54:50.697511 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369064:7369468, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 404
04:54:50.697741 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369468:7369696, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 228
04:54:50.697852 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7369064, win 7689, options [nop,nop,TS val 1773071291 ecr 718803065], length 0
04:54:50.698049 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369696:7369732, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 36
04:54:50.698306 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7369696, win 7689, options [nop,nop,TS val 1773071292 ecr 718803066], length 0
04:54:50.698437 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369732:7370512, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 780
04:54:50.698664 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370512:7370724, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 212
04:54:50.698896 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370724:7370952, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 228
04:54:50.699027 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7370512, win 7689, options [nop,nop,TS val 1773071293 ecr 718803066], length 0
04:54:50.699164 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370952:7371180, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071293], length 228
04:54:50.699338 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7370952, win 7689, options [nop,nop,TS val 1773071293 ecr 718803067], length 0
04:54:50.699470 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371180:7371216, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071293], length 36
04:54:50.699699 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371216:7371612, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071293], length 396
04:54:50.699953 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7371216, win 7689, options [nop,nop,TS val 1773071294 ecr 718803067], length 0
04:54:50.700080 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371612:7371840, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071294], length 228
04:54:50.700355 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371840:7372052, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 212
04:54:50.700579 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7371840, win 7689, options [nop,nop,TS val 1773071294 ecr 718803068], length 0
04:54:50.700704 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372052:7372456, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 404
04:54:50.700968 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372456:7372684, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 228
04:54:50.701250 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7372456, win 7689, options [nop,nop,TS val 1773071295 ecr 718803069], length 0
04:54:50.701346 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372684:7373280, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 596
04:54:50.701570 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373280:7373508, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 228
04:54:50.701822 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373508:7373712, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 204
04:54:50.701915 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7373280, win 7689, options [nop,nop,TS val 1773071296 ecr 718803069], length 0
04:54:50.702097 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373712:7373940, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071296], length 228
04:54:50.702268 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7373712, win 7689, options [nop,nop,TS val 1773071296 ecr 718803070], length 0
04:54:50.702323 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373940:7374152, ack 4249, win 1002, options [nop,nop,TS val 718803071 ecr 1773071296], length 212
04:54:50.702835 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7374152, win 7689, options [nop,nop,TS val 1773071296 ecr 718803070], length 0

I am able to ping the VR from the VM that is on the same host. But pinging the VR from the VM on the other host does not work.
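
The capture pasted above shows only the SSH session between ubuntu-dev-vm-1 and 10.0.1.10. None of its lines are ICMP, so it does not yet show where the pings are lost. Restricting the capture with a filter such as `tcpdump -n -i <interface> icmp` avoids this. For a capture that was already saved as text, a minimal Python sketch like the following keeps only the ICMP lines. The helper name `icmp_lines` and the second sample line are made up for illustration.

```python
import re
from typing import List

# tcpdump prints ICMP packets as "... IP <src> > <dst>: ICMP echo request, ..."
# (or "ICMP6, ..." for IPv6), so the marker after the destination is enough.
ICMP_RE = re.compile(r": ICMP6?\b")


def icmp_lines(capture: str) -> List[str]:
    """Return only the lines of a tcpdump text capture that describe ICMP packets."""
    return [line for line in capture.splitlines() if ICMP_RE.search(line)]


# First line: copied from the capture above (SSH traffic).
# Second line: hypothetical ICMP echo request, added only for illustration.
sample = """\
04:54:50.697249 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7368836:7369064, ack 4249, win 1002, options [nop,nop,TS val 718803065 ecr 1773071291], length 228
04:54:51.000001 IP 10.1.1.96 > 10.1.1.158: ICMP echo request, id 7, seq 1, length 64
"""

for line in icmp_lines(sample):
    print(line)
```

Running the same filter on the VM, on the host bridge, and on the host's physical NIC should narrow down at which hop the echo requests or ARP requests stop appearing.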

@weizhouapache
Member

@akshaybachhav
can you check the firewall rules on the KVM hosts?

Just to confirm, are both VMs configured to get DHCP IPs from the VR, or do they use static IPs?

@akshaybachhav
Author

Both VMs get DHCP IPs from the VR; no static IPs are set in the VMs.

On both KVM machines, the status of ufw is:
Status: inactive

@rohityadavcloud
Member

Have you disabled AppArmor on them? What guide or installation steps, if any, did you follow, @akshaybachhav?
Make sure to stop/disable firewalld etc. It's possible you have a wrong MTU, bridge, or NIC/network configuration if you can't ping VMs on the same VLAN.

@akshaybachhav
Author

I followed the tutorial from the official Apache CloudStack documentation:
image

This is my netplan configuration file for host 1 and host 2:
image

@akshaybachhav
Author

I followed the tutorial https://rohityadav.cloud/blog/cloudstack-kvm/ and did exactly the same setup.

@weizhouapache
Member

@akshaybachhav
can you create a new VM on the host where the VR is not running, and check

  • if the new VM gets dhcp ip
  • if the two VMs on the same host can ping each other

@akshaybachhav
Author

akshaybachhav commented May 9, 2024

@akshaybachhav can you create a new vm on the host (where the VR is not running on), and check

  • if the new VM gets dhcp ip
    --> Yes, the VM gets an IP from DHCP.
  • if the two VMs on the same host can ping each other
    --> When the VMs are on the same host, they can ping each other.

We have set up CloudStack on a mini PC, an ASUS Chromebox, and an Acer Chromebox, each of which has a single NIC.
image

@andrijapanicsb
Contributor

If you are testing communication between 2 VMs on the SAME network, but on different KVM hosts, then you should check:

  • set a static IP (any) from the same subnet (any subnet) on both VMs, to remove DHCP as a variable
  • ensure that the switch ports to which the KVM hosts are connected are in TRUNK mode and pass all required VLANs (including the VLAN used by your isolated network) between all KVM hosts.

The problem you have sounds (99.999%) like an underlying infrastructure/configuration issue.
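
As a sanity check on why the layer-2 path between the hosts is the first suspect: both VM addresses from the report fall inside the guest range 10.1.1.0/24, so (assuming the VMs received a /24 netmask from DHCP) they try to reach each other directly over ARP rather than through the virtual router. The reply `From 10.1.1.96 icmp_seq=1 Destination Host Unreachable` comes from the sender's own address, which is what Linux ping prints when ARP for an on-link destination gets no answer. A small sketch with Python's `ipaddress` module follows; `same_segment` is a hypothetical helper name.

```python
import ipaddress

# Values taken from the CONFIGURATION section of the report.
guest_net = ipaddress.ip_network("10.1.1.0/24")
vm1 = ipaddress.ip_address("10.1.1.96")    # VM on host machine 1
vm2 = ipaddress.ip_address("10.1.1.158")   # VM on host machine 2


def same_segment(a, b, network):
    """True if both addresses are on-link in the given network (no router hop needed)."""
    return a in network and b in network


print(same_segment(vm1, vm2, guest_net))  # True
```

Since no router is involved, frames between the two VMs have to cross the physical switch between the KVM hosts, which is where the trunk/VLAN check above applies.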

@weizhouapache
Member

  • vm gets ip from dhcp

It looks like we forgot to ask you:

  • what type of zone do you use?
  • do you use VLAN?

@dineshjchoudhary

@weizhouapache We are using
Core Zone Type
image

We haven't specified any VLAN/VNI while creating the zone.
image

@andrijapanicsb We are only able to SSH to the VM that is on the same host as the VR; the other VM, which is on a different host, is not reachable. I will check the TRUNK mode.

@btzq

btzq commented May 15, 2024

Hi @dineshjchoudhary @akshaybachhav

Did you guys manage to resolve this? We have 6 compute nodes, and our 6th node is having this same issue.

Using:

  • CS 4.19.0
  • Advanced Network
  • Disaggregated Setup
  • Linstor SDS Storage

We have checked:

  • Verified that AppArmor is disabled
  • Verified that iptables on the KVM host is disabled
  • Verified that VLAN trunking is enabled

But in our situation, what we noticed:

  • There are several VPCs created in our cloud
  • Only 1 network in 1 VPC (A) whose VMs are located on Node 6 has this issue.
  • Strangely, VMs belonging to other VPCs (B, C, etc.) that are located on Node 6 do not have this issue.
  • If we live migrate VMs in VPC (A) from Node 6 to any other host, the VMs regain connectivity to the router. (Live migration works fine, which is strange to me.)

@btzq

btzq commented May 15, 2024

Hi All,

We managed to resolve our issue.

Upon further checking, we found out that the VXLAN in Host 6 was down, while other VXLANs were up and running fine.

This would explain why:

  • Only VMs from certain networks were affected (because CloudStack creates one VXLAN for each network tier)
  • Live migration between Node 6 and other hosts works fine (because live migration uses the management physical network instead of the guest network)

@dineshjchoudhary @akshaybachhav, maybe you guys should try checking this too, in case the root cause is the same. When we manually bring the VXLAN back up, it works.

What we don't know now is why the VXLAN suddenly went down. I'll probably raise another ticket for this.
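
A rough way to spot this state on a KVM host, assuming the iproute2 `ip` tool with JSON output is available (`ip -j link show type vxlan`): parse the output and list the VXLAN interfaces that lack the `UP` flag. The interface names in the sample are hypothetical, and `down_vxlan_interfaces` is a helper name introduced here, not something from CloudStack.

```python
import json
from typing import List


def down_vxlan_interfaces(ip_json: str) -> List[str]:
    """Names of interfaces in `ip -j link show type vxlan` output that lack the UP flag."""
    links = json.loads(ip_json)
    return [link["ifname"] for link in links if "UP" not in link.get("flags", [])]


# Hypothetical sample shaped like `ip -j link show type vxlan` output,
# reduced to the fields used here.
sample = json.dumps([
    {"ifname": "vxlan100", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"], "operstate": "UNKNOWN"},
    {"ifname": "vxlan200", "flags": ["BROADCAST", "MULTICAST"], "operstate": "DOWN"},
])

print(down_vxlan_interfaces(sample))  # ['vxlan200']
```

The sketch checks the flags rather than `operstate`, because a working VXLAN interface often reports `UNKNOWN` there. One way to bring such an interface back manually is `ip link set dev <name> up`.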

@zap51
Contributor

zap51 commented May 27, 2024

Interesting. When you say VXLAN interface, is it the physical interface (underlay) or the VTEP attached to the bridge?

@weizhouapache
Member

Do you use a multicast group?

If yes, can you check if the setting impacts you? https://docs.cloudstack.apache.org/projects/archived-cloudstack-getting-started/en/latest/networking/vxlan.html#important-note-on-max-number-of-multicast-groups-and-thus-vxlan-intefaces
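
To check this on a host, a sketch like the following compares the limit with the number of VXLAN interfaces. It assumes the setting meant here is the Linux sysctl `net.ipv4.igmp_max_memberships` (readable at `/proc/sys/net/ipv4/igmp_max_memberships`, default 20 on stock kernels) and that each multicast-based VXLAN interface uses one membership. `igmp_headroom` is a hypothetical helper name.

```python
def igmp_headroom(limit_text: str, vxlan_count: int) -> int:
    """Memberships left after accounting for vxlan_count multicast VXLAN interfaces.

    limit_text is the raw content of /proc/sys/net/ipv4/igmp_max_memberships.
    A negative result means there are more interfaces than the limit allows.
    """
    return int(limit_text.strip()) - vxlan_count


# On a real host, limit_text would come from:
#   open("/proc/sys/net/ipv4/igmp_max_memberships").read()
print(igmp_headroom("20\n", 18))  # 2
print(igmp_headroom("20\n", 25))  # -5
```

If the headroom is zero or negative, raising the sysctl (for example with `sysctl -w net.ipv4.igmp_max_memberships=<value>`) would be the thing to try under these assumptions.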

@apache apache locked and limited conversation to collaborators May 31, 2024
@DaanHoogland DaanHoogland converted this issue into discussion #9154 May 31, 2024
