Workload cluster of Dell R230s with idrac8 stuck at boot prompt after pxe booting #1205

fdawg4l · 2023-10-08T03:11:26Z

Hi,

Whenever I reboot a node which is part of a sidero workload cluster, it pxe boots and gets stuck at the boot prompt. I have to manually log into the iDRAC, connect to the console, and force it to boot from disk for the node to come back up.

It's possible I've configured something incorrectly, but:

- pxe booting works
- I pxe booted these nodes to create/join a workload cluster
- ipmi seems to work because after accepting the nodes and creating the cluster in the management cluster, they powered up and installed talos
- talosctl shutdown seems to do the right thing on these hosts

I suspect this is a config error in whatever the management cluster uses to toggle the boot order of the workload cluster nodes via ipmi.

Happy to provide logs, just let me know which are interesting.

Thanks!

fdawg4l · 2023-10-08T03:12:50Z

BTW, choosing Continue simply gets us back into the pxe boot. I have to choose Continue, wait for the Broadcom pxe firmware loading message, cancel that by pressing Escape, and then the boot order continues and boots off of the primary disk into grub.
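
One way to test the boot-order suspicion independently of Sidero is to ask the BMC which boot flags it currently holds, and to force a one-time disk boot over IPMI instead of using the console. The following is a minimal sketch, assuming `ipmitool` is installed and IPMI over LAN is enabled on the iDRAC. The BMC address and user are hypothetical placeholders, and the password is expected in the `IPMI_PASSWORD` environment variable (read by `ipmitool -E`). The script only builds and prints the commands; it does not run them.

```python
import shlex


def ipmi_command(host, user, *action):
    """Build an ipmitool argument list for a remote BMC.

    -E tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable, so it never appears on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *action]


# Hypothetical BMC address and user; replace with the iDRAC's real values.
BMC_HOST = "10.0.0.50"
BMC_USER = "root"

# Boot parameter 5 holds the boot flags (for example, a pending "force PXE").
show_boot_flags = ipmi_command(BMC_HOST, BMC_USER, "chassis", "bootparam", "get", "5")

# Request a boot from the default hard disk on the next boot.
force_disk_once = ipmi_command(BMC_HOST, BMC_USER, "chassis", "bootdev", "disk")

for cmd in (show_boot_flags, force_disk_once):
    print(shlex.join(cmd))
```

If the boot flags still show a PXE override after the node has been installed, that would point at the boot-order handling rather than at the PXE setup itself. The thread does not confirm this either way.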

smira · 2023-10-09T10:07:25Z

It's not clear what the problem is, but we recommend using snp.efi instead of ipxe.efi: https://www.sidero.dev/v0.6/getting-started/prereq-dhcp/

mattiashem · 2023-11-14T14:02:57Z

Hi,
I have a similar problem, so maybe we can resolve both here.
I hit that problem before, so if you switch to snp.efi you should get further.

My dnsmasq config:

      hostNetwork: true
      containers:
        - name: dnsmasq
          args:
            - -d
            - --port=5353
            - --dhcp-range=10.202.53.20,10.202.53.100
            - --dhcp-option=option:router,10.202.53.1
            - --dhcp-option=6,1.1.1.1
            - --dhcp-boot=tag:ipxe,ipxe.efi,10.202.53.11
            - --addn-hosts=/dnsmasq/hosts.text
            - --dhcp-hostsfile=/dnsmasq/dhcphosts.txt
            - --log-queries
            - --log-dhcp

I'm running the DHCP proxy, and it sets up the boot:


2023/11/14 13:53:55 HTTP GET /boot.ipxe 10.202.53.11:7788
2023-11-14T13:54:04Z	INFO	dhcp-proxy	offering boot response	{"source": "04:32:01:47:34:e0", "server": "10.202.53.11", "boot_filename": "snp.efi"}
2023-11-14T13:54:04Z	INFO	dhcp-proxy	ignoring packet	{"source": "04:32:01:47:34:e0", "reason": "packet is REQUEST, not DISCOVER"}
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:7788
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:47833
2023/11/14 13:54:08 HTTP GET /ipxe?uuid=4c4c4544-0038-4810-804e-c4c04f353034&mac=04-32-01-47-34-e0&domain=&hostname=node8&serial=D8HN504&arch=x86_64 10.202.53.58:48382
2023/11/14 13:54:08 Using "agent-amd64" environment
2023/11/14 13:54:08 HTTP GET /env/agent-amd64/vmlinuz 10.202.53.58:48382
2023/11/14 13:54:09 HTTP GET /env/agent-amd64/initramfs.xz 10.202.53.58:48382
2023/11/14 13:54:15 HTTP GET /boot.ipxe 10.202.53.11:47833

From the logs we can see that the proxy switches the boot file to the new snp.efi.
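
To check this across a longer log without reading it by eye, the offered boot file can be pulled out of the `dhcp-proxy` lines. This is a minimal sketch, assuming the structured lines are tab-separated with a JSON object as the last field, as in the excerpt above. The function name `offered_boot_files` is made up for illustration.

```python
import json


def offered_boot_files(log_lines):
    """Yield (source MAC, boot filename) for each 'offering boot response' line."""
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: timestamp, level, logger, message, JSON payload.
        if len(fields) < 5 or fields[3] != "offering boot response":
            continue
        payload = json.loads(fields[4])
        yield payload["source"], payload["boot_filename"]


# Lines copied from the log excerpt above.
sample = [
    '2023-11-14T13:54:04Z\tINFO\tdhcp-proxy\toffering boot response\t{"source": "04:32:01:47:34:e0", "server": "10.202.53.11", "boot_filename": "snp.efi"}',
    '2023-11-14T13:54:04Z\tINFO\tdhcp-proxy\tignoring packet\t{"source": "04:32:01:47:34:e0", "reason": "packet is REQUEST, not DISCOVER"}',
    '2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:7788',
]

print(list(offered_boot_files(sample)))
# [('04:32:01:47:34:e0', 'snp.efi')]
```

The plain HTTP request lines and the "ignoring packet" entry are skipped, so only the boot file actually offered to each MAC address remains.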

Still, my boot gets stuck at:
efi stub mesuredata into pcr 9
I'm using a 10G card; next time I'm in the datacenter I will try the 1G card.

What in the world can "efi stub mesuredata into pcr 9" be?

(I'm also booting my VMs this way; they use undionly.kpxe and work fine.)
