
Workload cluster of Dell R230s with iDRAC8 stuck at boot prompt after PXE booting #1205

Open
fdawg4l opened this issue Oct 8, 2023 · 3 comments


@fdawg4l

fdawg4l commented Oct 8, 2023

Hi,

Whenever I reboot a node that is part of a Sidero workload cluster, it PXE boots and gets stuck at the boot prompt. I have to manually log into the iDRAC, connect to the console, and force it to boot from disk for the node to come back up.

It's possible I've configured something incorrectly, but

  • PXE booting works
  • I PXE booted these nodes to create/join a workload cluster
  • IPMI seems to work, because after I accepted the nodes and created the cluster in the management cluster, they powered up and installed Talos
  • and talosctl shutdown seems to do the right thing on these hosts.

I suspect a configuration error in whatever part of the management cluster toggles the workload cluster nodes' boot order via IPMI.
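For reference, IPMI 2.0 controls the next boot device through the Chassis "Set System Boot Options" command (parameter 5, boot flags). The Python sketch below only shows how those request bytes are encoded; it is not how Sidero's management cluster is implemented, and the helper names are hypothetical. `ipmitool chassis bootdev disk` is the higher-level way to send this kind of request by hand.

```python
from typing import List

# Sketch only: encodes IPMI "Set System Boot Options" (NetFn Chassis, cmd 0x08),
# boot option parameter 5 ("boot flags"). Helper names are hypothetical;
# this is not Sidero's implementation.
NETFN_CHASSIS = 0x00
CMD_SET_SYSTEM_BOOT_OPTIONS = 0x08
PARAM_BOOT_FLAGS = 0x05

# Boot device selector values (bits 5:2 of boot flags data byte 2).
BOOT_DEVICE_SELECTOR = {
    "none": 0b0000,   # no override
    "pxe": 0b0001,    # force PXE
    "disk": 0b0010,   # force boot from default hard drive
    "cdrom": 0b0101,  # force boot from default CD/DVD
    "bios": 0b0110,   # force boot into BIOS setup
}


def boot_flags_request(device: str, efi: bool = True, persistent: bool = False) -> List[int]:
    """Return the request data bytes: parameter selector + five boot-flag bytes."""
    byte1 = 0x80  # bit 7: boot flags valid
    if persistent:
        byte1 |= 0x40  # bit 6: apply to all future boots, not only the next one
    if efi:
        byte1 |= 0x20  # bit 5: EFI boot type instead of legacy PC-compatible
    byte2 = BOOT_DEVICE_SELECTOR[device] << 2  # selector lives in bits 5:2
    return [PARAM_BOOT_FLAGS, byte1, byte2, 0x00, 0x00, 0x00]


def as_ipmitool_raw(data: List[int]) -> str:
    """Format the request as an `ipmitool raw` command line."""
    parts = [NETFN_CHASSIS, CMD_SET_SYSTEM_BOOT_OPTIONS, *data]
    return "ipmitool raw " + " ".join(f"0x{b:02x}" for b in parts)


if __name__ == "__main__":
    print(as_ipmitool_raw(boot_flags_request("pxe")))
    print(as_ipmitool_raw(boot_flags_request("disk")))
```

With `efi=True` and `persistent=False`, the PXE request comes out as `ipmitool raw 0x00 0x08 0x05 0xa0 0x04 0x00 0x00 0x00`; `persistent=True` changes byte 1 to `0xe0`. Comparing that bit against what the iDRAC reports (for example with `ipmitool chassis bootparam get 5`) may help show whether something keeps re-arming PXE.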

Happy to provide logs, just let me know which are interesting.

Thanks!

[Screenshots: tftp, stuck]

@fdawg4l
Author

fdawg4l commented Oct 8, 2023

BTW, continuing just puts us back into the PXE boot. I have to choose Continue, wait for the Broadcom PXE firmware loading message, cancel it by pressing Escape, and then the boot order proceeds and boots from the primary disk into GRUB.

@smira
Member

smira commented Oct 9, 2023

It's not clear what the problem is, but we recommend using snp.efi instead of ipxe.efi: https://www.sidero.dev/v0.6/getting-started/prereq-dhcp/
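For context on the recommendation: `snp.efi` is the iPXE build that drives the NIC through the UEFI firmware's own Simple Network Protocol driver, while `ipxe.efi` brings iPXE's native NIC drivers. A DHCP server usually picks the first-stage file from the client's system architecture (DHCP option 93, RFC 4578). The hypothetical helper below sketches that choice for the two file names that come up in this thread; the mapping is an assumption, not Sidero's configuration.

```python
# Hypothetical helper: choose a first-stage network boot program from the
# DHCP client system architecture (option 93, RFC 4578 / IANA registry).
# The file names come from this thread; the mapping itself is an assumption.
ARCH_X86_BIOS = 0
ARCH_EFI_BC = 7        # commonly sent by x86-64 UEFI firmware in practice
ARCH_EFI_X86_64 = 9


def first_stage_boot_file(client_arch: int) -> str:
    if client_arch == ARCH_X86_BIOS:
        # Legacy BIOS PXE (e.g. many VMs): iPXE loaded via the PXE UNDI stack.
        return "undionly.kpxe"
    if client_arch in (ARCH_EFI_BC, ARCH_EFI_X86_64):
        # UEFI x86-64: iPXE that uses the firmware's SNP NIC driver.
        return "snp.efi"
    raise ValueError(f"no boot file configured for client arch {client_arch}")
```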

@mattiashem

Hi,
I have a similar problem, so maybe we can resolve both here.
I ran into that problem before; if you switch to snp.efi you should get further.

My dnsmasq config:

      hostNetwork: true
      containers:
        - name: dnsmasq
          args:
            - -d
            - --port=5353
            - --dhcp-range=10.202.53.20,10.202.53.100
            - --dhcp-option=option:router,10.202.53.1
            - --dhcp-option=6,1.1.1.1
            - --dhcp-boot=tag:ipxe,ipxe.efi,10.202.53.11
            - --addn-hosts=/dnsmasq/hosts.text
            - --dhcp-hostsfile=/dnsmasq/dhcphosts.txt
            - --log-queries
            - --log-dhcp
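The `tag:ipxe` above presumably relates to the usual chainloading pattern: NIC firmware first gets an iPXE binary over TFTP, and once iPXE is running (it identifies itself with DHCP user class `iPXE`) it gets an HTTP boot script instead. In dnsmasq that tag is normally set by a separate line such as `--dhcp-userclass=set:ipxe,iPXE`; it isn't in the snippet above, though it may be set elsewhere. A minimal sketch of the decision, with hypothetical names and a placeholder URL:

```python
from typing import Optional

IPXE_USER_CLASS = b"iPXE"  # DHCP user class (option 77) value sent by iPXE


def boot_filename(user_class: Optional[bytes], script_url: str) -> str:
    """Hypothetical first/second-stage decision for a PXE DHCP reply."""
    if user_class == IPXE_USER_CLASS:
        # Second stage: iPXE is already running, so hand it the HTTP boot script.
        return script_url
    # First stage: the NIC's firmware PXE ROM, so hand it the iPXE binary (TFTP).
    return "snp.efi"
```

Without the second branch, iPXE would be handed the same binary again and loop, which is the failure the tag exists to prevent.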

I'm running the DHCP proxy, and it sets up the boot:


2023/11/14 13:53:55 HTTP GET /boot.ipxe 10.202.53.11:7788
2023-11-14T13:54:04Z	INFO	dhcp-proxy	offering boot response	{"source": "04:32:01:47:34:e0", "server": "10.202.53.11", "boot_filename": "snp.efi"}
2023-11-14T13:54:04Z	INFO	dhcp-proxy	ignoring packet	{"source": "04:32:01:47:34:e0", "reason": "packet is REQUEST, not DISCOVER"}
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:7788
2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:47833
2023/11/14 13:54:08 HTTP GET /ipxe?uuid=4c4c4544-0038-4810-804e-c4c04f353034&mac=04-32-01-47-34-e0&domain=&hostname=node8&serial=D8HN504&arch=x86_64 10.202.53.58:48382
2023/11/14 13:54:08 Using "agent-amd64" environment
2023/11/14 13:54:08 HTTP GET /env/agent-amd64/vmlinuz 10.202.53.58:48382
2023/11/14 13:54:09 HTTP GET /env/agent-amd64/initramfs.xz 10.202.53.58:48382
2023/11/14 13:54:15 HTTP GET /boot.ipxe 10.202.53.11:47833

The logs show that the proxy switches the boot file to the new snp.efi.
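To follow one machine through the boot in a longer log, a small parser can group the HTTP requests by client IP. This Python sketch assumes the exact `YYYY/MM/DD HH:MM:SS HTTP GET <path> <ip>:<port>` format seen above and skips everything else (such as the dhcp-proxy lines); the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines like: 2023/11/14 13:54:08 HTTP GET /env/agent-amd64/vmlinuz 10.202.53.58:48382
HTTP_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"HTTP GET (?P<path>\S+) (?P<ip>[\d.]+):(?P<port>\d+)$"
)

# A few lines copied from the log excerpt above.
SAMPLE = [
    "2023/11/14 13:54:05 HTTP GET /boot.ipxe 10.202.53.11:47833",
    "2023/11/14 13:54:08 HTTP GET /ipxe?uuid=4c4c4544-0038-4810-804e-c4c04f353034"
    "&mac=04-32-01-47-34-e0&domain=&hostname=node8&serial=D8HN504&arch=x86_64 10.202.53.58:48382",
    '2023/11/14 13:54:08 Using "agent-amd64" environment',
    "2023/11/14 13:54:08 HTTP GET /env/agent-amd64/vmlinuz 10.202.53.58:48382",
    "2023/11/14 13:54:09 HTTP GET /env/agent-amd64/initramfs.xz 10.202.53.58:48382",
]


def requests_by_client(lines):
    """Group requested paths (query string dropped) by client IP, in log order."""
    seen = defaultdict(list)
    for line in lines:
        m = HTTP_LINE.match(line.strip())
        if m:  # skip dhcp-proxy lines and anything else not in this format
            seen[m["ip"]].append(m["path"].split("?", 1)[0])
    return dict(seen)


if __name__ == "__main__":
    for ip, paths in requests_by_client(SAMPLE).items():
        print(ip, "->", ", ".join(paths))
```

On this excerpt, node8 (10.202.53.58) requests `/ipxe`, then the kernel and the initramfs, which suggests iPXE finished its part and handed off to the agent kernel before the hang.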

Still, my boot gets stuck at:
EFI stub: measured data into PCR 9
I'm using a 10G card; next time I'm in the datacenter I will try the 1G card.

What in the world could "EFI stub: measured data into PCR 9" be?
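That line comes from the Linux kernel's EFI stub: when a TPM is present, recent kernels measure the initrd into TPM PCR 9 and print a line along the lines of `EFI stub: Measured initrd data into PCR 9`. It is informational, so being the last message on screen does not necessarily mean it caused the hang. "Measuring" means extending the PCR with a digest; the Python sketch below shows the TPM 2.0 extend rule for a SHA-256 bank, with made-up placeholder bytes standing in for the initrd.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank: each PCR holds a 32-byte digest


def measure_and_extend(pcr: bytes, data: bytes) -> bytes:
    """Hash `data`, then extend: PCR_new = SHA256(PCR_old || SHA256(data))."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


pcr9 = bytes(PCR_SIZE)  # PCRs 0-15 reset to all zeros
pcr9 = measure_and_extend(pcr9, b"placeholder initrd bytes")  # hypothetical stand-in
print(pcr9.hex())
```

Because each extend hashes the previous value, the final PCR depends on every measurement and on their order, which is what lets a verifier detect a different initrd.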

(I'm booting my VMs too; they use undionly.kpxe and work fine.)


3 participants