
[UPG FAIL] Linode VM - Prompt if detected, recommend disabling Lassie (shutdown watchdog) #274

Open
lsthompson opened this issue May 25, 2023 · 7 comments

@lsthompson

Hi there,

I appreciate this isn't a direct flaw in the elevate script; however, a warning offering a bail-out may be wise for those running a Linode who are seeking to elevate their machine. Linode enables Lassie, a shutdown watchdog, by default on all machines.

We went ahead, cleared the blockers, and proceeded; however, Lassie then quite effectively performed several hard reboots during critical phases of the upgrade. This left the boot environment inoperable and forced a server rebuild.

It might be wise to add a warning to the elevate script so that Linode users can check for and disable Lassie first.

https://www.linode.com/docs/products/compute/compute-instances/guides/lassie-shutdown-watchdog/

Just a thought,
Luke
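A minimal sketch of such a prompt, written in Python purely for illustration (it is not the elevate script's actual code). It assumes, without verification here, that Linode guests report "Linode" in the DMI vendor file `/sys/class/dmi/id/sys_vendor`; the function names are hypothetical. It only detects the platform and cannot tell whether Lassie is actually enabled.

```python
from pathlib import Path
from typing import Callable

# ASSUMPTION: Linode guests expose "Linode" as the DMI system vendor.
DMI_VENDOR_PATH = Path("/sys/class/dmi/id/sys_vendor")

WARNING = (
    "This server appears to be a Linode. Linode's Lassie shutdown watchdog "
    "can hard-reboot the machine during the upgrade and leave it unbootable. "
    "Consider disabling Lassie before continuing."
)


def read_vendor(path: Path = DMI_VENDOR_PATH) -> str:
    """Return the DMI vendor string, or "" if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8", errors="replace").strip()
    except OSError:
        # No DMI data (container, non-Linux host, permissions): treat as unknown.
        return ""


def is_linode(vendor: str) -> bool:
    """Hypothetical check: does the vendor string mention Linode?"""
    return "linode" in vendor.lower()


def confirm_continue(vendor: str, answer: Callable[[str], str] = input) -> bool:
    """Warn and ask for confirmation on Linode; continue silently elsewhere."""
    if not is_linode(vendor):
        return True
    print(WARNING)
    reply = answer("Continue anyway? [y/N] ")
    # Anything other than an explicit yes aborts, so the safe choice is the default.
    return reply.strip().lower() in ("y", "yes")
```

Usage would be along the lines of `if not confirm_continue(read_vendor()): sys.exit(1)`. Defaulting to "no" keeps an unattended run from proceeding into the hard-reboot scenario described above.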

@troglodyne
Contributor

That might well explain why my latest testing on Linode failed.

I will do some testing myself with Lassie disabled tonight or tomorrow morning to see if that actually fixes things. If so, a PR will follow quickly.

@troglodyne troglodyne self-assigned this May 25, 2023
@lsthompson lsthompson changed the title [UPG FAIL] Linode VM - Prompt if detected, recommend disabling Lassie (reboot watchdog) [UPG FAIL] Linode VM - Prompt if detected, recommend disabling Lassie (shutdown watchdog) May 25, 2023
@troglodyne
Contributor

Hmm. As far as I can tell, this doesn't really matter. Either way, the upgrade fails and you wind up in a GRUB shell via Lish, having to try to boot into something. All you really do by disabling Lassie is cause yourself a pain in the ass, because you have to boot the VM manually every time it wants to reboot. That's just the way Linode is: any reboot is just a shutdown unless Lassie is enabled.

Anyway, this is what I get in the GRUB shell:

error: file `/boot/grub/i386-pc/increment.mod' not found.
error: file `/boot/grub/i386-pc/blscfg.mod' not found.
error: can't find command `blscfg'.
error: file `/boot/grub/grubenv' not found.
error: file `/boot/grub/i386-pc/increment.mod' not found.
error: file `/boot/grub/i386-pc/blscfg.mod' not found.
error: can't find command `blscfg'.
error: file `/boot/grub/grubenv' not found.

This, of course, makes sense, as:

grub> ls (hd0)/boot/grub
grub.cfg

Nobody home. You basically have to boot manually, since the GRUB config is tango uniform (broken).

set root=(hd0,1)
linux /boot/vmlinuz-4.18.0-477.10.1.el8_8.x86_64 root=/dev/sda1
initrd /boot/initramfs-4.18.0-477.10.1.el8_8.x86_64.img
boot

...will at least boot the machine at that point, though dracut is not happy and treats it as an emergency, as you can see if you check Glish:
[Screenshot at 2023-05-25 11-01-43: Glish console]

I will keep investigating, at the very least to see whether we can give users a reasonable workaround when it absolutely explodes like this.

@troglodyne
Contributor

Yep. The workaround boot sequence is:

set root=(hd0)
linux /boot/vmlinuz-4.18.0-477.10.1.el8_8.x86_64 root=/dev/disk/by-label/linode-root
initrd /boot/initramfs-4.18.0-477.10.1.el8_8.x86_64.img
boot

Presumably you would just need to keep doing that until the upgrade finishes, then repair GRUB afterwards; we shall see.
It definitely looks like there are some things we might be able to work around here to avoid this.
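Typing the kernel version by hand at every reboot is error-prone, so here is a small illustrative helper (hypothetical, not part of elevate) that takes the file names shown by `ls (hd0)/boot` in the GRUB shell, finds kernels that have a matching initramfs, and prints the workaround lines above for one of them. The root device is the `linode-root` label from the workaround; other systems would need a different value.

```python
from typing import Iterable, List

# Root device taken from the workaround above; an assumption for other setups.
ROOT_DEVICE = "/dev/disk/by-label/linode-root"


def bootable_versions(boot_files: Iterable[str]) -> List[str]:
    """Return kernel versions that have both vmlinuz-<v> and initramfs-<v>.img."""
    names = set(boot_files)
    versions = []
    for name in names:
        if name.startswith("vmlinuz-"):
            version = name[len("vmlinuz-"):]
            if f"initramfs-{version}.img" in names:
                versions.append(version)
    # Plain string sort for stable output; this is NOT a version-aware ordering.
    return sorted(versions)


def grub_commands(version: str, root_device: str = ROOT_DEVICE) -> List[str]:
    """Build the manual GRUB shell commands for one kernel version."""
    return [
        "set root=(hd0)",
        f"linux /boot/vmlinuz-{version} root={root_device}",
        f"initrd /boot/initramfs-{version}.img",
        "boot",
    ]
```

For example, `print("\n".join(grub_commands(bootable_versions(names)[0])))` prints the four lines to type at the `grub>` prompt. Requiring a matching initramfs skips leftover kernels (such as an el7 kernel whose initramfs was removed) that would not boot anyway.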

@troglodyne
Contributor

So after stage 5 we get that, as expected, though we then hit a different failure mode in dracut:

Failed to switch root: Specified root path '/sysroot' does not seem to be an OS tree.

I can certainly see why the normal reaction to this would be to just hit eject. Presumably it just needs to mount the disk properly here, as that is certainly possible from the rescue terminal. Continuing...

@troglodyne
Contributor

troglodyne commented May 25, 2023

So eventually we get a workable system that stops rebooting and reports great success. After that it is just a matter of repair, judging by this forum thread on this specific issue (lol):
https://almalinux.discourse.group/t/how-to-repair-rebuild-grub-following-a-cross-upgrade-from-centos-7-to-almalinux-8/1268

Presumably the "suggested workaround" there will be key to whatever approach we take, whether that is avoiding the problem or giving a more appropriate blocker message.

@troglodyne
Contributor

So, after investigating our previous work on the blocker for GRUB_ENABLE_BLSCFG, I have concluded that the blocker is entirely unnecessary, but only because CentOS 7 doesn't write this value to the default GRUB config anyway. The blocker code is still there, but it will never fire. Instead, you get splashdown on upgrade because the new config shipped with AlmaLinux 8 sets this by default. This must therefore be addressed before the relevant reboot, rather than by trying (and failing) to block on it ahead of time.
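To illustrate why a pre-upgrade check keyed on this setting never fires, here is a hypothetical Python sketch (not the actual blocker code) that reads `/etc/default/grub`-style text and reports the effective GRUB_ENABLE_BLSCFG value. The sample config contents in the usage note are illustrative only, not copied from either distribution's packages.

```python
import shlex
from typing import Optional


def grub_default_value(config_text: str, key: str) -> Optional[str]:
    """Return the last assignment of `key` in /etc/default/grub-style text, or None."""
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, rhs = line.partition("=")
        if name.strip() != key:
            continue
        try:
            # shlex strips shell quoting and trailing comments: KEY="true" # x
            parts = shlex.split(rhs, comments=True)
        except ValueError:  # unbalanced quotes: fall back to the raw text
            parts = [rhs.strip()]
        value = parts[0] if parts else ""
    return value


def blscfg_enabled(config_text: str) -> bool:
    """True only if GRUB_ENABLE_BLSCFG is explicitly set to true."""
    value = grub_default_value(config_text, "GRUB_ENABLE_BLSCFG")
    return (value or "").lower() == "true"
```

On a CentOS 7-style file with no GRUB_ENABLE_BLSCFG line, `grub_default_value` returns `None` and `blscfg_enabled` is `False`, so nothing blocks; the value only appears once the AlmaLinux 8 config is in place, which is why the fix has to happen between the leapp run and the reboot.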

@troglodyne
Contributor

The first attempt at a post-leapp fix has failed, possibly because it executes later than needed. We need to ensure this happens while we are booted into single-user mode: after the leapp run, but before the reboot.

Development

No branches or pull requests

3 participants