bucc vm does not survive a restart #214

Open
damzog opened this issue Aug 14, 2020 · 10 comments

@damzog (Contributor) commented Aug 14, 2020

Hi,

We are still on 0.92 on OpenStack. I observed that after a restart of the bucc VM (e.g. via bucc ssh followed by shutdown -r now), the VM does not come up again: it reboots, but no monit processes are running and the persistent disk does not seem to be mounted properly, see below. Any ideas? Is it a stemcell problem?

bosh/0:/var/vcap/bosh/bin# monit summary
/var/vcap/monit/job/0024_nats.monitrc:3: Warning: the executable does not exist '/var/vcap/jobs/bpm/bin/bpm'
/var/vcap/monit/job/0024_nats.monitrc:4: Warning: the executable does not exist '/var/vcap/jobs/bpm/bin/bpm'
/var/vcap/monit/job/0023_postgres.monitrc:3: Warning: the executable does not exist '/var/vcap/jobs/postgres/bin/postgres_ctl'
/var/vcap/monit/job/0023_postgres.monitrc:5: Warning: the executable does not exist '/var/vcap/jobs/postgres/bin/postgres_ctl'
[...]
/bosh_dns_resolvconf_ctl'
/var/vcap/monit/job/0001_director-bosh-dns.monitrc:3: Warning: the executable does not exist '/var/vcap/jobs/bpm/bin/bpm'
/var/vcap/monit/job/0001_director-bosh-dns.monitrc:4: Warning: the executable does not exist '/var/vcap/jobs/bpm/bin/bpm'
monit: no status available -- the monit daemon is not running
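
A quick way to confirm the missing mount after such a reboot (a minimal check, assuming the standard BOSH layout where the persistent disk backs /var/vcap/store):

mount | grep /var/vcap/store   # no output means the persistent disk is not mounted
lsblk                          # the disk should still show up as a block device, just without a mount point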

@ramonskie (Contributor)

I only know of this problem in combination with the VirtualBox CPI; it could have gone wrong for several reasons.
Is this a one-time occurrence, or does it happen every time?

If the disk recorded in the state file (./state/state.json) still exists, then you should be able to just do a bucc up.
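
A minimal sketch of that check (the state.json field names are an assumption and may differ between BOSH CLI versions):

jq '.current_disk_id, .disks' state/state.json   # show the disk CID that bosh create-env recorded
openstack volume list | grep <disk-cid>          # on OpenStack, verify the volume still exists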

@damzog (Contributor, Author) commented Aug 14, 2020

Yes, it is reproducible. A plain bucc up doesn't help because it detects no changes and takes no action; bucc up --recreate recreates the VM and everything works fine again.

@owwweiha

This issue also occurs on vSphere. On reboot, /var/vcap/store and /var/vcap/data are not mounted. Workaround: execute bucc up with the --recreate flag.

@chewfred

I think we have the same problem with bucc up --lite --cpi=docker-desktop. When I restart the BOSH instance in Docker, none of the HTTPS requests work anymore.
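
For reference, a minimal way to reproduce this on the Docker CPI (the container name and director address are assumptions; look them up with docker ps and your own settings):

docker ps                                 # find the container backing the BOSH VM
docker restart <container-id>             # simulate the reboot
curl -k https://<director-ip>:25555/info  # hypothetical check; stops responding once the bug hits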

@ramonskie (Contributor)

Unfortunately this is a CPI issue; there is not much we can do about it from the bucc side.
We could try to fix it in the CPI and make a PR there,
if anyone is up for that?

@owwweiha

@ramonskie can you explain this in more detail? If I understood you correctly, this issue occurs with at least the OpenStack, Docker, vSphere and VirtualBox CPIs.

@ramonskie (Contributor)

I have not seen this issue occurring on vSphere,
only on Docker/VirtualBox, and that is due to how the disks are mounted by those specific CPIs in combination with the BOSH agent.

See this long-standing open issue: cloudfoundry/bosh-virtualbox-cpi-release#7
So in order to fix this, someone should fix those issues in the CPI/agent;
unfortunately we cannot duct-tape a fix into bucc in this case.
The only things we can do are either let the BOSH team know so they can prioritize the work,
or fix it ourselves and make a PR to BOSH.

@owwweiha

Well, we are facing this issue with the vSphere CPI, and @damzog, who opened this issue, uses the OpenStack CPI. That's why I'm asking. To me it sounds like it's not only a bug in the Docker/VirtualBox CPIs but in some other component. :(

@ramonskie (Contributor)

Is it reproducible?
Have you already done some preliminary debugging of this issue?
We are testing on vSphere with full upgrade scenarios etc. and have not seen these kinds of errors yet.

@owwweiha

Yes, I can reproduce this behaviour; I just did it again. We noticed this issue while performing some failover tests (e.g., vSphere HA moving and restarting the VM), but it is also reproducible by simply rebooting the bucc VM via the vSphere GUI or by using govc vm.power -r=true (see the sketch below).
As far as we know, updating bucc is not affected by this issue.
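
A minimal reproduction sketch with govc (the VM inventory path is an assumption):

govc vm.power -r=true /<datacenter>/vm/<bucc-vm>
# then, on the rebooted VM via bucc ssh:
mount | grep -E '/var/vcap/(store|data)'  # returns nothing when the bug hits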
