ETCD backup script will delete other files when there is no space left on device #1625

24sama · 2022-11-22T05:39:27Z (edited by pixiake)

Which version of KubeKey has the issue?

v3.0.1, v3.0.0, v2.3.0, v2.2.2, v2.2.1, v2.2.0, v2.1.1, v2.1.0, v2.0.0, v1.2.1, v1.2.0, v1.1.1, v1.1.0, v1.0.1

What is your os environment?

none

KubeKey config file

No response

A clear and concise description of what happened.

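In an extreme case, the kk etcd backup script may erroneously delete files under the / directory. This happens when the node has no space left to create the backup directory (i.e. not even the 4096 bytes a new directory typically needs): mkdir fails, the following cd fails too, and the cleanup command at the end of the script then runs in the current working directory, which can be /, instead of the backup directory.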
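The failure mode can be reproduced harmlessly. The sketch below is not the KubeKey script; it uses a hypothetical helper, where_cleanup_runs, that only prints the directory a cleanup command would run in when the target directory does not exist, once with ; and once with && after cd. It assumes the script is started with / as its working directory.

```shell
#!/bin/bash
# Harmless reproduction: nothing is deleted, we only print the directory
# in which the cleanup command WOULD have run.

# where_cleanup_runs is a hypothetical helper for this demo only.
# $1 = separator style ("semicolon" or "and"), $2 = directory to cd into.
where_cleanup_runs() {
  local style=$1 target=$2
  (
    set +o errexit   # mimic a script that does not enable errexit
    cd / || exit 1   # assumption: the script starts with / as its cwd
    if [ "$style" = "semicolon" ]; then
      cd "$target" 2>/dev/null; pwd      # pwd runs even though cd failed
    else
      cd "$target" 2>/dev/null && pwd    # pwd is skipped when cd fails
    fi
  )
}

missing="/nonexistent-backup-dir-$$"

echo "with ';'  -> cleanup would run in: $(where_cleanup_runs semicolon "$missing")"
echo "with '&&' -> cleanup would run in: $(where_cleanup_runs and "$missing")"
```

With ; the helper reports /, which is where an rm -rf generated by the cleanup line would operate. With && it reports nothing, because the command after cd never runs.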

We suggest upgrading to the latest version:

Binary downloads of the latest kk can be found on the Releases page.
Alternatively, download the latest kk with the following command:

curl -sSL https://get-kk.kubesphere.io | sh -

For existing clusters installed with the KubeKey command (kk), here is a workaround.

Manually edit the script:

$ vi /usr/local/bin/kube-scripts/etcd-backup.sh

Modify the script as follows:
1. Add the set -o options (errexit, nounset, pipefail) at the beginning of the script.
2. In the last line, replace the ; after the cd command with &&.
Here is an example:

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

ETCDCTL_PATH='/usr/local/bin/etcdctl'
ENDPOINTS='https://192.168.100.3:2379'
ETCD_DATA_DIR="/var/lib/etcd"
BACKUP_DIR="/var/backups/kube_etcd/etcd-$(date +%Y-%m-%d-%H-%M-%S)"
KEEPBACKUPNUMBER='6'
ETCDBACKUPSCIPT='/usr/local/bin/kube-scripts'

ETCDCTL_CERT="/etc/ssl/etcd/ssl/admin-node1.pem"
ETCDCTL_KEY="/etc/ssl/etcd/ssl/admin-node1-key.pem"
ETCDCTL_CA_FILE="/etc/ssl/etcd/ssl/ca.pem"

# With errexit set, a failed mkdir aborts the script here instead of continuing.
[ ! -d "$BACKUP_DIR" ] && mkdir -p "$BACKUP_DIR"

export ETCDCTL_API=2;$ETCDCTL_PATH backup --data-dir $ETCD_DATA_DIR --backup-dir $BACKUP_DIR

sleep 3

{
export ETCDCTL_API=3;$ETCDCTL_PATH --endpoints="$ENDPOINTS" snapshot save $BACKUP_DIR/snapshot.db \
                                   --cacert="$ETCDCTL_CA_FILE" \
                                   --cert="$ETCDCTL_CERT" \
                                   --key="$ETCDCTL_KEY"
} > /dev/null 

sleep 3

# '&&' (not ';') so the cleanup is skipped if cd fails.
cd "$BACKUP_DIR/../" && ls -lt |awk '{if(NR > '$KEEPBACKUPNUMBER'){print "rm -rf "$9}}'|sh
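The last line still builds rm -rf commands from ls output, so its safety depends on the working directory being correct. As a sketch only (not what KubeKey ships), the pruning step could instead work with absolute paths and refuse to run unless the backup root exists. prune_backups is a hypothetical helper name. It assumes backup directories are named etcd-<timestamp> as in BACKUP_DIR above, so that name order equals chronological order; the original line sorts by modification time instead.

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# prune_backups ROOT KEEP
# Delete all but the newest KEEP directories named etcd-* directly under ROOT.
prune_backups() {
  local root=$1 keep=$2
  # Refuse to do anything unless ROOT is an existing directory other than /.
  [ -d "$root" ] || { echo "prune_backups: $root is not a directory" >&2; return 1; }
  [ "$(cd "$root" && pwd -P)" != "/" ] || { echo "prune_backups: refusing to prune /" >&2; return 1; }

  local dirs=() d
  # Globs expand in sorted order; etcd-YYYY-mm-dd-HH-MM-SS sorts oldest first.
  for d in "$root"/etcd-*/; do
    if [ -d "$d" ]; then
      dirs+=("${d%/}")
    fi
  done

  local total=${#dirs[@]} i
  # Remove the oldest entries, always addressed as ROOT/etcd-..., never by bare name.
  for (( i = 0; i < total - keep; i++ )); do
    rm -rf -- "${dirs[i]}"
  done
  return 0
}
```

It would be called as prune_backups /var/backups/kube_etcd "$KEEPBACKUPNUMBER". Because every path passed to rm starts with the backup root, a failed cd can no longer redirect the deletion elsewhere.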

Reload the new script:

$ systemctl daemon-reload
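To see whether a node still carries the unpatched script, a check along these lines may help. It is a sketch: is_patched is a hypothetical helper, and it only looks for the two changes described above (a set -o errexit line, and no ; directly chained after a cd command), so it will not recognize a differently worded fix.

```shell
#!/bin/bash
# is_patched FILE: succeed only if FILE shows both changes from the workaround.
is_patched() {
  local file=$1
  # 1. errexit must be enabled somewhere in the script.
  grep -Eq '^[[:space:]]*set[[:space:]]+-o[[:space:]]+errexit' "$file" || return 1
  # 2. No line may start with a cd command that is followed by ';'.
  if grep -Eq '^[[:space:]]*cd[[:space:]][^;&|]*;' "$file"; then
    return 1
  fi
  return 0
}

script=/usr/local/bin/kube-scripts/etcd-backup.sh   # path from the issue text
if [ -f "$script" ]; then
  if is_patched "$script"; then
    echo "$script: patched"
  else
    echo "$script: still needs the workaround"
  fi
fi
```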

Relevant log output

No response

Additional information

No response


zjuwyz · 2023-04-17T11:56:04Z

Unfortunately, we ran into this bug. The root partition is mounted with the option 'errors=remount-ro', which was accidentally triggered. So mkdir -p failed, cd failed, and / was deleted.
Our data is shared via NAS and mounted under /. And it is all GONE.

etcdctl version is 3.4.13.
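A read-only remount makes mkdir fail in the same way a full disk does. One defensive option, sketched here under the assumption that aborting the backup is acceptable when the target is unusable, is to probe the backup root for writability before doing anything else. can_write_dir is a hypothetical helper, not part of KubeKey.

```shell
#!/bin/bash
# can_write_dir DIR: succeed only if DIR exists (or can be created) and a
# file can actually be created inside it. This catches read-only mounts;
# an empty probe file may still succeed on a nearly full disk.
can_write_dir() {
  local dir=$1 probe
  mkdir -p "$dir" 2>/dev/null || return 1
  probe=$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null) || return 1
  rm -f -- "$probe"
}

# Intended use near the top of a backup script (left commented out here):
#   can_write_dir /var/backups/kube_etcd || {
#     echo "backup root is not writable; aborting" >&2
#     exit 1
#   }
```

The probe complements errexit rather than replacing it: the script stops with a clear message before any backup or cleanup step runs.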

24sama added the bug label Nov 22, 2022

24sama pinned this issue Nov 22, 2022
