
Change Summary v1.11


Status: Released on 15-June-2020


Important Announcement: OpenEBS Community Slack channels will be migrated to the Kubernetes Slack workspace by June 22nd

In the interest of neutral governance, OpenEBS community support via Slack is being migrated from the openebs-community Slack (a free Slack workspace with limited message retention) to OpenEBS channels on the Kubernetes Slack workspace owned by CNCF.

The #openebs-users channel will be marked as read-only by June 22nd.

More details about this migration can be found here.

Given that the openebs-community Slack has been a neutral home for many vendors offering free and commercial support/products on top of OpenEBS, the workspace will continue to live on. These vendors are requested to create their own public channels; information about those channels can be shared with users via the OpenEBS website by raising an issue/PR at https://github.com/openebs/website.



Security Vulnerabilities Fixed

None

Deprecated Releases or Features

Releases prior to 1.0 have been deprecated. The steps to upgrade are here.

Breaking Changes

  • Important Note on Jiva Volume upgrade: If you are upgrading Jiva volumes from version 1.6 or 1.7, you must use these pre-upgrade steps to check whether your Jiva volumes are impacted by openebs/openebs#2956. If they are, please reach out to us on Slack or create a GitHub issue so we can help you with the upgrade.

  • Important Note for contributors: To help with hosting OpenEBS helm charts using GitHub pages, the contents of the openebs/charts repository have been refactored as follows:

    • The operator YAMLs, scripts and so forth have been moved to the gh-pages branch.
    • The Helm chart templates have been moved within openebs/charts, in preparation for migrating the charts from helm/charts to openebs/charts in an upcoming release.
    • Instructions for installing from the OpenEBS Helm charts can be found at http://openebs.github.io/charts/

New Capabilities

  • ZFS Local PV has graduated to beta after clearing all known adoption blockers and receiving public references from Optoro, Agones, and other community users. ZFS Local PV goes into beta with the following features:

    • Dynamic provisioning of Local PVs backed by ZFS datasets.
    • Enforced volume size limit
    • Thin provisioning
    • Access Modes of ReadWriteOnce
    • Volume modes - Filesystem and Block
    • Volume and ZPOOL metrics
    • Grafana dashboard to monitor ZFS volumes and pools
    • Supports fsTypes: ZFS, ext4, xfs
    • Online expansion for fsTypes: ZFS, ext4, xfs
    • Snapshot and Clone
    • Validated by users running on CentOS and Ubuntu.

    For detailed instructions on how to get started with ZFS Local PV, please refer to the Quick start guide.
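
As an illustration of the beta workflow, below is a minimal sketch of a StorageClass and PVC for ZFS Local PV. The pool name (zfspv-pool) and fstype values are placeholders to be adjusted for your environment; the full set of supported parameters is described in the Quick start guide.

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-zfspv
provisioner: zfs.csi.openebs.io
parameters:
  # Name of an existing ZPOOL on the node (illustrative value)
  poolname: "zfspv-pool"
  # One of the supported fsTypes: zfs, ext4, xfs
  fstype: "zfs"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zfspv-claim
spec:
  storageClassName: openebs-zfspv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
```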

Alpha Feature Updates

  • [Mayastor] Active development is underway on OpenEBS Mayastor, which is built on an NVMe-based architecture targeted at addressing the performance requirements of IO-intensive workloads, and it is now ready for alpha testing. For detailed instructions on how to get started with Mayastor, please refer to this Quickstart guide. Mayastor 0.2.0 comes with the following capabilities:

    • Rebuild Process - A rebuild mechanism has been added to Mayastor which, by scheduling multiple concurrent rebuild 'tasks', can bring a new child (i.e. replica) being added to an existing Nexus into a consistent state with its current children and into synchronization with the Nexus' front-end volume (i.e. Persistent Volume), thereby increasing/restoring workload protection levels. It does so without disrupting workload IO. (Known Limitation: Any IO error encountered at either the source or destination child of a rebuild task will cause the rebuild process to halt and no retry will be attempted. This will be addressed in a future release.)
    • NVMe-oF Support - A Nexus can be created and shared (i.e. a front-end target established) using the NVMe protocol. (Known Limitation: It is not currently possible to provision an NVMe-oF Nexus via CSI. This functionality will be introduced in a future release).
    • Mayastor Node (msn) CRD - Introduced for the persistence and observability of Mayastor-enabled cluster node configuration and membership.
    • Mayastor-client - A gRPC-based diagnostic and low-level configuration tool for Mayastor is introduced as a planned successor to the existing mctl utility. It is expected that mctl will be deprecated in a future release.
  • [cStor CSI and Enhanced Operators] Active development is underway to provide CSI support for cStor. For current status and detailed instructions on how to get started with provisioning cStor with CSI Driver and Enhanced Operators, please refer to the Quick start guide. This project made the following updates since its last release:

    • Added support for Backup/Restore via Velero Plugin
    • Added support for migration from SPC to CSPC
    • Added support for the upgrade of CSPC pools and CSI volumes.
    • Added additional integration and unit tests.
  • [NDM] Continuing to make progress on the NDM feature added in OpenEBS v1.10.0 to help with discovering partitions and virtual drives that don't have any uniquely identifiable attributes like a serial number or WWN. openebs/node-disk-manager#386. This feature makes use of the partition/filesystem ID if it exists, or it creates a partition on an empty block device and uses that partition ID for unique identification. This feature is not yet ready for production use cases and can be enabled via a feature gate. The feature gate can be enabled by adding an additional arg (--feature-gates=) to the NDM DaemonSet or via the Helm flag featureGates.GPTBasedUUID.featureGateFlag=GPTBasedUUID. A sample DaemonSet snippet is sketched after this list.

  • [Support for ARM images] We have a lot of new users trying out the ARM images, providing feedback, and helping with adding new features. We need more support/help with running tests on different flavors of ARM. Please join our active contributor and user community on Slack

    • Fixed the openebs-operator-arm-dev.yaml to point to correct ARM images. charts#108 (@radicand)
    • Active development on supporting multi-arch images is in progress. Check out node-disk-manager#428 (@xUnholy)
    • Fixed an issue with determining the value of CPU_SEQID, which was causing a crash on arm64. The fix also helps with some optimization on amd64. cstor#309 (@sgielen)
    • Fixed an issue with starting cstor target containers on ARM due to a conflict in the version of libjson between the Travis build machine and the base image used by the cstor-istgt container. [openebs#3037](https://github.com/openebs/openebs/issues/3037), cstor#310 (@mynktl, @xUnholy)
  • [Support for PowerPC images] We have NDM, Local PV Provisioner images ready to be tested. If you are interested in helping us take this forward, please join our active contributor and user community on Slack

    • Automate the generation of provisioner-localpv and linux-utils image for PowerPC via Travis jobs. maya#1704 linux-utils#4 (@Pensu)
  • [RawFile - Local PV] We are working on a new type of Dynamic Local PV called RawFile - Local PV, backed by a single file on the host filesystem acting as the block device, using Linux's loop device. This is aimed at addressing some of the concerns around quota management and metrics with hostpath Local PVs. To learn more about this new PV and to help us develop this feature, please reach out to the OpenEBS Contributor Community.
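
A minimal sketch of how the GPTBasedUUID feature gate mentioned in the NDM item above could be passed to the NDM DaemonSet; the container name and other args are illustrative and should be matched against the operator YAML you actually deploy.

```
# Excerpt of the NDM DaemonSet container spec (illustrative)
containers:
  - name: node-disk-manager
    args:
      - -v=4
      # Enable the alpha GPT-based UUID feature gate
      - --feature-gates=GPTBasedUUID
```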

Enhancements

  • [Helm Charts] Enhanced helm charts to make NDM filterconfigs.state configurable. charts#107 (@fukuta-tatsuya-intec)
  • [NDM] Added configuration to exclude rbd devices from being used for creating Block Devices charts#111 (@GTB3NW)
  • [NDM] Added support to delete unused deprecated disk CRDs that might be in the cluster. node-disk-manager#427 (@akhilerm)
  • [NDM] Added support to display FSType information in Block Devices node-disk-manager#438 (@harshthakur9030)
  • [ZFS Local PV] Enhanced the contributor docs and fixed Go lint warnings. zfs-localpv#138 zfs-localpv#133 (@Icedroid)
  • [ZFS Local PV] Set the API version for Custom Resources of ZFS Local PV to v1. (@pawanpraka1)
  • [ZFS Local PV] Added support to mount ZFS datasets using the legacy mount property to allow for multiple mounts on a single node. zfs-localpv#151 (@pawanpraka1)
  • [Control Plane] Enhanced the Prometheus exporter to handle concurrent scrape requests. maya#1698 (@utkarshmani1997)
  • [Release] There were several fixes and enhancements to the Travis build files to enable users to build OpenEBS container images from forked GitHub repos and push them to a custom Docker Hub registry. Please reach out to the OpenEBS Contributor community for more details.
  • [Build] Upgraded the Go projects openebs/maya, openebs/zfs-localpv and openebs/node-disk-manager to use Go modules. - maya#1711 (@vaniisgh), node-disk-manager#434 (@harshthakur9030), zfs-localpv#148 (@prateekpandey14)

Major Bug Fixes

  • [ZFS Local PV] Fixes an issue where volumes meant to be filesystem datasets got created as zvols due to a mismatched case in a StorageClass parameter. The fix makes the StorageClass parameters case-insensitive. zfs-localpv#144 (@cruwe)
  • [ZFS Local PV] Fixes an issue where the read-only option was not being set on ZFS volumes. zfs-localpv#137 (@pawanpraka1)
  • [ZFS Local PV] Fixes an issue where incorrect pool name or other parameters in Storage Class would result in stale ZFS Volume CRs being created. zfs-localpv#121 zfs-localpv#145 (@pawanpraka1)
  • [Jiva] Fixes an issue where the user configured ENV variable for MAX_CHAIN_LENGTH was not being read by Jiva. jiva#309 (@payes)
  • [cStor] Fixes an issue where a cStor Pool was being deleted forcefully before the replicas on the cStor Pool were deleted. This could cause data loss in situations where SPCs are incorrectly edited by the user and a cStor Pool deletion is attempted. maya#1710 (@mittachaitu)
  • [cStor] Fixes an issue where a failure to delete the cStor Pool on the first attempt will leave an orphaned cStor custom resource (CSP) in the cluster. maya#1595 (@mittachaitu)
  • [Docs] Fixes the Ubuntu iSCSI prerequisite steps to use systemctl --now for the iscsid service. openebs-docs#809 (@mtmn)
  • [Docs] Fixes the instructions for setting up cStor pools with new schema. openebs-docs#817 (@ShubhamB99)

E2E Testing Updates

  • [New Test - ZFS Local PV] Validate raw-block-volume support e2e-tests#350 (@w3aman)
  • [New Test - ZFS Local PV] Validate custom-topology support e2e-tests#364 (@w3aman)
  • [New Test - ZFS Local PV] Validate ZFS PV provisioning by introducing chaos - docker/kubelet restarts. e2e-tests#368 (@w3aman)
  • [New Test - cStor CSPC] Validate cStor provisioning by introducing chaos - restart cstor-pool-mgmt e2e-tests#370 (@nsathyaseelan)
  • [New Test - Backup/Restore cStor] Validate restore of a remote backup into a different namespace. e2e-tests#372 (@shashank855)
  • [New Test - Backup/Restore cStor] Validate Backup/Restore using a non-default S3 profile. e2e-tests#366 (@shashank855)

E2E Platform Updates

Automated end-to-end tests are executed using the GitLab runner. The following platforms were verified.

  • Kubernetes 1.15.2, 1.16.1, 1.17.2
  • OpenShift 4.2
  • Konvoy V1.2.3

Backward Incompatibilities

This release removes support for managing volumes that are on OpenEBS version 0.9.x or earlier. Please upgrade volumes on version 0.9.x or earlier to 1.0 before upgrading to this release.

For releases prior to 1.0, please refer to the respective release notes and upgrade steps.

  • From 1.0.0: None.

  • From 1.2: If the status of a CSP is either Init or PoolCreationFailed, then the cstor-pool-mgmt container in the pool pod will attempt to create the pool. So, when there is a need to re-create the pool for the same CSP due to ephemeral storage, the CSP CR related to this pool needs to be edited to set status.phase to Init. As part of reconciliation, the cstor-pool-mgmt container of the pool pod will attempt to recreate the pool. Refer to openebs/maya#1401 for more details.

  • From 1.3.0: None

  • From 1.4.0: None

  • From 1.5.0: Dumping cores has been disabled by default for cStor pool and NDM DaemonSet pods. It can be enabled by setting the ENV variable ENABLE_COREDUMP to 1. The ENV setting can be added to the cStor pool deployment to dump cores for the cStor pool pods, and to the NDM DaemonSet spec to dump cores for the NDM DaemonSet pods (see the sketch after this list).

  • From 1.6.0, 1.7.0: OpenEBS v1.8 includes a critical fix (#2956) for Jiva volumes that are running in version 1.6 or 1.7. You must use these pre-upgrade steps to check if your Jiva volumes are impacted. If they are, please reach out to us on the OpenEBS Community to help you with the upgrade.

  • From 1.8.0, 1.9.0, 1.10.0: None
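
For the core-dump note above (the "From 1.5.0" item), here is a minimal sketch of the env entry; it can be added to the container spec of the cStor pool deployment or the NDM DaemonSet, and the surrounding fields are omitted for brevity.

```
# Container spec excerpt (cStor pool deployment or NDM DaemonSet)
env:
  # Core dumps are disabled by default since 1.5.0; set to 1 to enable
  - name: ENABLE_COREDUMP
    value: "1"
```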

Important Notes: If you are on OpenEBS v1.8 or an earlier release, please be aware of the following changes when upgrading to 1.10 or later.

  • The way Jiva volume-related Kubernetes objects (Jiva Target and Controller Deployments, Jiva Target Service) are deployed has changed. These changes are required to tighten RBAC policies for the OpenEBS Service Account and to better support Jiva volumes in production. For more details, refer to the Jiva Data Engine Enhancements.

  • The OpenEBS Velero plugin container image tag format has changed to specify only the OpenEBS release version. For instance:

    • OpenEBS Velero plugin v1.10 will be available at openebs/velero-plugin:1.10.0
    • Earlier versions of the OpenEBS Velero plugin, such as v1.8, included the minimum required Velero version, for example openebs/velero-plugin:1.8.0-velero_1.0.0.
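
With the new tag format, adding the plugin to an existing Velero installation could look like the following. This assumes the Velero CLI is available and that 1.11.0 is the tag published for this release (it follows the format described above, but verify the exact tag on Docker Hub).

```
velero plugin add openebs/velero-plugin:1.11.0
```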

Also, as part of OpenEBS upgrade or installation, the maya-apiserver pod will restart if the NDM BlockDevice CRDs are not created before maya-apiserver comes up. https://github.com/openebs/maya/pull/1381

Install

Prerequisite to install

  • Kubernetes 1.14+ is installed. If using Kubernetes 1.16+, please use OpenEBS 1.8 or above.
  • Make sure that you run the below installation steps with the cluster-admin context. The installation will involve creating a new Service Account and assigning it to OpenEBS components.
  • Make sure that iSCSI Initiator is installed on all the Kubernetes nodes.
  • Node-Disk-Manager (NDM) helps in discovering the devices attached to Kubernetes nodes, which can be used to create storage pools. If you would like to exclude some of the disks from being discovered, update the filters in the NDM ConfigMap to exclude those paths before installing OpenEBS (see the sketch after this list).
  • NDM runs as a privileged pod since it needs to access device information. Please make the necessary changes to grant access to run in privileged mode. For example, when running on RHEL/CentOS, you may need to set the security context appropriately. Refer to Configuring OpenEBS with selinux=on.
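
The NDM ConfigMap mentioned above ships as part of the operator YAML. A trimmed sketch of its path filter section is shown below; the exclude list is illustrative and shortened, so check the openebs-operator.yaml you install for the exact defaults.

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-ndm-config
  namespace: openebs
data:
  node-disk-manager.config: |
    filterconfigs:
      - key: path-filter
        name: path filter
        state: true
        include: ""
        # Add device paths here to exclude them from discovery
        exclude: "/dev/loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"
```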

Install using kubectl

kubectl apply -f https://openebs.github.io/charts/1.11.0/openebs-operator.yaml

Install using helm stable charts

helm repo update
helm install --namespace openebs --name openebs stable/openebs --version 1.11.0
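
After installing via either method, a quick sanity check (generic kubectl commands, not specific to this release) is to confirm that the OpenEBS control-plane pods are running and that the default StorageClasses and BlockDevices have been created:

```
kubectl get pods -n openebs
kubectl get sc
kubectl get blockdevice -n openebs
```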

For more details refer to the documentation at https://docs.openebs.io/

Upgrade

The steps to upgrade are here. Kubernetes Job-based upgrades to version 1.11 are supported only from 1.0 or higher and follow a similar process as earlier releases.

  • Upgrade OpenEBS Control Plane components.
  • Upgrade Jiva PVs to 1.11, either one at a time or multiple volumes at once. Note that it is recommended to scale down the applications using Jiva volumes prior to upgrading.
  • Upgrade cStor Pools to 1.11 and then their associated volumes, either one at a time or multiple volumes at once.

For upgrading from releases prior to 1.0, please refer to the respective release upgrade here.

Uninstall

The recommended steps to uninstall are:

  • delete all the OpenEBS PVCs that were created
  • delete all the SPCs (in case of cStor)
  • ensure that no volume or pool pods are stuck in a terminating state: kubectl get pods -n <openebs namespace>
  • ensure that no OpenEBS cStor volume custom resources are present: kubectl get cvr -n <openebs namespace>
  • delete all OpenEBS-related StorageClasses.
  • delete OpenEBS itself, either via helm purge or kubectl delete (examples below)
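
For the last step above, the delete command depends on how OpenEBS was installed; hedged examples matching the install commands earlier in this page are:

```
# If installed via helm (Helm 2 syntax, matching the install step above)
helm delete --purge openebs

# If installed via kubectl
kubectl delete -f https://openebs.github.io/charts/1.11.0/openebs-operator.yaml
```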

Uninstalling OpenEBS doesn't automatically delete the CRDs that were created. If you would like to remove CRDs and the associated objects completely, run the following commands:

```
kubectl delete crd castemplates.openebs.io
kubectl delete crd cstorpools.openebs.io
kubectl delete crd cstorvolumereplicas.openebs.io
kubectl delete crd cstorvolumes.openebs.io
kubectl delete crd runtasks.openebs.io
kubectl delete crd upgradetasks.openebs.io
kubectl delete crd storagepoolclaims.openebs.io
kubectl delete crd storagepools.openebs.io
kubectl delete crd volumesnapshotdatas.volumesnapshot.external-storage.k8s.io
kubectl delete crd volumesnapshots.volumesnapshot.external-storage.k8s.io
kubectl delete crd disks.openebs.io
kubectl delete crd blockdevices.openebs.io
kubectl delete crd blockdeviceclaims.openebs.io
kubectl delete crd cstorbackups.openebs.io
kubectl delete crd cstorrestores.openebs.io
kubectl delete crd cstorcompletedbackups.openebs.io
kubectl delete crd cstorvolumeclaims.openebs.io
kubectl delete crd cstorpoolclusters.openebs.io
kubectl delete crd cstorpoolinstances.openebs.io
```

Note: As part of deleting the Jiva Volumes - OpenEBS launches scrub jobs for clearing data from the nodes. The completed jobs need to be cleared using the following command: kubectl delete jobs -l openebs.io/cas-type=jiva -n <namespace>

Limitations / Known Issues

For a more comprehensive list of open issues uncovered through e2e and community testing, please refer to GitHub open issues. Here is a quick summary of common known issues.

  • The current version of OpenEBS cStor is not optimized for performance-sensitive applications.

  • Upgrade of alpha features like cStor Pools using new schema (CSPC), cStor CSI Driver, Jiva CSI Driver, MayaStor, and ZFS Local PV are not supported.

  • A healthy volume (cStor or Jiva) that has a slow replica, due to disk slowness or high network latency, can end up in a read-only state. In this case, a write IO to the slow replica can take more than 15-30 seconds to be acknowledged by its controller, which might cause a disconnection from the initiator.

  • Overprovisioning is enabled by default on cStor volumes. For applications such as Elasticsearch or PostgreSQL that use the ext4 filesystem without unmap support, when data is written and modified, ext4 tends to use new blocks of storage without sending any delete (unmap) requests back to the underlying block storage, such as cStor. This can be avoided by setting a resource-quota-based configuration on the StorageClass so that the sum of the capacity allocated to all the PVCs stays within the available capacity of the underlying cStor Pools. Further details are tracked here.

  • A cStor volume will go offline when the number of healthy replicas is not greater than 50% of the ReplicationFactor. The volume will not come online automatically, nor will data be reconstructed onto recovered replicas automatically. Manual steps are required to bring the volume online and then to reconstruct the data onto the replaced replicas from a healthy replica once the volume is online.

  • If a pending PVC related to the openebs-device StorageClass is deleted, there is a chance of stale BDCs being left behind, which end up consuming BDs. You have to manually delete the stale BDC to reclaim the BD (see the sketch at the end of this list).

  • In OpenShift 3.10 or above, NDM daemon set pods and NDM operators will not be upgraded if the NDM daemon set's DESIRED count is not equal to the CURRENT count. This may happen if nodeSelectors have been used to deploy OpenEBS related pods OR if master/other nodes have been tainted in the k8s cluster.

  • Jiva Controller and Replica pods can get stuck in the Terminating state when there is any instability with the node or network; the only way to remove those containers is by using docker rm -f on the node. https://github.com/openebs/openebs/issues/2675

  • cStor Target or Pool pods can at times be stuck in a Terminating state. They will need to be manually cleaned up using kubectl delete with a 0 sec grace period. Example: kubectl delete deploy -n openebs --force --grace-period=0

  • cStor pool pods can consume more memory when there is continuous load. This can cross the memory limit and cause pod evictions. It is recommended that you create cStor pools by setting the Memory limits and requests.

  • Jiva Volumes are not recommended if your use case requires snapshots and clone capabilities.

  • Jiva Replicas use a sparse file to store the data. When the application causes too many fragments (extents) to be created on the sparse file, a replica restart can cause the replica to take a longer time to get attached to the target. This issue was seen when 31K fragments were created.

  • Jiva Replicas use a sparse file to store the data. On every restart of the controller or the replica, a new sparse file is created per replica. Long-running Jiva volumes can end up consuming more disk space as the number of sparse files per replica starts increasing.

  • Volume Snapshots are dependent on the functionality provided by Kubernetes. The support is currently alpha. The only operations supported are:

    • Create Snapshot, Delete Snapshot, and Clone from a Snapshot. The creation of the Snapshot uses a reconciliation loop, which would mean that a Create Snapshot operation will be retried on failure until the Snapshot has been successfully created. This may not be a desirable option in cases where Point in Time snapshots are expected.
  • If you are using the K8s version earlier than 1.12, in certain cases, it will be observed that when the node hosting the target pod is offline, the target pod can take more than 120 seconds to get rescheduled. This is because target pods are configured with Tolerations based on the Node Condition, and TaintNodesByCondition is available only from K8s 1.12. If running an earlier version, you may have to enable the alpha gate for TaintNodesByCondition. If there is an active load on the volume when the target pod goes offline, the volume will be marked as read-only.

  • If you are using K8s version 1.13 or later, that includes the checks on ephemeral storage limits on the Pods, there is a chance that OpenEBS cStor and Jiva pods can get evicted - because there are no ephemeral requests specified. To avoid this issue, you can specify the ephemeral storage requests in the storage class or storage pool claim. (https://github.com/openebs/openebs/issues/2294)

  • When the disks used by a cStor Pool are detached and reattached, the cStor Pool may miss detecting this event in certain scenarios. Manual intervention may be required to bring the cStor Pool online. (https://github.com/openebs/openebs/issues/2363)

  • When the underlying disks used by cStor or Jiva volumes are under disk pressure due to heavy IO load, and if the Replicas take longer than 60 seconds to process the IO, the Volumes will get into Read-Only state. In 0.8.1, logs have been added to the cStor and Jiva replicas to indicate if IO has longer latency. (https://github.com/openebs/openebs/issues/2337)

  • LocalPV RawBlock volumes are not supported when the application container is running in privileged mode.

  • An application may go into a read-only state during a Jiva upgrade if multipath support is enabled on the nodes. Manually logging in to iSCSI on the node where the application pod is running, or scheduling the application onto another node, is required to bring the application back to the running state.
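
For the stale BlockDeviceClaim issue noted earlier in this list (the openebs-device StorageClass item), the manual cleanup is along these lines; the claim name is a placeholder:

```
kubectl get bdc -n openebs
kubectl delete bdc <stale-bdc-name> -n openebs
```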

Support