
[Feature]: Allow writes during a node outage #3357

Open · Zorlin opened this issue Apr 26, 2024 · 3 comments
Labels: enhancement (New feature or request)
Zorlin commented Apr 26, 2024

Contact Details

No response

Is there an existing issue for this?

  • I have searched all the existing issues

Is your feature request related to a problem? Please describe.

I would like CubeFS to survive taking datanodes and metanodes offline.

Right now, if I stop a datanode or suffer a physical hardware failure, my cluster can no longer receive writes, even on a 3-replica volume with 5 available datanodes.

Read workloads continue to work fine, but the loss of writes blocks certain kinds of work while the cluster is effectively read-only.

Describe the solution you'd like.

We should implement a file/chunk versioning system combined with CRDTs (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) so that writes can continue while a data partition (DP) shard is offline, with the shard being brought back up to date when it returns.
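To sketch what I mean (purely illustrative, not CubeFS code; all names here are made up): a last-writer-wins register is one of the simplest CRDTs, and per-chunk versions could merge along these lines, so replicas that diverged during an outage converge deterministically on re-sync.

```go
// Hypothetical sketch: a last-writer-wins (LWW) register applied to
// per-chunk versioning. Not CubeFS code; the types are illustrative.
package main

import "fmt"

// ChunkVersion tags a chunk's contents with a logical timestamp and a
// writer ID so any two replicas can merge deterministically.
type ChunkVersion struct {
	Timestamp uint64 // logical clock, bumped on every local write
	NodeID    string // tie-breaker when timestamps collide
	Data      []byte
}

// newer reports whether a should win a merge against b.
func newer(a, b ChunkVersion) bool {
	if a.Timestamp != b.Timestamp {
		return a.Timestamp > b.Timestamp
	}
	return a.NodeID > b.NodeID // deterministic tie-break
}

// Merge is commutative, associative, and idempotent -- the CRDT property
// that lets a returning shard converge regardless of update order.
func Merge(a, b ChunkVersion) ChunkVersion {
	if newer(a, b) {
		return a
	}
	return b
}

func main() {
	// Two replicas accept writes independently while a third shard is offline...
	r1 := ChunkVersion{Timestamp: 7, NodeID: "dn-2", Data: []byte("v7")}
	r2 := ChunkVersion{Timestamp: 9, NodeID: "dn-4", Data: []byte("v9")}
	// ...and converge to the same state when they re-sync.
	fmt.Printf("merged: %s\n", Merge(r1, r2).Data) // merged: v9
}
```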

Describe an alternate solution.

No response

Anything else? (Additional Context)

Other filesystems, such as Ceph and MooseFS, do things like this to allow writes to continue even during major node outages.

Zorlin added the enhancement label on Apr 26, 2024
leonrayang (Member) commented
@Zorlin I don't know if you have any practical experience with this, but the problem should not exist. If one out of five nodes is faulty, writes can still continue without affecting users: new data is appended to a new node, and modifications go to the remaining two replicas, which still have a leader, so everything is okay.
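To make the expectation concrete, here is a simplified sketch of leader-based majority replication (not the actual CubeFS write path; the types and names are illustrative): a 3-replica data partition should keep committing writes with one node down, because two of three acknowledgements still form a quorum.

```go
// Simplified sketch of majority-quorum replication -- not CubeFS's real
// write path; Replica and the error text here are illustrative only.
package main

import (
	"errors"
	"fmt"
)

type Replica struct {
	ID    string
	Alive bool
}

// quorumWrite succeeds when a strict majority of replicas acknowledge the
// write, so a 3-replica partition tolerates one failed node.
func quorumWrite(replicas []Replica, data []byte) error {
	acks := 0
	for _, r := range replicas {
		if r.Alive { // in reality, an RPC to the replica
			acks++
		}
	}
	if acks*2 <= len(replicas) {
		return errors.New("no quorum: write rejected")
	}
	return nil
}

func main() {
	partition := []Replica{
		{ID: "dn-1", Alive: true},
		{ID: "dn-2", Alive: true},
		{ID: "dn-3", Alive: false}, // one node offline
	}
	if err := quorumWrite(partition, []byte("hello")); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("write committed with 2/3 acks")
}
```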

Zorlin (Author) commented Apr 29, 2024

> @Zorlin I don't know if you have any practical experience with this, but the problem should not exist. If one out of five nodes is faulty, writes can still continue without affecting users: new data is appended to a new node, and modifications go to the remaining two replicas, which still have a leader, so everything is okay.

Hi, agreed, it shouldn't be a problem, but in my cluster with 5 datanodes and 5 metanodes, when I am writing to the filesystem and take a node offline, everything "goes haywire" and I cannot write again until I bring that node back online.

NaturalSelect (Collaborator) commented
> > @Zorlin I don't know if you have any practical experience with this, but the problem should not exist. If one out of five nodes is faulty, writes can still continue without affecting users: new data is appended to a new node, and modifications go to the remaining two replicas, which still have a leader, so everything is okay.
>
> Hi, agreed, it shouldn't be a problem, but in my cluster with 5 datanodes and 5 metanodes, when I am writing to the filesystem and take a node offline, everything "goes haywire" and I cannot write again until I bring that node back online.

How many data partitions does your volume have?
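(If I remember the tooling correctly, `cfs-cli volume info <volName>` should report the volume's data partition count; treat the exact invocation as approximate.)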
