[Feature]: Allow writes during a node outage #3357
Comments
@Zorlin I don't know if you have practical experience with this, but the problem should not exist. If one of five nodes is faulty, writes can still continue without affecting users: new data is appended on a different node, and modifications go to the remaining two replicas, which still have a leader, so that case is fine as well.
Hi, agreed it shouldn't be a problem, but in my cluster with 5 datanodes and 5 metanodes, when I am writing to the filesystem and I take a node offline, everything "goes haywire" and I cannot write again until I put that node back online.
How many data partitions does your volume have?
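The behavior described in the first comment is essentially a majority-quorum write: a 3-replica partition should keep accepting writes as long as two replicas (including a leader) acknowledge. A minimal sketch of that idea, with hypothetical names that are not the actual CubeFS data-partition API:

```go
package main

import (
	"errors"
	"fmt"
)

// replica models a datanode that may be offline.
type replica struct {
	id     string
	online bool
}

func (r replica) write(data []byte) error {
	if !r.online {
		return errors.New(r.id + ": unreachable")
	}
	return nil
}

// quorumWrite succeeds when a strict majority of replicas acknowledge,
// so a 3-replica partition can keep accepting writes with one node down.
func quorumWrite(replicas []replica, data []byte) error {
	acks := 0
	for _, r := range replicas {
		if r.write(data) == nil {
			acks++
		}
	}
	if acks*2 <= len(replicas) {
		return fmt.Errorf("only %d/%d acks, below majority", acks, len(replicas))
	}
	return nil
}

func main() {
	// One of three replicas is offline; the write should still succeed.
	dp := []replica{{"dn1", true}, {"dn2", true}, {"dn3", false}}
	fmt.Println(quorumWrite(dp, []byte("chunk")) == nil) // true: 2/3 acks
}
```

The reported symptom, writes failing cluster-wide when one node goes down, is what you would see if something in the write path requires all replicas rather than a majority.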
Contact Details
No response
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
I would like to have CubeFS survive taking datanodes and metanodes offline.
Right now if I stop a datanode or have a physical failure, my cluster can no longer receive writes even on a 3-replica volume with 5 available datanodes.
Read workloads continue to work fine, but the resulting read-only mode blocks certain kinds of work.
Describe the solution you'd like.
We should implement a file/chunk versioning system combined with CRDTs (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) to allow us to continue writing data when a DP shard is offline, and then update that shard when it returns.
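To make the versioning idea concrete, here is a minimal sketch of one standard CRDT building block, a version vector, that could track which replica has seen which updates and let an offline shard fast-forward on return. The `VersionVector` type and its methods are illustrative assumptions, not anything CubeFS implements today:

```go
package main

import "fmt"

// VersionVector maps a replica ID to the number of updates it has seen.
// Hypothetical sketch for this feature request, not an existing CubeFS type.
type VersionVector map[string]uint64

// Dominates reports whether a is causally >= b, i.e. a has seen
// every update b has. If true, b can simply be fast-forwarded to a.
func (a VersionVector) Dominates(b VersionVector) bool {
	for id, n := range b {
		if a[id] < n {
			return false
		}
	}
	return true
}

// Merge takes the element-wise maximum of two vectors, the standard
// CRDT join: commutative, associative, and idempotent.
func Merge(a, b VersionVector) VersionVector {
	out := VersionVector{}
	for id, n := range a {
		out[id] = n
	}
	for id, n := range b {
		if n > out[id] {
			out[id] = n
		}
	}
	return out
}

func main() {
	// Replica C was offline while A and B kept accepting writes.
	ab := VersionVector{"A": 3, "B": 2, "C": 1}
	c := VersionVector{"A": 1, "B": 1, "C": 1}

	fmt.Println(ab.Dominates(c)) // true: C is strictly behind, safe to fast-forward
	fmt.Println(Merge(ab, c))
}
```

Because the merge is a join, the returning shard can reconcile in any order against any subset of peers and still converge, which is the property that would let writes continue during the outage.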
Describe an alternate solution.
No response
Anything else? (Additional Context)
Other filesystems such as Ceph, MooseFS do things like this to allow for writing even during major node outages.