Arbitrary capture of key ranges in case of a cluster split can lead to data loss. #59

Open
GoogleCodeExporter opened this issue Apr 8, 2015 · 11 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Start Scalaris with four nodes. Each node should own an equal part of the keyspace.
2. Suspend all nodes except the boot node by pressing Ctrl-C in the Erlang shell of each of these nodes.
3. Write a value for some key and read it back:
ok = cs_api_v2:write("Key", 1).
1 = cs_api_v2:read("Key").
4. Resume the suspended nodes (by pressing c + [enter] on each).
5. Try to read the key's value:
{fail, not_found} = cs_api_v2:read("Key").

What is the expected output? What do you see instead?
So we lost the data after the cluster recombined. You can also get another effect when the key is written on each breakaway node by different clients: after recombination, the nodes can store replicas of the key that have different values but the same version. From then on, different clients reading the key concurrently can get different values.
The proof:
> cs_api_v2:range_read(0,0).
{ok,[{12561922216592930516936087995162401722,2,false,0,0},
     {182703105677062162248623391711046507450,4,false,0,0},
     {267773697407296778114467043568988560314,1,false,0,0},
     {97632513946827546382779739853104454586,3,false,0,0}]}
These are four different values stored for the "Key" key.
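
The four keys in this output differ from one another by exactly 2^126, i.e. one quarter of the 2^128 key space, which is consistent with them being the four symmetric replicas of "Key", each holding a different value but the same version 0. A minimal sketch that checks this from the Erlang shell, assuming the entry tuples have the layout {Key, Value, WriteLock, ReadLock, Version} (this layout is inferred from the output above, not verified against r978):

{ok, Entries} = cs_api_v2:range_read(0, 0),
Keys = lists:sort([K || {K, _Val, _WLock, _RLock, _Vers} <- Entries]),
Quarter = 1 bsl 126,   %% one quarter of the 2^128 key space
Offsets = [B - A || {A, B} <- lists:zip(lists:droplast(Keys), tl(Keys))],
io:format("replica offsets ~p, expected three times ~p~n", [Offsets, Quarter]),
io:format("value/version pairs ~p~n",
          [[{Val, Vers} || {_K, Val, _WL, _RL, Vers} <- Entries]]).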

What version of the product are you using? On what operating system?
r978

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 10 Aug 2010 at 3:31

@GoogleCodeExporter

Hi!
I apologize for my persistence. Could you comment on this issue (and on issue 57)?

Original comment by [email protected] on 25 Aug 2010 at 1:04

@GoogleCodeExporter

With your test you violate a precondition of Scalaris: Only a minority of 
replicas is allowed to fail concurrently. Otherwise strong consistency cannot 
be guaranteed.

Original comment by [email protected] on 26 Aug 2010 at 10:20
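
For context, the precondition above boils down to majority-quorum arithmetic. The sketch below assumes a replication factor of 4, as suggested by the four replicas visible in the range_read output of this report:

R = 4,                      %% replicas per key, as seen in range_read(0,0)
Quorum = R div 2 + 1,       %% majority quorum: 3 of the 4 replicas
MaxFailures = R - Quorum,   %% so at most 1 replica may be unreachable
io:format("quorum size ~p, tolerated concurrent failures ~p~n",
          [Quorum, MaxFailures]).

In the reproduction above, three of the four nodes are suspended, so a strict majority of the original replicas is unreachable; the write only appears to succeed, presumably because the remaining node repairs the ring and takes over the whole key range, as the later comments in this thread describe.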

@GoogleCodeExporter

In my test all nodes are OK. The network split is emulated - one node per partition. The simplest example from the real file - a network switch failure.

Original comment by [email protected] on 26 Aug 2010 at 10:26

@GoogleCodeExporter

> The network split is emulated - one node per partition.
In this case I got the following result:

> cs_api_v2:range_read(0,0).
{ok,[{12561922216592930516936087995162401722,2,false,0,0},
     {182703105677062162248623391711046507450,4,false,0,0},
     {267773697407296778114467043568988560314,1,false,0,0},
     {97632513946827546382779739853104454586,3,false,0,0}]}

The reproduction steps in the original report emulate a 1+3 network split.

Original comment by [email protected] on 26 Aug 2010 at 10:33

@GoogleCodeExporter

s/real file/real life/

Original comment by [email protected] on 26 Aug 2010 at 12:17

@GoogleCodeExporter

Scalaris currently cannot correctly handle this situation. When the network splits into two partitions, it will successfully repair the ring in both partitions, and you will end up with two completely separate rings. In both rings you will be able to modify the same item, and you will end up with inconsistencies when the partitions merge again.

We are still considering different approaches to handle such situations.

Original comment by [email protected] on 26 Aug 2010 at 3:59
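
A sketch of the scenario described above, written as an annotated shell session (the partition labels and values are made up for illustration; both writes report success because each repaired ring accepts them independently):

%% in partition A (ring repaired around the boot node alone):
ok = cs_api_v2:write("Key", value_from_partition_a).
%% concurrently in partition B (ring repaired from the other nodes):
ok = cs_api_v2:write("Key", value_from_partition_b).
%% after the partitions merge, the replicas of "Key" disagree even though
%% both writes reported success; which value is returned can depend on
%% which replicas answer a given read:
cs_api_v2:read("Key").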

@GoogleCodeExporter

This issue was closed by revision r1351.

Original comment by [email protected] on 12 Jan 2011 at 6:43

  • Changed state: Fixed

@GoogleCodeExporter

It is not fixed. Sorry. My commit message for r1351 was wrong.

Original comment by [email protected] on 12 Jan 2011 at 6:54

  • Changed state: New

@GoogleCodeExporter

Original comment by [email protected] on 12 Jan 2011 at 8:40

  • Changed state: Accepted

@funny-falcon

Is it fixed?

@schintke

In principle, it is fixed when you configure Scalaris to use our experimental ring maintenance by setting {leases, true}. If not enough replicas are available, the write request simply hangs. When you resume the suspended nodes, the write finishes successfully.
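
A minimal sketch for observing this blocking behaviour from the Erlang shell, using the cs_api_v2:write call from the report (the exact client API module may differ in builds configured with {leases, true}); the 5-second timeout is only there to show that the call is still pending:

Self = self(),
spawn(fun() -> Self ! {write_result, cs_api_v2:write("Key", 1)} end),
receive
    {write_result, Result} -> io:format("write finished: ~p~n", [Result])
after 5000 ->
    io:format("write still pending, waiting for enough replicas~n")
end.

Once the suspended nodes are resumed with c + [enter], the pending write should complete, and a subsequent read returns the written value.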
