Discussion: WARNING: Not all replicas will be on distinct nodes
Daniel Miller
2017-12-14 19:49:02 UTC
I have a 6 node cluster (now 7) with ring size 128. On adding the most
recent node I got the WARNING: Not all replicas will be on distinct nodes.
After the initial plan I ran the following sequence many times, but always
got the same plan output:

sudo riak-admin cluster clear && \
sleep 10 && \
sudo service riak start && \
sudo riak-admin wait-for-service riak_kv && \
sudo riak-admin cluster join ***@hqriak20.internal && \
sudo riak-admin cluster plan


The plan looked the same every time, and I eventually committed it because
the cluster capacity is running low:


Success: staged join request for '***@riak29.internal' to '***@riak20.internal'

=============================== Staged Changes ================================
Action         Details(s)
-------------------------------------------------------------------------------
join           '***@riak29.internal'
-------------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
After cluster transition 1/1
###############################################################################

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      17.2%   14.1%      '***@riak20.internal'
valid      17.2%   14.8%      '***@riak21.internal'
valid      16.4%   14.1%      '***@riak22.internal'
valid      16.4%   14.1%      '***@riak23.internal'
valid      16.4%   14.1%      '***@riak24.internal'
valid      16.4%   14.8%      '***@riak28.internal'
valid       0.0%   14.1%      '***@riak29.internal'
-------------------------------------------------------------------------------
Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 18
2 transfers from '***@riak28.internal' to '***@riak29.internal'
3 transfers from '***@riak21.internal' to '***@riak29.internal'
3 transfers from '***@riak23.internal' to '***@riak29.internal'
3 transfers from '***@riak24.internal' to '***@riak29.internal'
4 transfers from '***@riak20.internal' to '***@riak29.internal'
3 transfers from '***@riak22.internal' to '***@riak29.internal'


My understanding is that if some replicas are not on distinct nodes then I
may have permanent data loss if a single physical node is lost (please let
me know if that is not correct). Questions:

How do I diagnose which node(s) have duplicate replicas?
What can I do to fix this situation?

Thanks!
Daniel


P.S. I am unable to get anything useful out of `riak-admin diag`. It
appears to be broken on the version of Riak I'm using (2.2.1). Here's the
output I get:

$ sudo riak-admin diag
RPC to '***@hqriak20.internal' failed:
{'EXIT',
 {undef,
  [{lager, get_loglevels, [], []},
   {riaknostic, run, 1,
    [{file, "src/riaknostic.erl"}, {line, 118}]},
   {rpc, '-handle_call_call/6-fun-0-', 5,
    [{file, "rpc.erl"}, {line, 205}]}]}}
Martin Sumner
2017-12-14 21:27:19 UTC
Daniel,

See this post (http://lists.basho.com/pipermail/riak-users_lists.basho.com/2017-August/019488.html)
and the links in it for some more details on issues with the core claim
algorithm. The fix is in the pending 2.2.5 release, to which Russell is adding
the finishing touches at the moment.

However, the fix may not immediately resolve your problem - the fix is
about preventing this situation, not necessarily about resolving it once it
has been created. Also, the issue we saw that would lead to this would not
(I think) be triggered by adding a single node - unless the cluster already
had the problem. So it is possible that, although you are seeing the warning
now, you had the issue when you originally created the cluster, and the
change is just persisting it. For instance, going from nothing straight to a
6-node cluster with a ring size of 128 would create this problem.

As a workaround there is the core claim v3 algorithm, which can be turned
on so you can see whether it offers a better cluster plan without
violations. I can't remember right now how to trigger the v3 claim
algorithm, though - Google is letting me down.
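
If memory serves, it is selected via the claim functions in the riak_core
section of advanced.config. The following is an untested sketch from memory -
please verify the exact setting names against your 2.2.1 install before
relying on it, and note that a node restart is needed for it to take effect:

%% advanced.config - sketch from memory; setting names may need checking
[
 {riak_core, [
   {wants_claim_fun,  {riak_core_claim, wants_claim_v3}},
   {choose_claim_fun, {riak_core_claim, choose_claim_v3}}
 ]}
].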

Ultimately, this may not be such a crisis. The warning is thrown whenever
the cluster cannot guarantee a "target_n_val" of 4. So if you have an
n_val of 3, you're not necessarily at risk of data loss. To know for sure
you will have to look at your ring via riak attach (see bullet point 2 in
http://docs.basho.com/riak/kv/2.2.3/using/running-a-cluster/#add-a-second-node-to-your-cluster).
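
For example, from riak attach something along these lines will dump the
partition-to-node ownership, which you can then read off in n_val-sized
windows (again a sketch from memory - double-check the module and function
names against your version):

{ok, Ring} = riak_core_ring_manager:get_my_ring().
riak_core_ring:all_owners(Ring).   %% returns [{PartitionIndex, OwnerNode}, ...]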

If you can figure out the violations from your ring, you may be able to
resolve by leaving the node that has the violations, and then re-adding it.

Sorry, I'm a bit rushed - but I hope this helps get you started.

Martin
Daniel Miller
2017-12-14 23:04:21 UTC
Thank you, Martin! With that information plus the output of `riak-admin
cluster partitions --node=...` I was able to determine which nodes have
preflist/replica problems.

Two nodes in my cluster appear to have problems. Here are the partition
mappings prior to adding the 7th node. I think things will improve after
the 7th node (riak29) has joined the cluster (more on that below).

Partitions owned by '***@riak20.internal':
+---------+-------------------------------------------------+---+
| type | index |id |
+---------+-------------------------------------------------+---+
| primary | 0 | 0 |
| primary | 68507889249886074290797726533575766546371837952 | 6 |
| primary |137015778499772148581595453067151533092743675904 |12 |
...
| primary |1370157784997721485815954530671515330927436759040|120|
| primary |1438665674247607560106752257205091097473808596992|126|
|secondary| -- |-- |
| stopped | -- |-- |
+---------+-------------------------------------------------+---+

Partitions owned by '***@riak21.internal':
+---------+-------------------------------------------------+---+
| type | index |id |
+---------+-------------------------------------------------+---+
| primary | 11417981541647679048466287755595961091061972992 | 1 |
| primary | 79925870791533753339264014289171727637433810944 | 7 |
...
| primary |1381575766539369164864420818427111292018498732032|121|
| primary |1450083655789255239155218544960687058564870569984|127|
|secondary| -- |-- |
| stopped | -- |-- |
+---------+-------------------------------------------------+---+

Partitions owned by '***@riak29.internal':
+---------+-------------------------------------------------+---+
| type | index |id |
+---------+-------------------------------------------------+---+
| primary | -- |-- |
|secondary| 22835963083295358096932575511191922182123945984 | 2 |
|secondary| 68507889249886074290797726533575766546371837952 | 6 |
...
|secondary|765004763290394496247241279624929393101152190464 |67 |
|secondary|1438665674247607560106752257205091097473808596992|126|
| stopped | -- |-- |
+---------+-------------------------------------------------+---+


Here's my understanding of how the preflists work:

riak20 has vnode 126 -> preflist: 126, 127, 0 -> (riak20, riak21, riak20) -> two distinct nodes = bad
riak21 has vnode 127 -> preflist: 127, 0, 1 -> (riak21, riak20, riak21) -> two distinct nodes = bad

After adding one new node I believe the problem on riak20 will go away,
because riak29 will become the new primary for 126, so the new preflist
will be:

riak29: vnode 126 -> preflist: 126, 127, 0 -> (riak29, riak21, riak20) -> three distinct nodes = good
However, vnodes 1 and 127 will remain on riak21, so that's still a problem.
Hopefully adding another node to the cluster will resolve that issue. If
that does not do it I'll try switching to claim v3.
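
For the record, something like this pasted into riak attach should confirm
the violating windows mechanically. This is an untested sketch that assumes
n_val = 3 and the riak_core_ring calls Martin mentioned, so treat it with
suspicion:

{ok, Ring} = riak_core_ring_manager:get_my_ring().
Owners = [Node || {_Idx, Node} <- riak_core_ring:all_owners(Ring)].
N = 3.
%% Wrap the tail so windows that cross the end of the ring are checked too.
Wrapped = Owners ++ lists:sublist(Owners, N - 1).
%% Any window of N consecutive partitions owned by < N distinct nodes is a violation;
%% Pos - 1 is the 0-based partition id as shown by `riak-admin cluster partitions`.
[{Pos - 1, lists:sublist(Wrapped, Pos, N)}
 || Pos <- lists:seq(1, length(Owners)),
    length(lists:usort(lists:sublist(Wrapped, Pos, N))) < N].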

If my logic is correct, then you are also correct: it appears this problem
has been present since a previous cluster transition. I reviewed the logs
from the previous transition and I did not get "WARNING: Not all replicas
will be on distinct nodes" after the final set of transfers, which
incidentally involved riak21 (although it did appear earlier in the plan).
Is it true that if that warning appears anywhere in the plan then it's a
bad plan?

Thank you again for the quick response!
Daniel


Martin Sumner
2017-12-14 23:56:23 UTC
Yes, this looks like the standard tail-wrapping problem: the preflists for
the last few partitions in the ring wrap around to the first partitions, and
the claim algorithm can fail to keep those wrapped preflists on distinct
nodes. You are right that adding an eighth node should resolve this, and if
the standard code doesn't resolve it, running plans with v3 will eventually
resolve it.

As for when the warning would appear, I'm not sure exactly when it shows up
during the transfers themselves - but it should definitely have appeared at
the point you issued the "cluster plan" command on the previous transition.
Perhaps I've misunderstood your question, though.