Daniel Miller
2017-12-14 19:49:02 UTC
I have a 6-node cluster (now 7) with ring size 128. When adding the most
recent node I got the warning "Not all replicas will be on distinct nodes".
After the initial plan I ran the following sequence many times, but it
always produced the same plan output:
sudo riak-admin cluster clear && \
sleep 10 && \
sudo service riak start && \
sudo riak-admin wait-for-service riak_kv && \
sudo riak-admin cluster join ***@hqriak20.internal && \
sudo riak-admin cluster plan
The plan looked the same every time, and I eventually committed it because
the cluster capacity is running low:
Success: staged join request for '***@riak29.internal' to '***@riak20.internal'
=============================== Staged Changes ================================
Action         Details(s)
-------------------------------------------------------------------------------
join           '***@riak29.internal'
-------------------------------------------------------------------------------
NOTE: Applying these changes will result in 1 cluster transition
###############################################################################
After cluster transition 1/1
###############################################################################
================================= Membership ==================================
Status     Ring     Pending    Node
-------------------------------------------------------------------------------
valid      17.2%    14.1%      '***@riak20.internal'
valid      17.2%    14.8%      '***@riak21.internal'
valid      16.4%    14.1%      '***@riak22.internal'
valid      16.4%    14.1%      '***@riak23.internal'
valid      16.4%    14.1%      '***@riak24.internal'
valid      16.4%    14.8%      '***@riak28.internal'
valid       0.0%    14.1%      '***@riak29.internal'
-------------------------------------------------------------------------------
Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
WARNING: Not all replicas will be on distinct nodes
Transfers resulting from cluster changes: 18
2 transfers from '***@riak28.internal' to '***@riak29.internal'
3 transfers from '***@riak21.internal' to '***@riak29.internal'
3 transfers from '***@riak23.internal' to '***@riak29.internal'
3 transfers from '***@riak24.internal' to '***@riak29.internal'
4 transfers from '***@riak20.internal' to '***@riak29.internal'
3 transfers from '***@riak22.internal' to '***@riak29.internal'
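(For what it's worth, the pending ownership looks arithmetically right to me:
128 partitions over 7 nodes means five nodes own 18 partitions and two own 19,
and 18/128 = 14.1% while 19/128 = 14.8%, which matches the plan above. So the
distribution itself isn't my concern, only the replica-placement warning.)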
My understanding is that if some replicas are not on distinct nodes, then
losing a single physical node could cause permanent data loss (please let me
know if that is not correct). Questions:
1. How do I diagnose which node(s) hold duplicate replicas? (My only idea so
   far is sketched after these questions.)
2. What can I do to fix this situation?
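For question 1, the only thing I've come up with is to dump ring ownership
from the Erlang console (riak attach) and look for windows of consecutive
partitions that share an owner. This is a rough, untested sketch, assuming
n_val=3 and that riak_core_ring:all_owners/1 returns the per-partition owners
in ring order:

{ok, Ring} = riak_core_ring_manager:get_my_ring().
%% Owner node of each of the 128 partitions, in ring order.
Owners = [Node || {_Idx, Node} <- riak_core_ring:all_owners(Ring)].
N = 3.  %% assumed n_val; adjust to match the bucket(s) in question
%% Every window of N consecutive partitions, wrapping around the ring.
Windows = [lists:sublist(Owners ++ Owners, I, N) || I <- lists:seq(1, length(Owners))].
%% Windows where a node appears more than once, i.e. replicas on the same node.
[W || W <- Windows, length(lists:usort(W)) < N].

Is that a sane way to check, or is there a built-in command I'm missing?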
Thanks!
Daniel
P.S. I am unable to get anything useful out of `riak-admin diag`. It
appears to be broken on the version of Riak I'm using (2.2.1). Here's the
output I get:
$ sudo riak-admin diag
RPC to '***@hqriak20.internal' failed: {'EXIT',
    {undef,
     [{lager,get_loglevels,[],[]},
      {riaknostic,run,1,[{file,"src/riaknostic.erl"},{line,118}]},
      {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}}