Discussion:
Riak One partition handoff stall
Gaurav Sood
2018-05-28 07:29:17 UTC
Hi All - Good Day!

I have a 7-node Riak KV cluster. I recently upgraded this cluster from
1.4.2 to 1.4.12 on Ubuntu 16.04. Since the upgrade, whenever I remove a
node from the cluster, the handoff of one partition stalls every time and
riak-admin transfers shows 'waiting to handoff 1 partitions'. To complete
the process I need to restart the riak service on all nodes, one by one.

I am not sure whether it's a configuration problem. Here is the current
state of the cluster.

*#output of riak-admin member-status*
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     0.0%      --      '***@192.168.2.10'
valid      14.1%      --      '***@192.168.2.11'
valid      14.1%      --      '***@192.168.2.12'
valid      15.6%      --      '***@192.168.2.13'
valid      14.1%      --      '***@192.168.2.14'
valid      14.1%      --      '***@192.168.2.15'
valid      14.1%      --      '***@192.168.2.16'
valid      14.1%      --      '***@192.168.2.17'
-------------------------------------------------------------------------------
Valid:7 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

*#output of riak-admin transfers*

'***@192.168.2.10' waiting to handoff 1 partitions

Active Transfers:

(nothing here)


*#Output of riak-admin ring_status*
================================== Claimant ===================================
Claimant: '***@192.168.2.10'
Status: up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

*current Transfer Limit is 2.*
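
The symptom in the output above (a node reported as waiting to hand off partitions while the Active Transfers section stays empty) can be checked mechanically. Below is a minimal Python sketch, not part of the original post. The function names `parse_transfers` and `looks_stalled` are hypothetical, and the parsing assumes the `riak-admin transfers` text layout shown above, with "(nothing here)" being the poster's note rather than real output. A single snapshot cannot tell a stall from a transfer that has not started yet, so the check is only meaningful if it keeps returning true across repeated polls.

```python
import re

# Matches lines such as: '***@192.168.2.10' waiting to handoff 1 partitions
# (layout assumed from the output quoted in this thread)
WAITING_RE = re.compile(
    r"^'?(?P<node>[^'\s]+)'?\s+waiting to handoff\s+(?P<count>\d+)\s+partitions?",
    re.MULTILINE,
)


def parse_transfers(output):
    """Split `riak-admin transfers` text into pending counts and active lines.

    Returns (pending, active): pending maps node name to the number of
    partitions it is waiting to hand off; active is the list of non-blank
    lines found after the "Active Transfers:" heading.
    """
    pending = {
        m.group("node"): int(m.group("count"))
        for m in WAITING_RE.finditer(output)
    }
    _, _, tail = output.partition("Active Transfers:")
    active = [line for line in tail.splitlines() if line.strip()]
    return pending, active


def looks_stalled(output):
    """True when some node is waiting to hand off but nothing is in flight."""
    pending, active = parse_transfers(output)
    return bool(pending) and not active


if __name__ == "__main__":
    sample = (
        "'***@192.168.2.10' waiting to handoff 1 partitions\n"
        "\n"
        "Active Transfers:\n"
        "\n"
    )
    print(parse_transfers(sample))
    print(looks_stalled(sample))
```

Feeding this the captured output of `riak-admin transfers` every minute or so, and alerting only after several consecutive positive results, would avoid flagging transfers that are merely queued.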

Thanks
Gaurav
Bryan Hunt
2018-05-28 11:27:32 UTC
Are you constantly executing a particular riak command in your system monitoring scripts, for example `riak-admin vnode-status`?

How much data do you have per server?

How many objects are you storing?

Gaurav Sood
2018-05-28 13:11:06 UTC
Thanks Bryan

Below is the output of riak-admin vnode_status. Maybe the data transfer
has stopped on the claimant node.

The output of all commands stays constant.

1)

VNode: 342539446249430371453988632667878832731859189760
Backend: riak_kv_eleveldb_backend
Status:
[{stats,<<" Compactions\nLevel Files Size(MB) Time(sec) Read(MB) Write(MB)\n--------------------------------------------------\n 0 1 0 0 0 0\n">>},
{read_block_error,<<"0">>},
{fixed_indexes,true}]


2) 30 GB of data per server
3) I am not sure about the number of objects. Is there any way to get the
object count?
Nicholas Adams
2018-05-28 13:41:35 UTC
Dear Gaurav,

Standard troubleshooting: stalled handoffs can often be fixed by running “riak-admin transfer-limit 0” to stop all transfers and, once you have confirmed that all transfers have stopped, running “riak-admin transfer-limit 2” to set the limit back to its default value.
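
The stop-and-restore sequence described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: it assumes the command is spelled `riak-admin transfer-limit <limit>`, that `riak-admin` is on the PATH of the node where the script runs, and that an empty "Active Transfers:" section in the `riak-admin transfers` output means nothing is in flight. The helper names, polling interval and retry count are hypothetical.

```python
import subprocess
import time


def riak_admin(*args):
    """Run `riak-admin <args>` locally and return its stdout as text."""
    result = subprocess.run(
        ["riak-admin", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def has_active_transfers(transfers_output):
    """True if any non-blank line follows the "Active Transfers:" heading."""
    _, _, tail = transfers_output.partition("Active Transfers:")
    return any(line.strip() for line in tail.splitlines())


def reset_transfer_limit(run=riak_admin, default_limit=2,
                         poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Set the transfer limit to 0, wait for transfers to stop, restore it.

    Returns True if transfers were seen to stop within max_polls checks,
    False otherwise. The limit is restored in both cases.
    """
    run("transfer-limit", "0")  # stop new transfers from starting
    try:
        for _ in range(max_polls):
            if not has_active_transfers(run("transfers")):
                return True
            sleep(poll_seconds)
        return False
    finally:
        # Always put the limit back, even if polling failed or timed out.
        run("transfer-limit", str(default_limit))


if __name__ == "__main__":
    stopped = reset_transfer_limit()
    print("transfers stopped before restore:", stopped)
```

The `run` and `sleep` parameters exist only so the sequence can be exercised without a live cluster; on a real node the defaults would be used.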

Another option you might want to investigate is repairing the VNode you listed. For Riak KV 1.4.12, follow the steps in http://docs.basho.com/riak/1.4.12/ops/running/recovery/repairing-partitions/#Running-a-Repair under Repairing a Single Partition, substituting the VNode value you posted.

Speaking from my work as a CSE, originally under Basho and now under TI Tokyo: can I ask why you are regularly getting nodes to leave the cluster? This is not common practice in production environments.

Finally, Riak KV 1.4.12 has been obsolete for quite a few years. I would strongly recommend that you upgrade to the LTS release, Riak KV 2.0.9, as that is supported as a direct upgrade from 1.4.12; see https://docs.basho.com/riak/kv/2.0.9/setup/upgrading/ for details. Once on the 2.0.x series, you can then look at a further upgrade to the 2.2.x series should you so wish.

Hope this helps,

Nicholas
