KB Article #175987
Engaging the SystemThrottle, status is I/O is very slow or blocked: maximum allowable is 15s
Problem
When running in cluster mode (single-node mode has not been tested) with the share on NFSv4, the system slows down dramatically once it comes under high load. In the UI, the nodes alternate between the "Starting" and "Stopping" statuses, or one node stays in "Started" while the other does not start at all.
Check the /var/log/messages file for the following error message:
Apr 14 11:02:53 b2b-aa-n1 kernel: nfs4_reclaim_open_state: Lock reclaim failed!
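This check is a fixed-string search, so `grep -F 'Lock reclaim failed' /var/log/messages` does the same job from a shell. For a scripted health check, a minimal Python sketch follows. It assumes the default syslog location named above, usually needs root to read it, and does not look at rotated copies of the log; the function names are illustrative only.

```python
import sys

# Kernel message quoted in this article for a failed NFSv4 lock reclaim.
PATTERN = "nfs4_reclaim_open_state: Lock reclaim failed!"


def find_reclaim_failures(lines):
    """Return every line that contains the NFSv4 lock reclaim failure message."""
    return [line.rstrip("\n") for line in lines if PATTERN in line]


def scan(path):
    # errors="replace" keeps the scan going if the log holds non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        hits = find_reclaim_failures(f)
    for hit in hits:
        print(hit)
    print(f"{len(hits)} matching line(s) in {path}")
    return hits


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        scan(log_path)
    except OSError as exc:
        # Typically a missing file or insufficient permissions (run as root).
        print(f"cannot read {log_path}: {exc}", file=sys.stderr)
```

Run it on each cluster node; any match on a node that mounts the share over NFSv4 points to the kernel issue described in the Resolution section.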
Because the share is also used by Integrator, you will also see many errors in the Integrator traceviewer, such as:
failed to connect to timer server 'timer_5100': TIMER: failed to read reply from timer server: disconnected
The full error from Interchange is:
2014-04-14 02:36:47,920 - WARN [SystemThrottle] (SystemThrottle.run:213) - Engaging the SystemThrottle, status is I/O is very slow or blocked for directory /share/B2Bi/Interchange/common/data/backup; Write operation for b2b_aa_n1_cn.txt has been in progress for 15s 666ms; maximum allowable is 15s
2014-04-14 02:37:10,537 - ERROR [Cluster Thread 9] (IntegratorHostController.membershipChange:302) - Error sending cluster membership to Integrator API
b2bx.server.B2BXException: b2bx.server.B2BXException: java.net.SocketTimeoutException: Read timed out
at b2bx.server.B2BXServer.teMembershipChangeNotification(B2BXServer.java:230)
at b2bx.server.B2BXServer.teMembershipChangeNotification(B2BXServer.java:190)
at com.axway.b2bi.cluster.IntegratorHostController.membershipChange(IntegratorHostController.java:298)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.axway.cluster.singleton.SingletonInvocationMessage.execute(SingletonInvocationMessage.java:47)
at com.axway.cluster.messaging.MessageExecutionWrapper.run(MessageExecutionWrapper.java:32)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
at com.axway.cluster.extensions.thread.EventedThread.primRun(EventedThread.java:103)
at com.axway.cluster.extensions.thread.EventedThread.run(EventedThread.java:81)
Caused by: b2bx.server.B2BXException: java.net.SocketTimeoutException: Read timed out
at b2bx.server.B2BXServer.sendReceive(B2BXServer.java:1829)
at b2bx.server.B2BXServer.teMembershipChangeNotification(B2BXServer.java:227)
... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at b2bx.server.B2BXServer.readReply(B2BXServer.java:1907)
at b2bx.server.B2BXServer.sendReceive(B2BXServer.java:1822)
... 14 more
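To see how often the SystemThrottle engages, and on which directories, the WARN line above can be parsed out of the Interchange log. The following is a minimal sketch, assuming the message format is exactly that of the single sample quoted in this article and that each warning occupies one line in the log file; other variants of the message may not match. The log file path is not given in this article, so it is passed on the command line, and the script and type names are hypothetical.

```python
import re
import sys
from typing import NamedTuple, Optional

# Pattern derived from the single SystemThrottle WARN line quoted in this article;
# other message variants may not match.
THROTTLE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - WARN \[SystemThrottle\].*?"
    r"blocked for directory (?P<dir>[^;]+); "
    r"(?P<op>\w+) operation for (?P<file>\S+) has been in progress for "
    r"(?P<secs>\d+)s (?P<ms>\d+)ms; maximum allowable is (?P<max>\d+)s"
)


class ThrottleEvent(NamedTuple):
    timestamp: str
    directory: str
    operation: str
    file: str
    elapsed_ms: int
    limit_ms: int


def parse_throttle(line: str) -> Optional[ThrottleEvent]:
    """Return a ThrottleEvent for a SystemThrottle warning, or None for other lines."""
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    return ThrottleEvent(
        timestamp=m["ts"],
        directory=m["dir"],
        operation=m["op"],
        file=m["file"],
        # "15s 666ms" becomes 15666 ms; the limit "15s" becomes 15000 ms.
        elapsed_ms=int(m["secs"]) * 1000 + int(m["ms"]),
        limit_ms=int(m["max"]) * 1000,
    )


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: throttle_scan.py INTERCHANGE_LOG", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for line in f:
                event = parse_throttle(line)
                if event is not None:
                    print(f"{event.timestamp} {event.directory}/{event.file}: "
                          f"{event.operation} {event.elapsed_ms} ms (limit {event.limit_ms} ms)")
```

For the sample above this reports a write of 15666 ms against a 15000 ms limit on /share/B2Bi/Interchange/common/data/backup, which is the shared NFSv4 directory.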
Resolution
This is caused by a bug in the Linux kernel; see Red Hat Bugzilla 732748 (a copy is also attached to this KB as an .mht file): https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=732748
Update the kernel to one of the following versions:
kernel-2.6.43.5 or higher
kernel-3.3.5 or higher
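A quick way to check a node against these minimums is to compare the running kernel release with them. The following is a minimal sketch that reads the two thresholds as applying to their own series (2.6.43.5 and later within 2.6, 3.3.5 and later otherwise) and compares only the leading numeric version. That reading is an assumption. Distribution kernels, for example Red Hat Enterprise Linux's 2.6.32-based ones, backport fixes without changing the base version, so on those systems consult the vendor's errata rather than relying on this comparison.

```python
import platform
import re

# Minimum fixed versions listed in this KB article.
MIN_2_6 = (2, 6, 43, 5)
MIN_3_X = (3, 3, 5)


def parse_kernel_version(release):
    """Turn a release string such as '2.6.43.5-2.fc15.x86_64' into (2, 6, 43, 5)."""
    m = re.match(r"\d+(?:\.\d+)*", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def meets_minimum(version):
    """True if the version is at or above the article's minimum for its series."""
    if version >= MIN_3_X:
        return True
    # 2.6.43.5 and later in the 2.6 series; 3.0 through 3.3.4 fall through to False.
    return MIN_2_6 <= version < (3,)


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    try:
        ok = meets_minimum(parse_kernel_version(release))
        print(f"{release}: {'meets' if ok else 'is below'} the minimum listed in this article")
    except ValueError as exc:
        print(exc)
```

Tuple comparison handles the mixed lengths: (2, 6, 43) sorts below (2, 6, 43, 5), so a plain 2.6.43 kernel is correctly reported as below the minimum.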