KB Article #182513

Monitoring B2Bi shared disk access time

Problem

Problems accessing the B2Bi shared file system can lead to instability:
- Between Integrator and Interchange, slow access engages the SystemThrottle. Interchange logs the following warning message:
"Engaging the SystemThrottle. File system health: status is I/O is very slow or blocked for directory"
- Between TE and IE, Integrator tasks are stopped automatically, with "Forced stop detected" displayed in the Integrator trace.


Resolution

To monitor the disk access time, use the b2bi_diskaccesstime.x4 tool.
If a similar issue occurs again, the recorded measurements show the shared disk response time at that moment.

Here is an example of a custom script that measures the B2BI_SHARED_DATA disk access time once per minute.

8<------------------------8<------------------------8<------------------------8<------------------------8<------------------------8<------------------------
#!/bin/bash
# Load the Integrator environment; stop if the installation directory is missing.
cd /opt/axway/Integrator || exit 1
. ./profile
# File that collects the measurements.
OUT=/opt/tmp/Result_b2bi_shared_data_diskaccesstime.txt
while true
do
    # Append a separator, a timestamp and one measurement, then wait 60 seconds.
    echo "********************************************************" >> "$OUT"
    date >> "$OUT"
    r4edi b2bi_diskaccesstime.x4 "$B2BI_SHARED_DATA" >> "$OUT"
    sleep 60
done
------------------------>8------------------------>8------------------------>8------------------------>8------------------------>8------------------------>8


1) Copy the content into a file named "B2BI_SHARED_DATA_diskaccesstime.sh".
2) Adapt "/opt/axway/Integrator" and "/opt/tmp/" to your environment.
3) Run the command "./B2BI_SHARED_DATA_diskaccesstime.sh &" to start it in the background.
4) Check the synchronized and unsynchronized values recorded at the time the issue occurred.
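
For step 4, the result file can grow large after a few days. The following is a minimal sketch of a helper that prints only the samples whose timestamp matches a given pattern. The function name show_samples is hypothetical, and the sketch assumes the file layout produced by the script above (a line of asterisks, then the output of "date", then the measurement lines).

```shell
#!/bin/bash
# show_samples FILE PATTERN
# Print every sample in FILE whose timestamp line matches PATTERN.
# Assumed layout per sample: a line of asterisks, the "date" output,
# then the measurement lines written by the monitoring script.
show_samples() {
    awk -v pat="$2" '
        /^\*+$/   { keep = 0; want_date = 1; next }            # separator: a new sample starts
        want_date { keep = ($0 ~ pat); want_date = 0 }         # first line after it is the timestamp
        keep      { print }                                    # print timestamp and measurements
    ' "$1"
}

# Example (adapt the path and the pattern to your environment):
# show_samples /opt/tmp/Result_b2bi_shared_data_diskaccesstime.txt '10:1[0-9]:'
```

The pattern is an awk regular expression applied to the timestamp line only, so a time such as '10:1[0-9]:' selects the samples taken between 10:10 and 10:19.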


Extract from the B2Bi Administrator Guide, chapter "I/O management":

Example results

    > r4edi diskaccess.x4
                       Synchronized   Unsynchronized
    B2BI_SHARE_DATA    1.220 ms       0.085 ms
    CORE_ROOT          1.395 ms       0.085 ms
    CORE_DATA          1.385 ms       0.050 ms

Analyze the results

A synchronized access time below 5 ms is the desired value. A synchronized access time in the range of 5-10 ms is acceptable, but may indicate the need for additional cluster tuning to improve overall performance and reduce communication errors. The unsynchronized access time must be lower than or equal to the synchronized access time.
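
The thresholds above can be applied to a pair of measured values with a small helper. This is a sketch only: the function name classify_access_time is hypothetical, the values are passed in milliseconds without the "ms" unit, and the wording used for values above 10 ms is an assumption, because the guide extract does not qualify that case.

```shell
#!/bin/bash
# classify_access_time SYNC_MS UNSYNC_MS
# Compare one pair of access times (in ms) with the thresholds quoted above.
classify_access_time() {
    awk -v s="$1" -v u="$2" 'BEGIN {
        if (s < 5)        verdict = "desired"
        else if (s <= 10) verdict = "acceptable"
        else              verdict = "above the documented range"  # assumption: not qualified in the guide
        # The unsynchronized time must not exceed the synchronized time.
        if (u > s) verdict = verdict ", unsynchronized exceeds synchronized"
        print verdict
    }'
}

classify_access_time 1.220 0.085   # values from the example results
classify_access_time 7.5 0.9
```

With the example results shown earlier (1.220 ms synchronized, 0.085 ms unsynchronized), the helper reports "desired".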