KB Article #182513
Monitoring B2Bi shared disk access time
Problem
Problems accessing the B2Bi shared file system can lead to instability:
- Between Integrator and Interchange, this results in a SystemThrottle issue. The warning message in the Interchange logs is:
"Engaging the SystemThrottle. File system health: status is I/O is very slow or blocked for directory"
- Between TE and IE, Integrator tasks are stopped automatically, and "Forced stop detected" is displayed in the Integrator trace.
Resolution
To monitor the disk access time, use the b2bi_diskaccesstime.x4 tool.
With monitoring in place, the shared disk response time is available the next time a similar issue occurs.
Here is an example of a custom script that records the B2BI_SHARED_DATA disk access time once a minute.
8<------------------------8<------------------------8<------------------------8<------------------------8<------------------------8<------------------------
#!/bin/bash
# Record the B2BI_SHARED_DATA disk access time once a minute.
OUT=/opt/tmp/Result_b2bi_shared_data_diskaccesstime.txt

# Load the Integrator environment; stop if the directory does not exist.
cd /opt/axway/Integrator || exit 1
. ./profile

while true
do
    {
        echo "********************************************************"
        date
        r4edi b2bi_diskaccesstime.x4 "$B2BI_SHARED_DATA"
    } >> "$OUT"
    sleep 60
done
------------------------>8------------------------>8------------------------>8------------------------>8------------------------>8------------------------>8
1) Copy the content into a file named "B2BI_SHARED_DATA_diskaccesstime.sh".
2) Adapt "/opt/axway/Integrator" and "/opt/tmp/" to your environment.
3) Make the file executable ("chmod +x B2BI_SHARED_DATA_diskaccesstime.sh"), then run "./B2BI_SHARED_DATA_diskaccesstime.sh &" to start it in the background.
4) Check the synchronized/unsynchronized values recorded at the time the issue occurred.
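For step 4, the result file can grow large, so it may help to pull out only the samples taken around the time of the incident. The following is a minimal sketch, not part of the product: the function name show_samples is hypothetical, and it assumes the result file has exactly the layout written by the script above (a line of asterisks, then the output of "date", then the tool output).

```shell
#!/bin/bash
# show_samples FILE TEXT
# Print every sample whose date line contains TEXT (for example "14:3"
# to match 14:30 through 14:39). Hypothetical helper; assumes each sample
# starts with a line made only of asterisks followed by a date line.
show_samples() {
    awk -v pat="$2" '
        /^\*+$/ {
            # Separator line: flush the previous sample if it matched.
            if (keep) printf "%s", block
            block = ""; keep = 0; first = 1
            next
        }
        {
            # The first line after a separator is the date line.
            if (first) {
                if (index($0, pat)) keep = 1
                first = 0
            }
            block = block $0 "\n"
        }
        END { if (keep) printf "%s", block }
    ' "$1"
}
```

Example use, with the path from the script above: show_samples /opt/tmp/Result_b2bi_shared_data_diskaccesstime.txt "14:3"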
The following is an extract from the B2Bi Administrator Guide, chapter "I/O management".
Example results
> r4edi diskaccess.x4
                   Synchronized   Unsynchronized
B2BI_SHARE_DATA    1.220 ms       0.085 ms
CORE_ROOT          1.395 ms       0.085 ms
CORE_DATA          1.385 ms       0.050 ms
Analyze the results
A synchronized access time below 5 ms is the desired value. A synchronized time in the range of 5-10 ms is acceptable, but may indicate the need for additional cluster tuning to improve overall performance and reduce communication errors. The unsynchronized access time must be lower than or equal to the synchronized access time.
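The thresholds above can be applied to a measured value with a small helper. This is only a sketch: the function names classify_sync_ms and check_unsync are hypothetical, and the label "above-range" for values over 10 ms is an assumption, since the guide extract does not name that case. The values are passed in milliseconds, without the "ms" unit.

```shell
#!/bin/bash
# Classify a synchronized access time (in ms) against the guide's thresholds.
# awk is used because bash arithmetic does not handle decimal values.
classify_sync_ms() {
    awk -v t="$1" 'BEGIN {
        if (t < 5)        print "desired"
        else if (t <= 10) print "acceptable"
        else              print "above-range"   # label is an assumption
    }'
}

# check_unsync SYNC UNSYNC
# Report whether the unsynchronized time is <= the synchronized time.
check_unsync() {
    awk -v s="$1" -v u="$2" 'BEGIN {
        if (u <= s) print "ok"
        else        print "unexpected"
    }'
}

# Values taken from the B2BI_SHARE_DATA row of the example results.
classify_sync_ms 1.220       # prints "desired"
check_unsync 1.220 0.085     # prints "ok"
```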