KB Article #191062

SecureTransport 5.5 Tuning - Standard Cluster with PostgreSQL database

Problem

There are many configuration parameters that can be adjusted in SecureTransport; they are spread across many files and also stored in the DB.


This article aims to help with tuning SecureTransport 5.5 Standard Cluster with PostgreSQL database and finding the necessary place to apply a configuration change.


It's important to keep in mind that tuning is a constantly evolving process in which you establish a set of baselines and optimal settings through repetitive testing and evaluation. There is no definitive guide or magic set of options; you are responsible for evaluating performance, making incremental changes, and re-evaluating until you reach your goals.



1. Memory Tuning


All protocol daemons have a minimum and a maximum Heap Size value defined by the JAVA_MEM_MIN and JAVA_MEM_MAX parameters. The configuration options are available in the startup scripts start_* located in the $FILEDRIVEHOME/bin folder.


The startup scripts contain the default values and must not be edited directly. Instead, use the global configuration file STStartScriptsConfig, located in the $FILEDRIVEHOME/conf folder, which allows you to set the JAVA_MEM_MIN, JAVA_MEM_MAX, and JAVA_OPTS parameters so that the changes survive a SecureTransport upgrade. Additional details on this file and its configuration can be found in the Advanced protocol server configuration section of the Admin Guide, available in our Docs portal.


Keep in mind that the values provided in the "Advanced service configuration and memory allocation" section of the SecureTransport 5.5 Administrator Guide are example values, not recommended values.


The standard cluster with PostgreSQL presents additional challenges for memory tuning. The PostgreSQL database needs more RAM because all nodes work with the primary database, which is then replicated to the databases on the secondary nodes. To simplify the process, split the system RAM in two: one half for the ST application Java processes, the other half for the OS and the DB. After tuning, the average daily RAM usage should be between 60% and 80% of system memory, and during busy hours the average used RAM must not exceed 90% of system memory.


WARNING: The actual memory usage of a given daemon can exceed the value defined for Max Heap Size. This is due to the way a JVM works, so be cautious not to exhaust the RAM available on a given server.


Example values for protocol daemons that would cover most use cases:


JAVA_MEM_MIN="1G"
JAVA_MEM_MAX="2G"


The following table shows example values when all protocol daemons in ST are configured and running. It takes into account the RAM constraints of a server based on the minimum hardware requirements (System requirements), and a typical configuration with increased RAM and commonly used protocols.


Values are JAVA_MEM_MIN / JAVA_MEM_MAX.

Component | Core - 16GB RAM | Edge - 8GB RAM | Core - 24+GB RAM | Edge - 16+GB RAM
Admin     | 512M / 1G       | 256M / 512M    | 1G / 2G          | 512M / 1G
AS2d      | 256M / 512M     | 256M / 512M    | 512M / 1G        | 512M / 1G
FTPd      | 256M / 512M     | 256M / 512M    | 512M / 1G        | 512M / 1G
HTTPd     | 512M / 1G       | 256M / 512M    | 512M / 1G        | 512M / 1G
PeSITd    | 256M / 512M     | 256M / 512M    | 512M / 1G        | 512M / 1G
SSHd      | 1G / 2G         | 512M / 1G      | 1G / 3G          | 1G / 3G
TM        | 2G / 4G         | -              | 4G / 8G          | -
monitord  | 256M / 512M     | 256M / 512M    | 256M / 512M      | 256M / 512M
socks     | -               | 256M / 512M    | -                | 512M / 1G


WARNING: Given the nature of SecureTransport, one cannot easily determine how much memory will be needed on a given environment. After performing an initial tuning, it is recommended to monitor the actual usage of any protocol of interest and then adjust accordingly.


Additional notes:


  • The golden rule for allocating memory to Java processes: JAVA_MEM_MIN should be half of JAVA_MEM_MAX.
  • TM is the brain of ST. It requires more memory, and its allocation is more dynamic. In some configurations it is better to set JAVA_MEM_MIN closer to JAVA_MEM_MAX, for example 6G/8G. It is recommended to enable GC logging for TM in the STStartScriptsConfig script (see below). See KB 182225 for how to enable timestamps in custom Garbage Collector logs with Java 11.
  • When tuning the memory for the Admin Service on Core servers, take into consideration how many administrators would use the service at a given time, and of which type - Full or Delegated. Delegated administrators consume more memory when doing File Tracking searches (one of the most memory-consuming operations).
  • To improve speed for CIT SSH transfers over high-bandwidth, high-latency networks, increase the receive and send buffers plus the sliding window size (see below).
  • The command line tools for import/export of account information in XML format create a separate JVM for the duration of execution (see below).


Here is what a generic STStartScriptsConfig configuration looks like, based on the minimum hardware requirements and the additional notes above. You can find more variants with common configurations attached to this article.


#Admin memory settings
ADMIN_JAVA_MEM_MIN="512M"
ADMIN_JAVA_MEM_MAX="1G"
#
#AS2d memory settings
AS2_JAVA_MEM_MIN="256M"
AS2_JAVA_MEM_MAX="512M"
#
#FTPD memory settings
FTP_JAVA_MEM_MIN="256M"
FTP_JAVA_MEM_MAX="512M"
#
#HTTPD memory settings
HTTP_JAVA_MEM_MIN="512M"
HTTP_JAVA_MEM_MAX="1G"
#
#PeSITD memory settings
PESIT_JAVA_MEM_MIN="256M"
PESIT_JAVA_MEM_MAX="512M"
#
#SSHD memory settings and buffers tuning 
SSH_JAVA_MEM_MIN="1G"
SSH_JAVA_MEM_MAX="2G"
SSH_JAVA_OPTS="-DrecvBufferSize=1048576 $SSH_JAVA_OPTS"
SSH_JAVA_OPTS="-DsendBufferSize=1048576 $SSH_JAVA_OPTS"
SSH_JAVA_OPTS="-Dssh.maxWindowSpace=12582912 $SSH_JAVA_OPTS"
#
#TM memory settings
TM_JAVA_MEM_MIN="2G"
TM_JAVA_MEM_MAX="4G"
#TM GC tuning and logging
TM_JAVA_OPTS="-XX:+ExplicitGCInvokesConcurrent $TM_JAVA_OPTS"
GC_LOGGING=true
NumberOfGCLogFiles=30
GCLogFileSize=5000K
#
#Monitord memory settings
MONITORD_JAVA_MEM_MIN="256M"
MONITORD_JAVA_MEM_MAX="512M"
#
#Socks memory settings
SOCKS_JAVA_MEM_MIN="256M"
SOCKS_JAVA_MEM_MAX="512M"
#
# xml_import and xml_export scripts
XML_JAVA_MEM_MIN="256M"
XML_JAVA_MEM_MAX="512M"


More information on monitoring JVM memory: KB 176359 and KB 180171.




2. Database Tuning


c3p0 in configuration.xml

The configuration changes are to be made to the hibernate.c3p0.min_size and hibernate.c3p0.max_size parameters for each component.


The PostgreSQL database uses memory buffers (work_mem) per connection for query operations such as sorts (ORDER BY, DISTINCT, merge joins) or hash tables (hash joins, hash-based aggregation) before writing to temporary disk files. These buffers speed up complex queries, but they can consume a lot of memory when there are many connections. The best approach is to use fewer connections from the ST side, but with reasonably sized buffers.

If the environment was upgraded from the legacy Standard Cluster with MariaDB and the original tuning guide (KB 178443) was used, the values for the hibernate.c3p0.min_size and hibernate.c3p0.max_size parameters must be decreased as per the following table.


Component c3p0.min_size c3p0.max_size
Database 2 50
Database_FTPDComponent 5 50
Database_HTTPDComponent 5 50
Database_TransactionManagerComponent 20 100
Database_AS2Component 5 50
Database_SSHDComponent 5 50
Database_ServerLogComponent 5 50
Database_InstallerComponent 2 20
Database_AdminComponent 20 50
Database_ToolsComponent 1 20
Database_PesitComponent 5 50
Database_SharedRuntimeComponent 5 32
Database_TransferLogComponent 5 50
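

For illustration, the c3p0 pool sizes are Hibernate properties inside configuration.xml. A hedged sketch of what the Transaction Manager component's entries might look like - the exact element layout varies between builds, so verify against your own file before editing:


<!-- Database_TransactionManagerComponent - illustrative fragment only -->
<property name="hibernate.c3p0.min_size">20</property>
<property name="hibernate.c3p0.max_size">100</property>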




Embedded PostgreSQL database

Changes are to be made to $FILEDRIVEHOME/var/db/postgresql/data/postgresql.conf


Max Connections

The maximum number of concurrent connections the database server will allow. In a Standard Cluster with PostgreSQL, all nodes access the configuration and file tracking data in the primary database, and each node writes its server logs to its own local database. Based on the recommended number of connections from the table above, here are the recommended values for different types of deployments.


Parameter       | Core servers                   | Edge servers
                | Standalone | 2-node | 3-node   | Standalone | 2-node | 3-node
max_connections | 1000       | 2000   | 2500     | 1000       | 1500   | 2000


A generic approach is to configure max_connections = 2500 for Core servers and max_connections = 2000 for Edge servers.


Note that Axway has validated successful deployments with up to 4 SecureTransport Edges in synchronization (cluster).


Max open files

The maximum number of simultaneously open files allowed for each server subprocess. The default is 1000 files. To be on the safe side with partitioned tables for file tracking and server logs, it is recommended to increase the value to 5000 files. This requires increasing the system-wide open files ulimit for the PostgreSQL user on Linux installations to 131072, as sketched below.


max_files_per_process = 5000
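

On Linux, the ulimit increase is usually done via a limits.d drop-in; a hedged sketch, where the file name is arbitrary and "stuser" stands for whichever account runs the embedded PostgreSQL on your installation:


# /etc/security/limits.d/90-st-postgres.conf (example file name)
# <user>   <type>   <item>    <value>
stuser     soft     nofile    131072
stuser     hard     nofile    131072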




RESOURCE USAGE (except WAL)

Memory

  • shared_buffers - The amount of memory the database server uses for shared memory buffers. The recommended size is 15% to 25% of the machine's total RAM for a standalone DB server. The PostgreSQL default is 128 megabytes (128MB). A value that covers most use cases is 1GB to 4GB (see the table below).
  • huge_pages - Controls whether huge pages are requested for the main shared memory area. With huge_pages set to try (the default), the server will try to request huge pages but fall back to the default if that fails. Huge pages are known as large pages on Windows. Some internet resources recommend turning this off, especially on Windows; the PostgreSQL recommendation is to use huge pages if available, as they reduce CPU usage and memory fragmentation.
  • temp_buffers - Session-local buffers used only for access to temporary tables. The PostgreSQL default is eight megabytes (8MB). Use the default.
  • work_mem - The base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. The formula for a standalone DB server is: Total RAM * 0.25 / max_connections. For example, a 64GB server with max_connections = 2000 gives 64GB * 0.25 / 2000 ≈ 8MB. The PostgreSQL default is four megabytes (4MB). A value that covers most use cases is 8MB. Higher values risk OS performance degradation and memory fragmentation.
  • maintenance_work_mem - The maximum amount of memory to be used by maintenance operations like VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. A maximum of 2GB is enough for all use cases, even with plenty of available RAM. On Windows the value must be less than 2GB, so the practical maximum is maintenance_work_mem = 2047MB.


Parameter            | Core servers                                       | Edge servers
                     | 16GB  | 24GB       | 32GB       | 48GB   | 64GB    | 8GB   | 16GB  | 24GB       | 32GB       | 48GB
shared_buffers       | 1GB   | 1GB or 2GB | 2GB or 3GB | 4GB    | 6GB     | 512MB | 1GB   | 1GB or 2GB | 2GB or 3GB | 4GB
huge_pages           | try   | try        | try        | try    | try     | off   | try   | try        | try        | try
temp_buffers         | 8MB   | 8MB        | 8MB        | 8MB    | 8MB     | 8MB   | 8MB   | 8MB        | 8MB        | 8MB
work_mem             | 4MB   | 8MB        | 8MB        | 8MB    | 8MB     | 2MB   | 4MB   | 8MB        | 8MB        | 8MB
maintenance_work_mem | 384MB | 512MB      | 1GB        | 1536MB | 2047MB  | 320MB | 384MB | 512MB      | 1GB        | 1536MB


Asynchronous Behavior

  • effective_io_concurrency - The number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. The default is 1 on supported systems. Currently, this setting only affects bitmap heap scans. It requires the posix_fadvise function, which is not available on Windows. On Linux with SSD disks the recommended value is 200. On Windows use 0 or leave it commented out.
  • maintenance_io_concurrency - Similar to effective_io_concurrency, but used for maintenance work that is done on behalf of many client sessions. The default is 10 on supported systems. On Linux use the default value of 10 or leave it commented out. On Windows use 0 or leave it commented out.
  • max_worker_processes - The maximum number of background processes that the system can support. Set it equal to the number of CPUs, with a minimum of 4 and a maximum of 16.
  • max_parallel_workers_per_gather - The maximum number of workers that can be started by a single Gather or Gather Merge node. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Set to half of max_worker_processes.
  • max_parallel_maintenance_workers - The maximum number of parallel workers that can be started by a single utility command. Currently, the parallel utility commands that support the use of parallel workers are CREATE INDEX (only when building a B-tree index) and VACUUM without the FULL option. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Set to half of max_worker_processes.
  • max_parallel_workers - The maximum number of workers that the system can support for parallel operations, taken from the pool of worker processes. Set to the same number as max_worker_processes. A consolidated fragment follows this list.
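

Putting these rules together, here is what the fragment might look like in postgresql.conf for a Linux Core server with 8 CPUs (the CPU count is an assumed example; derive your own values from the rules above):


# - Asynchronous Behavior - (example: 8-CPU Linux Core server)
effective_io_concurrency = 200           # Linux with SSD disks
maintenance_io_concurrency = 10          # Linux default
max_worker_processes = 8                 # equal to the number of CPUs (min 4, max 16)
max_parallel_workers_per_gather = 4      # half of max_worker_processes
max_parallel_maintenance_workers = 4     # half of max_worker_processes
max_parallel_workers = 8                 # same as max_worker_processes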




WRITE-AHEAD LOG

  • wal_level - Determines how much information is written to the WAL. The default value is replica, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. For a standard cluster, set it to logical.
  • wal_buffers - The amount of shared memory used for WAL data that has not yet been written to disk. When set to -1, it is calculated as shared_buffers/32. Set the value to 16MB.
  • max_wal_size - The maximum size to let the WAL grow to during automatic checkpoints. This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command or archive_library, or a high wal_keep_size setting. Set the value to 4GB.
  • min_wal_size - The minimum size to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint rather than removed. Set the value to 1GB. A consolidated fragment follows this list.
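

The corresponding postgresql.conf fragment for a standard cluster node, using only the values recommended above:


# WRITE-AHEAD LOG (standard cluster node)
wal_level = logical       # use replica for standalone installations
wal_buffers = 16MB
max_wal_size = 4GB
min_wal_size = 1GB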




QUERY TUNING

  • random_page_cost - Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. Storage with a low random read cost relative to sequential reads (such as SSD) is better modeled with a lower value, random_page_cost = 1.1.
  • effective_cache_size - Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. The default is 4 gigabytes (4GB). A value that would cover most use cases is from 4GB to 8GB.




REPLICATION

These settings are applicable only for cluster environments. For standalone server installations please ignore.


  • max_wal_senders - Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. A logical replication subscription needs one connection, and one or more are needed for table synchronization. ST Core servers create 3 subscriptions in each direction. Set the value to 20.
  • max_replication_slots - Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. A logical replication subscription needs one slot, and one or more are needed for table synchronization. Set the value to 20.
  • wal_keep_size - Specifies the minimum size of past WAL files kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. If a standby server connected to the sending server falls behind by more than wal_keep_size megabytes, the sending server might remove a WAL segment still needed by the standby, in which case the replication connection is terminated. The recommended size is 2048 MB, which keeps a few hours or more of replication data when there is no connection to the standbys. Longer outages require a manual data restore (Perform manual data restore).
  • max_slot_wal_keep_size - Specifies the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time. The recommended size is 1024 MB, which keeps a few hours or more of replication data when there is no connection to the standbys. Longer outages require a manual data restore (Perform manual data restore). If the value is not set, the database uses the default of -1, which means replication slots may retain an unlimited amount of WAL files.
  • max_logical_replication_workers - Specifies the maximum number of logical replication workers. This includes leader apply workers, parallel apply workers, and table synchronization workers. Logical replication workers are taken from the pool defined by max_worker_processes. Set to the same number as max_worker_processes.
  • max_sync_workers_per_subscription - The maximum number of synchronization workers per subscription. This parameter controls the amount of parallelism of the initial data copy during subscription initialization or when new tables are added. Currently, there can be only one synchronization worker per table. The synchronization workers are taken from the pool defined by max_logical_replication_workers. Set to half of max_logical_replication_workers.
  • max_parallel_apply_workers_per_subscription - The maximum number of parallel apply workers per subscription. This parameter controls the amount of parallelism for streaming of in-progress transactions. The parallel apply workers are taken from the pool defined by max_logical_replication_workers. Set to half of max_logical_replication_workers. A consolidated fragment follows this list.
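

A consolidated postgresql.conf fragment for a cluster node where max_worker_processes = 8, using only the recommendations above (the worker count is an assumed example):


# REPLICATION (cluster nodes only)
max_wal_senders = 20
max_replication_slots = 20
wal_keep_size = 2048                             # MB
max_slot_wal_keep_size = 1024                    # MB
max_logical_replication_workers = 8              # same as max_worker_processes
max_sync_workers_per_subscription = 4            # half of max_logical_replication_workers
max_parallel_apply_workers_per_subscription = 4  # half of max_logical_replication_workers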




REPORTING AND LOGGING

  • log_min_duration_statement - Causes the duration of each completed statement to be logged if the statement ran for at least the specified amount of time. Enabling this parameter can be helpful in tracking down unoptimized SQL queries. Recommended value is 10000 (10 seconds).




Putting it all together based on server size and installation type

Note that Axway has validated successful deployments with up to 4 SecureTransport Edges in synchronization (cluster).


In the tables below, each cell lists Standalone, 2-node cluster, 3-node cluster values; "(all)" means the same value applies to all three deployments, and "-" means not applicable (standalone).

Core servers

Parameter                                   | Minimum hardware 4CPU/16GB RAM | From 8CPU/24GB RAM to 24CPU/64GB RAM
max_connections                             | 1000, 2000, 2500               | 1000, 2000, 2500
max_files_per_process                       | 5000 (all)                     | 5000 (all)
# RESOURCE USAGE (except WAL) - Memory -
shared_buffers                              | 1GB (all)                      | 1GB - 6GB (all)
huge_pages                                  | try/off (all)                  | try/off (all)
temp_buffers                                | 8MB (all)                      | 8MB (all)
work_mem                                    | 4MB/8MB (all)                  | 8MB (all)
maintenance_work_mem                        | 384MB (all)                    | 512MB - 2047MB (all)
# - Asynchronous Behavior -
effective_io_concurrency (Linux/Windows)    | 200/0 (all)                    | 200/0 (all)
maintenance_io_concurrency (Linux/Windows)  | 10/0 (all)                     | 10/0 (all)
max_worker_processes                        | 4 (all)                        | num of CPUs, 8-16 (all)
max_parallel_workers_per_gather             | 2 (all)                        | 4 (all)
max_parallel_maintenance_workers            | 2 (all)                        | 4 (all)
max_parallel_workers                        | 4 (all)                        | num of CPUs, 8-16 (all)
# WRITE-AHEAD LOG
wal_level                                   | replica, logical, logical      | replica, logical, logical
wal_buffers                                 | 16MB (all)                     | 16MB (all)
max_wal_size                                | 4GB (all)                      | 4GB (all)
min_wal_size                                | 1GB (all)                      | 1GB (all)
# QUERY TUNING
random_page_cost                            | 1.1 (all)                      | 1.1 (all)
effective_cache_size                        | 4GB (all)                      | 4GB - 12GB (all)
# REPLICATION (clusters only)
max_wal_senders                             | -, 20, 20                      | -, 20, 20
max_replication_slots                       | -, 20, 20                      | -, 20, 20
wal_keep_size                               | -, 2048MB, 2048MB              | -, 2048MB, 2048MB
max_slot_wal_keep_size                      | -, 1024MB, 1024MB              | -, 1024MB, 1024MB
max_logical_replication_workers             | -, 4, 4                        | -, 8, 8
max_sync_workers_per_subscription           | -, 2, 2                        | -, 4, 4
max_parallel_apply_workers_per_subscription | -, 2, 2                        | -, 4, 4
# REPORTING AND LOGGING
log_min_duration_statement                  | 10000 (all)                    | 10000 (all)

Edge servers

Parameter                                   | Minimum hardware 2CPU/8GB RAM  | From 4CPU/16GB RAM to 24CPU/32GB RAM
max_connections                             | 1000, 1500, 2000               | 1000, 1500, 2000
max_files_per_process                       | 5000 (all)                     | 5000 (all)
# RESOURCE USAGE (except WAL) - Memory -
shared_buffers                              | 512MB (all)                    | 1GB - 4GB (all)
huge_pages                                  | off (all)                      | try/off (all)
temp_buffers                                | 8MB (all)                      | 8MB (all)
work_mem                                    | 2MB (all)                      | 4MB/8MB (all)
maintenance_work_mem                        | 320MB (all)                    | 384MB - 1536MB (all)
# - Asynchronous Behavior -
effective_io_concurrency (Linux/Windows)    | 200/0 (all)                    | 200/0 (all)
maintenance_io_concurrency (Linux/Windows)  | 10/0 (all)                     | 10/0 (all)
max_worker_processes                        | 4 (all)                        | num of CPUs, 4-16 (all)
max_parallel_workers_per_gather             | 2 (all)                        | 2/4 (all)
max_parallel_maintenance_workers            | 2 (all)                        | 2/4 (all)
max_parallel_workers                        | 4 (all)                        | num of CPUs, 4-16 (all)
# WRITE-AHEAD LOG
wal_level                                   | replica, logical, logical      | replica, logical, logical
wal_buffers                                 | 16MB (all)                     | 16MB (all)
max_wal_size                                | 4GB (all)                      | 4GB (all)
min_wal_size                                | 1GB (all)                      | 1GB (all)
# QUERY TUNING
random_page_cost                            | 1.1 (all)                      | 1.1 (all)
effective_cache_size                        | 3GB (all)                      | 4GB - 8GB (all)
# REPLICATION (clusters only)
max_wal_senders                             | -, 20, 20                      | -, 20, 20
max_replication_slots                       | -, 20, 20                      | -, 20, 20
wal_keep_size                               | -, 2048MB, 2048MB              | -, 2048MB, 2048MB
max_slot_wal_keep_size                      | -, 1024MB, 1024MB              | -, 1024MB, 1024MB
max_logical_replication_workers             | -, 4, 4                        | -, 4/8, 4/8
max_sync_workers_per_subscription           | -, 2, 2                        | -, 2/4, 2/4
max_parallel_apply_workers_per_subscription | -, 2, 2                        | -, 2/4, 2/4
# REPORTING AND LOGGING
log_min_duration_statement                  | 10000 (all)                    | 10000 (all)


To reload the PostgreSQL database configuration at runtime, log in to the database and execute: SELECT pg_reload_conf();.
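

For example, with the psql client - a hedged sketch in which the host, port, user, and database name are placeholders to adjust for your installation:


psql -h localhost -p 5432 -U <db user> -d <st database> -c "SELECT pg_reload_conf();"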




3. Cluster Tuning


SecureTransport cluster basics:


  • Under the hood, the ST Standard Cluster with PostgreSQL uses Oracle Coherence for clustering and forms a "Coherence cluster", just like the ST Enterprise Cluster. It can have two or three nodes (servers).
  • Each server in a Standard Cluster has a local embedded PostgreSQL database, and all nodes work with the primary database. The secondary nodes replicate all data except the server logs from the primary database.
  • The Edge cluster is simpler, and some deployments do not cluster Edge servers at all. Only the Admin service from each Edge node is a member of the Coherence cluster, and the off-heap cache files are not used. In an Edge cluster, only small configuration data and the administrator accounts are replicated across the nodes.
  • The Core cluster has two members from each server in the Coherence cluster - the TM and the Admin services. The TM must be the owner of the off-heap cache files (ST handles this internally).
  • The principle for forming and monitoring the cluster is the same for Edge servers and Core servers.


Forming a cluster

Coherence uses the database to allow members to join the cluster and to monitor cluster nodes. The clusternode table contains one row for each server. Two columns must match the information from a server:


  • The column configurationid must be equal to LocalConfigurationsId from configuration.xml.
  • The column descriptor must be equal to IP address of the server.


The default installation uses a multicast discovery mechanism over UDP. In a complex network, or for servers with multiple network interfaces, switch to unicast discovery over TCP following the procedure described in KB 178019. The servers must be able to ping each other over ICMP, and TCP and UDP ports 8088 to 8093, 7574, and 7 must be open on any firewall between the nodes. For standalone installations the clusternode table can be left empty. Changing the IP address of a node requires the options-overwrite.conf file to let ST update the clusternode table upon Admin start, which allows the server to join the cluster with its new IP address.
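

Both columns can be verified directly in the database; a minimal check using only the columns named above:


SELECT configurationid, descriptor, lastheartbeat FROM clusternode;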




Monitoring the cluster

Every node sends status messages according to the server configuration parameter Cluster.Status.heartbeatInterval (every 5 seconds) and updates the timestamp in the lastheartbeat column of the clusternode table. If this does not happen within the period defined by the server configuration parameter Cluster.Status.heartbeatTimeout, the node is considered unresponsive and is removed from the cluster. The node itself is automatically restarted by Coherence to bring it back in sync. The default heartbeat timeout of 15 seconds makes the cluster very sensitive to any kind of temporary issue and can lead to unnecessary TM restarts by Coherence. The recommended value is 60 seconds, and it can be increased further if needed, up to 120 seconds.


Cluster - Status.heartbeatTimeout

How long after the last heartbeat a node is considered unresponsive and is removed from the cluster (in seconds). Requires Admin and TM restart if changed.


Cluster.Status.heartbeatTimeout=60


Cluster - nodeListRefreshTime

How often (in seconds) should the cluster check for new/removed nodes. Requires restart for the new value to take effect.


Cluster.nodeListRefreshTime=10




4. Transaction Manager Tuning


New installations of ST have increased (default) values for some server configuration parameters, like max threads, and new features are usually enabled by default. For upgraded environments, existing values are preserved and new features are usually turned off to preserve existing behavior.


Disk I/O

For each file transfer, ST creates a buffer with the size specified in TransactionManager.fileIOBufferSizeInKB and flushes the buffer to disk based on the TransactionManager.syncFileToDiskEveryKB parameter. While ST can process many transfers in parallel, it can be limited by the TransactionManager.concurrentFileIOMax parameter.


TransactionManager.fileIOBufferSizeInKB sets the size of the buffer for file transfers. A small buffer causes more I/O operations and eventually decreases performance, especially when the storage works in sync mode. The optimal buffer provides the best performance for the environment with reasonable memory usage. Increasing the buffer too much does not provide further performance improvements but uses more physical RAM. The right value depends on the underlying hardware, especially the shared storage and mount options. The default buffer size is 128 KB, which is a good value for modern virtual environments. A buffer size greater than 1 MB usually does not bring further improvements.


To tune the buffer size, prepare a test with small files, with large files, and with a mix of both. Start with the default value.


TransactionManager.fileIOBufferSizeInKB=128


Increase the buffer by doubling it.


TransactionManager.fileIOBufferSizeInKB=256


If no significant improvement in transfer times is observed, return to the previous value and stop testing. If there is an improvement, continue increasing the buffer size in a few iterations until you find the optimal value for your environment.


TransactionManager.syncFileToDiskEveryKB specifies when to flush the buffer content to disk. The default value 0 (zero) means flush when the buffer is full. Setting TransactionManager.syncFileToDiskEveryKB = TransactionManager.fileIOBufferSizeInKB has the same behavior as the zero value. Values lower than the buffer size might help on some specific storage in non-virtual environments. For modern virtual environments, use zero.


TransactionManager.syncFileToDiskEveryKB=0


TransactionManager.concurrentFileIOMax specifies the maximum number of files that can be processed in parallel. This applies not only to SIT and CIT transfers but also to STFS attributes. It is similar to max_open_files on Linux and is useful when the storage has a limitation that applies to all ST cluster nodes.


A low value of TransactionManager.concurrentFileIOMax can cause a bottleneck and severe performance degradation. In normal circumstances, leave the value empty, which means no limit.


TransactionManager.concurrentFileIOMax= (leave empty)




Thread Pools

SecureTransport has 5 thread pools for handling user and server actions. Each pool has a parameter with the suffix minThreads, which sets the number of threads kept available in the pool at any time after the pool is created by the TM; since creating and destroying threads consumes CPU, increasing this value on environments with high load saves some resources. Each pool also has a parameter with the suffix maxThreads, which sets the capacity of the pool: the TM cannot create more threads than maxThreads, and further requests go into the relevant queue. The current default value for all thread pools is 768. The thread pools handling subscription SIT transfers have an additional parameter with the suffix maxThreadsPerGroup. Each subscription defines 3 groups - for incoming transfers, for outgoing Basic Application transfers, and for outgoing Advanced Routing transfers - and the TM will not assign more than the specified number of threads from the pool to a group. This is a protection mechanism that avoids exhausting the TM transfer threads on a problematic subscription when the environment is not busy with other transfers.


Parameter Normal Load High Load
# General pool for processing transfers
EventQueue.ThreadPools.ThreadPool.minThreads 32 64
EventQueue.ThreadPools.ThreadPool.maxThreads 768 1024
EventQueue.ThreadPools.ThreadPool.maxThreadsPerGroup 64 128
# Thread pool for AR routes execution
EventQueue.ThreadPools.AdvancedRouting.minThreads 32 64
EventQueue.ThreadPools.AdvancedRouting.maxThreads 768 1024
EventQueue.ThreadPools.AdvancedRouting.maxThreadsPerGroup 64 128
# Thread pool for processing PeSIT transfers
EventQueue.ThreadPools.PESIT.minThreads 32 64
EventQueue.ThreadPools.PESIT.maxThreads 768 1024
EventQueue.ThreadPools.PESIT.maxThreadsPerGroup 64 128
# Thread pool for handling Streamed InProcess Events for SIT outbound transfers excluding AR transfers
TransactionManager.ThreadPools.ThreadPool.ServerTransfer.minThreads 32 64
TransactionManager.ThreadPools.ThreadPool.ServerTransfer.maxThreads 768 1024
# Thread pool for handling concurrent users (CIT) over streaming channels
TransactionManager.ThreadPools.ThreadPool.EventMonitor.minThreads 32 64
TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxThreads 768 1024


Thread Pools - Rule Engines

RuleEngines are used to evaluate the agent chain for a given event. This setting defines the number of rule engines ST can use concurrently.


TransactionManager.RuleEngine.pool=64




Event Queue

EventQueue - SIT persisted events in the database

Server-Initiated Transfers (SITs) in ST are coordinated by events in the Event table in the database (persisted events). Client-Initiated Transfers (CITs) do not insert any events in the database event queue except the start and end events of a transfer, and those are cleared very fast and cannot be observed during normal processing.


Any CIT becomes a SIT when the file arrives in the subscription folder. An event is inserted in the database with status 0 - Ready. When a thread is assigned to the event for processing, the status changes to 1 - Active. An event processor distributes the events (representing workload tasks) to the SecureTransport servers in the cluster based on the server configuration parameter EventQueue.DispatchPolicy.name. The default policy, cacheBasedPolicy, directs all events associated with an account to the same server to improve performance. More information about event distribution can be found at: Administrator Guide -> Direct cluster workload.
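

The default policy, written out as a server configuration parameter:


EventQueue.DispatchPolicy.name=cacheBasedPolicy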


During event processing, ST may generate new persisted events for the same transfer. Once an event is processed, it is removed from the database, and when a transfer completes (successfully or not), all events related to it are removed. When ST has nothing to process, the event queue in the database is empty. In some abnormal situations ST may leave events that cannot be processed (stuck events), and they can block the subscriptions' scheduled pull execution. Abnormal situations can result from TM crashes, out-of-memory errors, lack of OS resources like file handles, unhandled sequences in ST, null pointer exceptions, specific wildcard pull errors, etc. A stuck event is an active event without a corresponding thread processing it. ST has an automatic monitoring mechanism for the event queue with an option to delete leftover events. The event queue is very dynamic, and it can be tracked from Admin UI -> Operations -> Event Queue, via the REST API 2.0 resource /events, or with the SQL queries described in KB 183116.


EventQueue - CIT persisted events queue size in TM memory

SIT persisted events are not limited in number. CIT persisted events (start and end events) and active SIT events are placed in the EventQueue in the TM's memory. This queue has a limit, enabled by default and controlled by the server configuration parameter EventQueue.SizeLimit.enable. The default size is 5120.


EventQueue.SizeLimit.enable=true
EventQueue.SizeLimit.maxQueueSize=10240


Increasing the EventQueue max queue size further could lead to deeper issues in abnormal circumstances. Make sure all other possible optimizations and tuning are in place before increasing the max queue size.



EventQueue - SIT persisted events monitoring

EventQueue.Heartbeat.Interval specifies the events' heartbeat update frequency in seconds. Each thread attempts to update its heartbeat timestamp in the Event table every X seconds. The default value of 5 seconds is very aggressive for busy environments.

EventQueue.Heartbeat.Timeout specifies the number of seconds above which SecureTransport considers a particular event as staying in the queue abnormally long. If an event has not updated its heartbeat timestamp for more than X seconds, the event is considered stuck and is reported in the server log with Possible stuck events with expired heartbeat timeout detected. If event recovery is enabled, the recovery process is triggered. The default value of 60 seconds is very aggressive for busy environments.

EventQueue.Heartbeat.Recovery.Enabled turns the recovery process that deletes stuck events on or off. The default value is false.


Turn it on only if analysis shows that there really are stuck events that can be safely deleted.


EventQueue.Heartbeat.Interval=15
EventQueue.Heartbeat.Timeout=600
EventQueue.Heartbeat.Recovery.Enabled=false
OR (see note above)
EventQueue.Heartbeat.Recovery.Enabled=true



EventQueue - CIT non-persisted events in TM memory

Client-initiated transfers (CIT) in ST are coordinated by the EventMonitorService, which receives events and callable requests sent from the protocol daemons over streaming channels. These requests arrive in the Event Monitor Queue. Once the Event Monitor finds an available request executor, it assigns the request to it and deletes the request from the queue. The Event Monitor Queue has the usage reporting mechanism described below.


TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize - Specifies the maximum number of client-initiated events that can wait in the Event Monitor queue for an executor. The default value is 1024.

TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize.usageAlertsLogging - Controls the logging of warning messages for changes in the queue size. When enabled, warnings are logged each time the queue size increases or decreases by 10%, provided that the queue is over 50% full. The default value is disabled.


TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize=10240
TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize.usageAlertsLogging=enabled


Increasing the Event Monitor max queue size further could lead to deeper issues in abnormal circumstances. Make sure all other possible optimizations and tuning are in place before increasing the max queue size.




Maximum simultaneous connections to a remote host

The maximum number of concurrent sessions established to any given partner for Server-Initiated Transfers (SITs) that are not triggered by an Advanced Route. The default value is 100.


OutboundConnections.maxConnectionsPerHost=1000





Protocol commands batch size

The maximum number of protocol commands accumulated in memory before they are persisted to the database. The default value is 100. Check the server logs for the message Value of 'Server.ProtocolCommands.batchSize' is too low and may lead to performance degradation. If you see it, double the value and keep monitoring; if the message persists, double the value again, and so on until the messages disappear.


Server.ProtocolCommands.batchSize=100
OR
Server.ProtocolCommands.batchSize=200
OR
Server.ProtocolCommands.batchSize=400
and so on.





Skip creating AR sandbox if no transformation steps

If a Route includes only Publish To Account and/or Send To Partner steps and does not transform files, SecureTransport has an option to skip the creation of the sandbox folder and the copying of the file from the subscription folder to the sandbox folder. Instead, it directly transmits the original file from the subscription folder to speed up route execution. The default value is true (skip the sandbox when possible).


Even without transformations, a sandbox folder is created when any of the following is configured in a Send To Partner step: 1) Post Routing Action -> Delete files after step is complete; 2) Send Trigger File is enabled; 3) Configure Advanced PeSIT Settings is enabled.


AdvancedRouting.DontCopyPayload=true





AR redirect sandbox to local disk

The route execution consists of creating a sandbox, copying the file from the subscription folder to the sandbox, executing the route steps, purging the sandbox, and executing the post-routing actions. The sandbox is a subfolder structure under the objects folder created in the account's .stfs metadata folder:


{shared storage}/{account home folder}.stfs/objects/{some identifier 1}/{some identifier 2}/{some identifier 3}/{some identifier 4}


The entire route execution process involves many IO operations on the storage, and copy operations for large files take time. Sandbox redirection aims to reduce the IO operations on the shared storage (by moving them to a local disk) and to speed up the copy operation (copying from remote shared storage to a local disk is faster). The redirection is done by converting the objects folder into a symbolic link pointing to a path on a local disk.


On Windows environments, following symbolic links from remote to local disks is usually disabled. To enable following links in all directions, use the command fsutil behavior set SymlinkEvaluation L2L:1 R2R:1 L2R:1 R2L:1. More information is available in the fsutil behavior documentation.


AdvancedRouting.sandboxFolderLocation={absolute path to local disk location}
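

After redirection, the objects folder on the shared storage shows up as a symbolic link. An illustrative check on Linux, reusing the placeholder paths from above:


ls -l "{shared storage}/{account home folder}.stfs/"
# objects -> {absolute path to local disk location}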





Proxy blacklisting

The Transaction Manager has a blacklisting mechanism for the SOCKS proxy. More information can be found in KB 181585.


Some errors from partner servers can trigger the blacklisting mechanism and temporarily block all Edge servers in the zone. In such a case, turn off blacklisting if the TM is not allowed to connect to the partner server directly, or turn on direct connection if possible.


Proxy.Blacklisting.Enabled=false

OR

Direct.Connection.When.Proxy.Down=true





SSH connection reuse

For SSH Transfer Sites, SecureTransport has a connection pooling mechanism for server-initiated transfers, introduced in SecureTransport version 5.5-20230126, which allows SecureTransport to reuse recently established connections to the remote SSH server. A separate connection pool is created for each SSH Transfer Site. If set, the Maximum parallel transfers property defines the maximum number of connections that the connection pool can open per node. The SSH connection pool is off by default.


Ssh.SIT.ConnectionPool.Enabled=true
Ssh.SIT.ConnectionPool.MinEvictableIdleDuration=30
Ssh.SIT.ConnectionPool.TimeBetweenEvictionRuns=15





Partitions

On PostgreSQL, the Log Entry and Transfer Log maintenance applications do not create partitions. Instead, a dedicated create-partition service is executed on startup of the Transaction Manager and the protocol daemons. The Partition.DaysToPrebuild server configuration option specifies the number of days for which partitions are created in advance. If you leave it empty (the default), partitions are created 3 days ahead. The service will not create new partitions if they have already been created for the specified interval.


By default, the daily partitions are created every day at 00:00. To change the partition creation time, update the value of the PartitionManagement.Create.triggerTime server configuration option. The format is HH:MM, with hours in the range 0–23.


New installations of SecureTransport set all maintenance applications to start at midnight (00:00); see Maintenance Applications Tuning below. The statistics summary for usage reporting is also triggered at midnight, and there is currently no option to change its trigger time. This may affect the creation of partitions for some tables because the tables might be locked. If there are frequent failures during execution of the partition creation service, change the trigger time. Check the following article for a known issue with file tracking replication in SecureTransport versions prior to 5.5-20250731: KB 191320.


Partition.DaysToPrebuild=7
PartitionManagement.Create.triggerTime=00:00
OR
PartitionManagement.Create.triggerTime=01:10




PeSIT enhancements

PeSIT IDs - SecureTransport represents an entity of a PeSIT partner as the combination of an Account and a Transfer Site. Functionality introduced in SecureTransport version 5.5-20211216 eliminates the need for both parties to use unique names in their configurations. When this is enabled via the server configuration parameter Pesit.UsePesitIds, the PeSIT partnership is formed based on the PeSIT ID properties specified in the Account and Transfer Site settings. PeSIT ID is not a mandatory field; if left empty, SecureTransport defaults to using the name property. The default value is false.

CFT Extensions - As of SecureTransport version 5.5-20230330, SecureTransport complies with PeSIT CFT extensions and can handle the PI 99 usage. A server configuration option Pesit.CftExtensions.Enabled enables the usage of PeSIT extensions. The default value is true.

Connection Pool Max Wait Time - For server-initiated PeSIT outbound transfers via Transfer Sites with a limit on simultaneous transfers, SecureTransport may have trouble finding available connections in the connection pool. The connection pool is created per destination server host and port when first used, and destroyed when no new files are available to push. You may have multiple Transfer Sites to the same destination server but with different limits; if two Transfer Sites with different limits start transferring simultaneously, the first one creates the connection pool and all transfers use the limit from that Transfer Site. The server configuration parameter ConnectionPool.maxWaitTime sets how long ST may wait to get a connection from the pool before giving up. The default value is 86400 seconds (24 hours). It is recommended to reduce the timeout to a feasible value, like 2 minutes.


Pesit.UsePesitIds=true
Pesit.CftExtensions.Enabled=true
ConnectionPool.maxWaitTime=120




Optimize the account initialization process

For client-initiated transfers, SecureTransport checks all configured subscription folders for presence and permissions during the user login process. With a large number of subscription folders and frequent logins, this introduces a significant slowdown for the login, and if the filesystem is overwhelmed, the slowdown is even worse. As of SecureTransport version 5.5-20220331, SecureTransport provides a monitoring service for ST accounts (non-template accounts). After a successful account initialization on first login, the account (along with its subscription folders) is registered with the service for monitoring. An info message appears in the server logs: Account with ID 'XXXXXX' is successfully registered for monitoring by the Directory Structure service. Subsequent logins skip the account initialization if there are no changes in the subscription folders. When a subscription folder is deleted (regardless of the source of the deletion), the account is removed from the service, and the next login initializes the account and creates/checks all subscription folders. The default value is true.


DirectoryStructureServiceEnabled=true




Retry cycles for Basic Application, Advanced Routing - Pull From Partner step, and REST API calls

When a server-initiated transfer fails, SecureTransport can automatically retry the transfer. The time in seconds that SecureTransport waits after a transfer fails before retrying is calculated as Retry number * EventQueue.retryDelayInterval. After the retry count reaches EventQueue.maxRetryCount, the retry cycle stops and ST fails the transfer permanently. With the default values of 5 retries and a 120-second delay interval, the retry cycle takes 30 minutes to complete (120 + 240 + 360 + 480 + 600 seconds). When the maximum number of simultaneous connections to a remote host is reached, a transfer is retried internally with no limit on retries, and the wait time is calculated as Retry number * EventQueue.internalRetryDelayInterval.


The default values:


EventQueue.internalRetryDelayInterval=120
EventQueue.maxRetryCount=5
EventQueue.retryDelayInterval=120


Values that match the default Advanced Routing retries and suit intensive REST API usage:


EventQueue.internalRetryDelayInterval=2
EventQueue.maxRetryCount=5
EventQueue.retryDelayInterval=2




STFS attribute files and caching

During transfer processing, SecureTransport uses the so-called "stfs" files to store metadata attributes for each transferred file. These are serialized files located in a hidden directory under the user’s home: ~/.stfs/attrs/.


For every uploaded file, a corresponding metadata file is created: ~/.stfs/attrs/<filename>. When a file is moved or deleted, SecureTransport also updates or removes the associated metadata file. The stfs files include both transfer-specific and context-related data, such as Repository Encryption information, decrypted file size (for Repository Encrypted files), transfer status and TransferStatusId, CoreId, startTime, and more.


In some scenarios, it stores additional attributes that can be related to the file, such as the Flow Attributes and PeSIT context attributes. This metadata is read and written multiple times during processing and is critical for SecureTransport's core functionality. These operations may impact performance, especially when there is latency to the shared file system.


A caching mechanism for these STFS attributes was added in SecureTransport version 5.5-20250424, greatly improving performance in scenarios with large numbers of small files and reducing the filesystem load. The feature is enabled by default, meaning that SecureTransport reads the attributes from memory instead of the filesystem.


Stfs.attributes.coherence.cache.enabled=true




Repository encryption

Repository encryption increases SecureTransport's security by avoiding storing unencrypted files on shared storage. When repository encryption is enabled, SecureTransport encrypts each file that it pulls from a partner site or that a client pushes to ST. When SecureTransport pushes a file to a partner site or a client pulls a file from ST, SecureTransport decrypts the file. SecureTransport encrypts and decrypts each file dynamically in memory as it receives and sends it, so the files never exist unencrypted in the storage of the host system.


If repository encryption is enabled it is recommended to set server configuration parameter TM.preferBouncyCastleProvider to false. For details refer to the BouncyCastle Security Provider section below.


When Repository Encryption is disabled (after having been enabled), previously encrypted files will not be decrypted and their transfers will fail. To ensure they can be decrypted and processed, set the server configuration parameter DecryptOnDisabledEncryption to true.


If you enable repository encryption, the following SecureTransport functions are not supported: resume PeSIT transfers and pause and resume transfers when SecureTransport is the server.


Stfs.Encryption.CertAlias - Setting this value will enable the repository encryption. Use any certificate alias from the Local Certificates store. Leaving it empty disables repository encryption.

Stfs.Encryption.ListDecryptedSize - Determines which file size is reported for repository-encrypted files when performing directory or file listings: the original file size or the encrypted one. When set to false, the encrypted file size is reported. When set to true, the original, unencrypted file size (taken from the STFS metadata) is reported; in this case, performance degradation is observed when listing directories with lots of files. If reading the actual file size is not required, keep this parameter set to false.

Stfs.Encryption.ReadBufferSize - Specifies the buffer size for read operations when Repository Encryption is enabled, which affects download speed. The default value is 32768. A larger buffer may not bring improvement but uses more physical memory (RAM). The optimal value depends on the underlying hardware and other configurations. The value must be less than or equal to TransactionManager.fileIOBufferSizeInKB. Use the default value of 32K as the minimum starting point for optimization. Refer to the Disk I/O section for how to optimize this buffer.

Stfs.Encryption.WriteBufferSize - Specifies the buffer size for write operations when Repository Encryption is enabled, which affects upload speed. The default value is 32768. A larger buffer may not bring improvement but uses more physical memory (RAM). The optimal value depends on the underlying hardware and other configurations. The value must be less than or equal to TransactionManager.fileIOBufferSizeInKB. Use the default value of 32K as the minimum starting point for optimization. Refer to the Disk I/O section for how to optimize this buffer.

Stfs.Hash.HashOnUpload - Controls the on-the-fly hashing (computing the MD5 checksum dynamically as the file is uploaded) of all incoming transfers. When the value is false, SecureTransport computes the MD5 checksum after the file transfer has completed. This applies both when repository encryption is enabled and when it is disabled. To minimize the delay in finalizing uploads of large files, set the value to true.


Stfs.Encryption.CertAlias= (leave empty)
Stfs.Encryption.ListDecryptedSize=false
Stfs.Encryption.ReadBufferSize=32768
Stfs.Encryption.WriteBufferSize=32768
DecryptOnDisabledEncryption=true
Stfs.Hash.HashOnUpload=true




File Archiving

The File Archiving feature enables archiving and retrieval of files for resubmission at the global, business unit, and account levels. As of SecureTransport version 5.5-20241031, the Rotate archive folder option allows storing archived files in timestamped subfolders, which are automatically created daily or hourly depending on the retention period. Consider enabling this option to reduce the archive maintenance time if you expect a high volume of files. See Maintenance Applications Tuning below.


The file archive folder and the user home folders should reside on separate storage devices. There is a negative performance impact when the archive folder is on the same storage device as the user home folders, because data is written twice to the same device.




Audit Log

The Audit Log contains entries that SecureTransport records when any change is made to the SecureTransport configuration. Audit logging is enabled by default, except for changes made by the Transaction Manager. Audit log records can capture details of the object after modification in a text form called a collection. Depending on the object type and size, this collection can be huge (a few megabytes), and its generation consumes time and resources. The collection can be compared with the collections from previous records for the same object to identify the exact changes. If this information is not necessary, it is better to turn off collections by setting the server configuration parameter AuditLog.Enabled.CollectionLog to false.


Accounts import is much slower when audit logging is enabled; a separate server configuration parameter, AuditLog.Enabled.Import, can enable or disable it. The server log already contains some information about account import actions, so it is better to turn off audit logging during account import.


Audit Log Maintenance

The Audit Log Maintenance application deletes and exports log entries in chunks with a default size of 1000 entries. If collections are enabled, decrease the chunk size to avoid out-of-memory errors, using the server configuration parameter AuditLog.ChunkSize introduced in SecureTransport version 5.5-20231130. The parameter accepts values between 1 and 1000; values outside this range are considered invalid and result in Server Log warnings upon application execution. If the configuration option has an invalid value, the default is used. See Maintenance Applications Tuning below.


AuditLog.ChunkSize=10
AuditLog.Enabled.Admin=true
AuditLog.Enabled.TM=false
AuditLog.Enabled.Import=false
AuditLog.Enabled.CollectionLog=false
OR
AuditLog.Enabled.CollectionLog=true




Folder Monitor

Folder Monitor is a TM service that scans a designated folder (and optionally its subfolders) for specific files based on the Transfer Site configuration. Once the Transfer Site is used in a Subscription, SecureTransport starts monitoring the folder at fixed intervals as defined by the server configuration parameter FolderMonitor.pollInterval. The default value is 5 seconds. Depending on the number of monitored locations and the performance of the shared storage, FM may not be able to scan all configured folders within the poll interval; use small steps to increase the poll interval (10, 15, 20, etc.) until an optimal value is found.

The Folder Monitor service runs on one server of the cluster. If that server fails, the service automatically fails over to the server in the cluster with the Transaction Manager that has been running the longest. The service updates its heartbeat according to the server configuration parameter FolderMonitor.heartbeatInterval and fails over when FolderMonitor.heartbeatTimeout expires. The FM picks up only files that have not been modified for FolderMonitor.fileDelayInterval. The process is: 1. renaming the source file by adding the suffix FolderMonitor.filePostfix; 2. copying the file to the subscription folder; and 3. deleting the source file. Then the target subscription is triggered. Use the defaults shown below and change these parameters only if necessary.


FolderMonitor.enable=true
FolderMonitor.fileDelayInterval=5
FolderMonitor.filePostfix=__@PROCESS@
FolderMonitor.heartbeatInterval=5
FolderMonitor.heartbeatTimeout=60
FolderMonitor.maxCachedSites=10000
FolderMonitor.pollInterval=5




Scheduler

The Scheduler is a TM service that fires subscription pulls from a Transfer Site according to the Subscription's schedule. It is similar to the Linux crontab or the Windows Task Scheduler. The Scheduler manages all scheduled tasks centrally from the oldest member of the cluster. When schedules trigger events for scheduled tasks, one consolidated queue for all events is maintained across the cluster. This queue is shared and replicated across all servers in the cluster so that they share the load, taking events from the queue one item at a time and performing the actual transfers or other tasks. If the node where the Scheduler is running fails, another node takes over. You can schedule jobs in two ways: per subscription or per application.


SecureTransport uses the Quartz Scheduler third-party library with a separate configuration file located at $FILEDRIVEHOME/conf/scheduler.properties. The default configuration uses 1 thread with 5 connections to the DB and low priority. For environments with lots of scheduled tasks, set the recommended values below in the config file. The last parameter, org.quartz.jobStore.acquireTriggersWithinLock=true, prevents triggering the same job on two cluster nodes.


The Scheduler cannot be used for AS2 Transfer Sites. Before queueing a new task, the server checks whether a previous instance of the same periodic task is still pending. If there is one, the new task is not scheduled, and a warning message appears in the server log: The task "<<SubscriptionID>>_subscription_PARTNER-IN" of account with name: "<<ST account>>" with subscription folder: "<<Subscription folder>>" is still in progress. Skipping the next scheduled occurrence of this task.


Scheduler.enable
Cluster.Service.Scheduler.File.configurationFile.path=/conf/scheduler.properties
#
$FILEDRIVEHOME/conf/scheduler.properties (file)
org.quartz.threadPool.threadPriority=5
org.quartz.jobStore.misfireThreshold=120000
org.quartz.dataSource.DS.maxConnections=25
org.quartz.threadPool.threadCount=20
org.quartz.jobStore.acquireTriggersWithinLock=true
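

As a sanity check on these values, note that org.quartz.dataSource.DS.maxConnections stays above org.quartz.threadPool.threadCount; the Quartz documentation suggests giving the data source a few connections more than the thread pool size (commonly cited as threadCount + 3), which the recommended pair of 25 connections and 20 threads respects.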




Status Checker - Load balancer health checks

A classic approach to load balancing is to continuously monitor the service's ports for availability. This is not sufficient for ST, because a streaming connection from the TM is an additional condition for considering a service healthy. More complex health-checking mechanisms, such as logging in with a user account, consume resources and might not be possible in some load balancers. ST 5.5 provides a liveness status check mechanism via an HTTP service executed by monitord. This service is not configured or enabled by default. Choose any available port on the operating system and set it in the server configuration parameter StatusChecker.port. In the example below the chosen port is 5555. Enable the service by setting the server configuration parameter StatusChecker.enabled to true, then restart the monitord service.


Liveness status can be requested individually for each protocol daemon with URL:


http://{Server IP address}:5555/healthCheck?daemon={daemon}


where {daemon} is one of the following: ADMIN, FTPD, HTTPD, AS2D, SSHD, PESITD, or SOCKS. The expected responses are:


  • 200 (OK) - Indicates that the service is healthy with functional streaming connections
  • 503 (Service Unavailable) - Indicates that the service is NOT healthy. The service is stopped or no streaming connection has been established.


StatusChecker.enabled=true
StatusChecker.heartbeatInterval=20
StatusChecker.port=5555 (just example)
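

A minimal check from the command line, assuming the example port 5555 and a placeholder node address of 10.0.0.11:


curl -s -o /dev/null -w "%{http_code}\n" "http://10.0.0.11:5555/healthCheck?daemon=SSHD"


A healthy SSHD returns 200; a stopped daemon or one without a streaming connection returns 503.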


The health check mechanism depends on the entries in the database table componentstatus. If some entries for a node do not match the component, the response code is always 503. The value in the column configurationid must be equal to the LocalConfigurationsId from configuration.xml, and the column host must be equal to the IP address of the node. If there is a discrepancy, delete all problematic entries from the componentstatus table and ST will rebuild them upon restart of the relevant node.
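

A hedged sketch of that cleanup with the standard PostgreSQL client; the database name, user, and node IP below are placeholders, and the affected rows should be reviewed before deleting anything:


psql -U st -d st -c "SELECT * FROM componentstatus WHERE host = '10.0.0.11';"
psql -U st -d st -c "DELETE FROM componentstatus WHERE host = '10.0.0.11';"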


Limitation: The health check only assesses the default listener!




File renaming for client-initiated uploads

SecureTransport features a file locking mechanism that blocks access to files while they are being uploaded or processed. This mechanism prevents partial uploads and conflicts with post-processing actions, but it may create issues with clients that upload files with temporary names and attempt to rename them after completing the transfer, as SecureTransport may still have the file locked. In such cases, rather than failing immediately when a rename command is issued for a locked file, SecureTransport can be configured to wait for a predetermined period for the lock to be released. This wait-and-retry mechanism can be customized using the following server configuration parameters:


RenameLockedFiles - Controls whether SecureTransport allows renaming attempts on locked files (associated with an .m_inproc file). By default, it is set to disabled, causing any rename command issued while the file is locked to fail immediately. When it is set to enabled, SecureTransport will periodically check if the file is still locked. If all retries are exhausted and the file is still locked, the renaming operation will ultimately fail. You can further adjust the check count and interval.

CIT.Upload.RenameAfterUnlocked.RetryCount - Specifies how many times to check if an uploaded file is unlocked before renaming it. This option only works if RenameLockedFiles is enabled. Default value is 10.

CIT.Upload.RenameAfterUnlocked.RetryDelayInterval - Specifies the delay interval, in milliseconds, between checking if an uploaded file is unlocked before renaming it. The total time (including all checks) must not exceed the maximum idle time interval configured in the client used for upload. This option only works if RenameLockedFiles is enabled. Default value is 1000.


RenameLockedFiles=enabled
CIT.Upload.RenameAfterUnlocked.RetryCount=10
CIT.Upload.RenameAfterUnlocked.RetryDelayInterval=1000
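

With the values above, SecureTransport checks the lock up to 10 times at 1000 ms intervals, so a rename can wait up to 10 * 1000 ms = 10 seconds before ultimately failing. Keep this total below the maximum idle time configured in the uploading client, as noted above.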




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter TM.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


If you do not need FIPS, you can set the value to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


TM.preferBouncyCastleProvider=true
OR
TM.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful shutdown is a feature that allows a planned Transaction Manager stop without abrupt cancellation of current server-initiated transfers (SITs), post-routing, post-transformation, and post-processing actions, or Advanced Routing actions (all routes and their respective route steps). Once the graceful shutdown is initiated, the TM waits for the existing tasks to finish and does not accept new tasks. The maximum time that the TM waits before stopping is set in the server configuration parameter TransactionManager.GracefulShutdownTimeout. The default value is 86400 seconds (24 hours). If there are leftover stuck events, the TM will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


TransactionManager.GracefulShutdownTimeout=300




Allow Expired Certificates

SecureTransport provides server configuration parameters to control (allow or disallow) the use of expired certificates.


SIT.allowExpiredCertificates - Controls the usage of expired X509 certificates for server-initiated transfers over the FTPS, HTTPS, and PeSIT protocols.

SSH.SIT.allowExpiredCertificates - Controls the usage of SSH keys contained in expired X509 certificates for server-initiated transfers over SSH protocol.

SSH.CIT.allowExpiredCertificates - Controls the usage of SSH keys contained in expired X509 certificates for client-initiated transfers over SSH protocol.


SIT.allowExpiredCertificates=true
OR
SIT.allowExpiredCertificates=false
SSH.SIT.allowExpiredCertificates=true
SSH.CIT.allowExpiredCertificates=true




SSH change the permissions of a file

The server configuration parameter Ssh.UpdateFilePermissionsWithChmodCommand determines whether a chmod or a umask command is used to change the permissions of a file. This can be overridden at the Transfer Site level. When the value is set to true, the file permissions are set with chmod after the transfer ends. When the value is set to false, the file handle is opened with the specified permissions and the file permissions are modified with umask.


Ssh.UpdateFilePermissionsWithChmodCommand=true
OR
Ssh.UpdateFilePermissionsWithChmodCommand=false




DNS Lookups

SecureTransport provides server configuration parameters to control DNS resolution.


Dmz.Edge.proxyDnsResolutionCheck - Domain Name System resolution will be performed on the Edge before the server-initiated transfer takes place when the value is set to true. Requires active streaming between the Core server and the Edge server. This option will apply only if Use the Edge DNS configuration is enabled in the Network Zone configuration.

SIT.ReverseDNSLookups - Controls the DNS reverse lookup for server-initiated transfers. To prevent delays due to DNS lookups, set the value to false.


Dmz.Edge.proxyDnsResolutionCheck=false
SIT.ReverseDNSLookups=false




SSL Logging

Plugins.TransferSites.SSLLogging.Level specifies the log level of the TLS security information log messages for Pluggable Transfer Sites. These are Generic-HTTP(S), S3, Azure Blob Storage, Azure File Storage, Google Cloud Storage, Google Drive, OneDrive, SharePoint, etc. The default value OFF suppresses printing a message with security information. The value INFO prints a message with security information at info level. The value DEBUG prints the same message at debug level instead of info level. A sample DEBUG message in the Server Log looks like this:


User with login name "user1", associated with account "user1", had initiated a connection over HTTP-GENERIC. Remote address: www.google.com. Connection security parameters: cipher suite: TLS_AES_128_GCM_SHA256, TLS/SSL protocol: TLSv1.3.


Plugins.TransferSites.SSLLogging.Level=DEBUG





5. FTP Server Tuning


Buffers

DataBufferSize - FTP data connection buffer size. Allocated on every transfer.

ReadBufferSize - FTP read buffer size. Increase this parameter to avoid excessive streaming traffic due to fragmentation.

ReceiveBufferSize - FTP receive buffer size.


Ftp.DataBufferSize=131072
Ftp.ReadBufferSize=131072
Ftp.ReceiveBufferSize=131072




FTPS compliance

Ftp.Ssl.requireCloseNotify - If the FTP client does not send a close_notify message when uploading files to SecureTransport via FTPS, set this parameter to false to prevent failing the transfer. Note that with the value false the server is susceptible to TLS truncation attacks! The recommended value is true; use false only if absolutely necessary.

Ftp.Ssl.StrictRfc2228 - Controls strict RFC2228 compliance of the FTPD upon reply to the AUTH TLS command from the clients. Recommended value is true.

Ftp.Ssl.StrictRfc2228CertAuth - Controls strict RFC2228 compliance for certificate authentication of the FTPD. Recommended value is false.


Ftp.Ssl.requireCloseNotify=true
Ftp.Ssl.StrictRfc2228=true
Ftp.Ssl.StrictRfc2228CertAuth=false




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




DataTimeout

The number of seconds the server waits to read a block of data from the client, or write a block of data to the client. If not specified, its value is infinity.


Ftp.DataTimeout= (leave empty)




ListenBacklog

Set the size of the sockets backlog.


Ftp.ListenBacklog=1024




LoginFailureDelay

Specifies the time in milliseconds by which a client's next login is delayed after an invalid login attempt. Increasing the value can slow down brute-force attacks or rogue clients.


Ftp.LoginFailureDelay=500




MaxClients

Set maximum number of concurrent connections. 0 means unlimited.


Ftp.MaxClients=500




WorkerThreads.maxThreads

The maximum number of worker threads in the FTP daemon used for the processing of the requests.


Ftp.WorkerThreads.maxThreads=1024




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Ftp.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Ftp.preferBouncyCastleProvider=true
OR
Ftp.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, FTPD waits for the timeout period specified in the server configuration parameter Ftp.GracefulShutdownTimeout before stopping the FTP service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, FTPD will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.


Graceful shutdown logging interval - The Server Log displays information about the active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with the graceful shutdown, you must stop the Monitor Server.


Ftp.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The FTP daemon can print SSL/TLS security parameters (TLS version and cipher suite) for newly established connections when the server configuration parameter SSLLogging.Ftp is set to true. A sample INFO message in the Server Log looks like this:


Establishing FTPS connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Ftp=true




6. HTTP Server Tuning


ThreadPool

ThreadPool MinThreads - HTTP server request thread pool minimum threads. The default value is 32.

ThreadPool MaxThreads - HTTP server request thread pool maximum threads. The default value is 256.

ThreadPool ThreadsIdleTimeMillis - How much time (in milliseconds) a thread from the thread pool should stay idle before it's stopped. The default value is 60000.


Http.ThreadPool.MinThreads=128
Http.ThreadPool.MaxThreads=1024
Http.ThreadPool.ThreadsIdleTimeMillis=60000




Connections

MaxSimultaneousTransfers - Maximum simultaneous transfers per client. The default value is 20.

Connection MaxIdleTime - The maximum Idle time (in milliseconds) for a connection. The default value is 5 minutes.

AcceptQueueSize - The number of connection requests that can be queued up before the operating system starts to send rejections. The default value is 10000.


Http.MaxSimultaneousTransfers=25
Http.Connection.MaxIdleTime=300000
Http.AcceptQueueSize=10000




Request monitor service

Request MinBandwidth - Sets the minimum processing bandwidth for incoming HTTP requests. If an incoming request drops below the specified minimum bandwidth more than a specified number of times (see Http.Monitor.IterationCount), the connection is reset. Possible values: <number of bytes per second> | 0. The default value is 0, which disables the request monitor service.

Monitor IterationCount - Sets the maximum number of times an HTTP request can drop below the specified minimum bandwidth (see Http.Request.MinBandwidth) before the connection is reset. Default value: 10. Cannot be set to 0. The option is ignored if the HTTP request monitor service is disabled.


Http.Request.MinBandwidth=0
Http.Monitor.IterationCount=10
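

For example, if Http.Request.MinBandwidth were set to 1024 (an illustrative value only), a request whose processing bandwidth drops below 1024 bytes per second more than 10 times (the default Http.Monitor.IterationCount) would have its connection reset. With the default of 0 shown above, the service remains disabled.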




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Http.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.

In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Http.preferBouncyCastleProvider=true
OR
Http.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, HTTPD waits for the timeout period specified in the server configuration parameter Http.GracefulShutdownTimeout before stopping the HTTPD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, HTTPD will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.

Graceful shutdown logging interval - The Server Log displays information about the active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Http.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The HTTP daemon can print SSL/TLS security parameters (TLS version and cipher suite) for newly established connections when the server configuration parameter SSLLogging.Http is set to true. A sample INFO message in the Server Log looks like this:


Establishing HTTPS connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Http=true




7. SSH Server Tuning


Note that the SSH protocol for server-initiated transfers has additional tuning parameters in each SSH Transfer Site! Use larger buffers for higher transfer rates over high-bandwidth, high-latency networks. The Sftp Message Block Size can be increased up to 262000 bytes if the server supports it.


max.pta.wait

Specifies the maximum time, in milliseconds, that the SSH server withholds its response while the requested file is still being processed.


Ssh.max.pta.wait=2000




maxChannels

Maximum channels per client. A single SSH connection may contain multiple channels, all run simultaneously over that connection.


Each channel, in turn, represents the processing of a single service. When a client invokes an operation on the remote host, a channel is opened for that invocation, and all input and output relevant to that operation is sent through that channel. The connection itself simply manages the packets of all of the channels that it has open.


Ssh.maxChannels=30




maxConnections

Maximum allowed connections to SSHD. Configurable in the SSH Settings page.


Ssh.maxConnections=100




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Ssh.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Ssh.preferBouncyCastleProvider=true
OR
Ssh.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, SSHD waits for the timeout period specified in the server configuration parameter Ssh.GracefulShutdownTimeout before stopping the SSHD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, SSHD will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.


Graceful shutdown logging interval - The Server Log displays information about active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Ssh.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The SSH daemon can print negotiated security parameters (KEX, ciphers, and MACs) for newly established connections when the server configuration parameter SSLLogging.Ssh is set to true. Sample INFO messages in the Server Log look like this:


Establishing SSH connection with host 0:0:0:0:0:0:0:1, using the following properties: key exchange: curve25519-sha256 client-server cipher: aes256-gcm@openssh.com, server-client cipher: aes256-gcm@openssh.com, server-client MAC: <implicit>, client-server MAC: <implicit>.


For some ciphers, integrity is not provided by a MAC but is part of the cipher itself. In such cases the negotiated MAC is shown as implicit.


Establishing SSH connection with host 127.0.0.1, using the following properties: key exchange: diffie-hellman-group-exchange-sha256 client-server cipher: aes128-ctr, server-client cipher: aes128-ctr, server-client MAC: hmac-sha2-256, client-server MAC: hmac-sha2-256.


SSLLogging.Ssh=true




8. AS2 Server Tuning


Receiver.maxContentLength

Maximum file size for receiving. The default maximum file size is 50 megabytes; 0 means unlimited. Configurable in the AS2 Settings page in the Admin UI.


As2.Receiver.maxContentLength=200




Sender.maxContentLength

Maximum file size for sending. The default maximum file size is 50 megabytes; 0 means unlimited. Configurable in the AS2 Settings page in the Admin UI.


As2.Sender.maxContentLength=200




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter As2.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


As2.preferBouncyCastleProvider=true
OR
As2.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, AS2D waits for the timeout period specified in the server configuration parameter As2.GracefulShutdownTimeout before stopping the AS2D service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, AS2D will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.


Graceful shutdown logging interval - The Server Log displays information about active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


As2.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The AS2 daemon can print SSL/TLS security parameters (TLS version and cipher suite) for newly established connections when the server configuration parameter SSLLogging.As2 is set to true. A sample INFO message in the Server Log looks like this:


Establishing AS2 SSL connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.As2=true




9. PeSIT Server Tuning


Note that the PeSIT protocol for server-initiated transfers has additional tuning parameters in each PeSIT Transfer Site!


Pesit.ASCII.recordsInfo.bulk.size

When transferring files over PeSIT in ASCII mode, SecureTransport counts the number of characters on each line and stores the counts in memory. When the transfer is finished, this data is stored on the file system. This parameter limits the number of line counters stored in memory (each counter is 4 bytes) before the data gets flushed to file. Increasing this parameter can improve performance but will increase the memory usage of the TM and the PeSIT daemon. Allowed values are greater than or equal to 1024. The default value is 32768.


Pesit.ASCII.recordsInfo.bulk.size=32768




Timeouts

Create and Select Timeout - PeSIT CREATE/SELECT timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 300 seconds.

Inactivity Timeout - PeSIT Protocol inactivity timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 60 seconds.

Connection Release Timeout - PeSIT Connection release timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 60 seconds.


Pesit.CreateSelect.Timeout=300
Pesit.Inactivity.Timeout=60
Pesit.Connection.Release.Timeout=60




Pesit.MaxConnections

PeSIT maximum number of opened connections. The "Maximum Connections Number" parameter determines how many TCP connections can be initiated, regardless of the number of transfers. Configurable in the PeSIT Settings page in the Admin UI.


More information: KB 177257


Pesit.MaxConnections=200




Pesit.MaxSessions

PeSIT maximum number of sessions. The "Maximum Sessions Number" parameter determines how many separate PeSIT transfers can run simultaneously. Configurable in the PeSIT Settings page in the Admin UI.


More information: KB 177257


Pesit.MaxSessions=200




Pesit.Server.pTCP.Buffer.Size

PeSIT server pTCP buffer size in bytes - the size of the buffer collecting data from multiple pTCP connections into one. Does not require a restart of PeSIT servers when changed; takes effect for new transfers after a change.


Set an extra-large value, larger than the file size; for example, 100 MB = 104857600 bytes.


Pesit.Server.pTCP.Buffer.Size=104857600




Pesit.Server.Socket.Buffer.Size

Socket send/receive buffer size in bytes for PeSIT servers. Corresponds to SO_SNDBUF/SO_RCVBUF settings of TCP layer. Requires restart of PeSIT servers when changed.


Set Receive Buffer size to zero to eliminate socket buffering.


Pesit.Server.Socket.Buffer.Size=0




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Pesit.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Pesit.preferBouncyCastleProvider=true
OR
Pesit.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, PESITD waits for the timeout period specified in the server configuration parameter Pesit.GracefulShutdownTimeout before stopping the PESITD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, PESITD will wait for the timer to expire, so it is recommended to reduce the timeout to a feasible value such as 5 minutes.


Graceful shutdown logging interval - The Server Log displays information about active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Pesit.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The PeSIT daemon can print SSL/TLS security parameters (TLS version and cipher suite) for newly established connections when the server configuration parameter SSLLogging.Pesit is set to true. A sample INFO message in the Server Log looks like this:


Establishing PeSIT SSL connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Pesit=true




10. SOCKS Proxy Tuning


Socks.Idle.Timeout

If server-initiated transfers using FTP(S) pass through the SOCKS5 proxy, increase the value of the Socks.Idle.Timeout server configuration parameter on the SecureTransport Edge from 600000 milliseconds (10 minutes) to 7200000 milliseconds (2 hours).


Socks.Idle.Timeout=7200000




Server IP (interface)

Specifies the server host (listening interface) for the proxy server. The default value is 0.0.0.0. If the system has multiple interfaces, configure the interface that faces internally (toward the backends).


OutboundConnections.Proxy.serverHost




Client IP (interface)

Specifies the source address/hostname for outgoing connections established from the Proxy service. Only useful on systems with more than one address. In other words, configure the interface that faces externally (internet).


OutboundConnections.Proxy.clientHost
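

A hedged illustration for a dual-homed Edge, with purely hypothetical addresses (10.0.0.5 on the interface facing the Core servers, 203.0.113.5 on the interface facing the internet):


OutboundConnections.Proxy.serverHost=10.0.0.5
OutboundConnections.Proxy.clientHost=203.0.113.5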




11. Maintenance Applications Tuning


Default maintenance applications

Upon a new installation, the following maintenance applications are enabled to execute at midnight (12:00 am): Audit Log Maintenance, LogEntry Maintenance, Package Retention Maintenance, Sentinel Link Data Maintenance, and Transfer Log Maintenance. The partition creation service is also configured to execute at midnight. Finally, the Statistics summary for usage reporting is also triggered at midnight, and there is currently no option to change its execution time. All of these work with the database. To reduce the pressure on the database and the shared storage, and to avoid collisions, it is better to rearrange the execution times.


The proposed schedule assumes that quiet hours begin after midnight. This is not valid for all environments. Choose the execution times based on analysis of file transfers and client login patterns.


Audit Log Maintenance - The default configuration deletes audit log records older than 6 months, in chunks, from the table auditlog. See the Audit Log section in the Transaction Manager Tuning chapter above. The option to export deleted records to a CSV file is enabled by default. The default schedule is to run every 1st day of the month at 12:00 AM. Change the start time to 12:15 AM or another suitable time during quiet hours.


LogEntry Maintenance - This application maintains Server Logs by dropping partitions for the tables logging_event, logging_event_exception, and logging_event_property. See the Partitions section in the Transaction Manager Tuning chapter above. The default configuration keeps 1 day of server logs. Depending on the configuration, user activity, and the load, increase the days to keep to 3, 5, 7 (1 week), and so on up to 14 (2 weeks). Usually, 5 days are enough for troubleshooting. To keep logs for a longer time, consider configuring logs to write to two appenders, one in the database and a second one to a flat file. Optionally, before dropping the partitions, ST can export them to a PostgreSQL custom-format archive file (enabled by default). The default schedule is to run every day at 12:00 AM. Change the start time to 12:30 AM or another suitable time during quiet hours.


Package Retention Maintenance - This application deletes expired file packages from Ad Hoc file transfers. Make sure that the PackageRetentionMaintApp rule package from the Transaction Manager settings is enabled. The default schedule is not set. Set the schedule to every day at 12:45 AM or another suitable time during quiet hours.


Sentinel Link Data Maintenance - This application removes all SentinelLinkData table entries that refer to files that no longer exist. The table SentinelLinkData is populated only if Send Events to Axway Sentinel or Decision Insight Server is enabled. The default schedule is to run every first Tue of the month at 12:00 am. Change the start time to 01:00 AM or another suitable time during quiet hours. If Ad Hoc file transfers are in use, then change the start time to 04:00 AM.


Transfer Log Maintenance - This application maintains File Tracking by dropping partitions for the tables subtransmissionstatus, transferdata, transferdetails, transferprotocolcommands, and transferresubmitdata. See the Partitions section in the Transaction Manager Tuning chapter above. The default configuration keeps 30 days of File Tracking. Depending on the configuration and the load, decrease the days to keep to 14 (2 weeks), 10, or 7 (1 week). Usually, 30 days can be handled by ST. Optionally, before dropping the partitions, ST can export them to a PostgreSQL custom-format archive file (enabled by default). The default schedule is to run every day at 12:00 AM. Change the start time to 01:30 AM or another suitable time during quiet hours.




Archive Maintenance

The Archive Maintenance application automatically deletes files based on a schedule. See the File Archiving section in the Transaction Manager Tuning chapter above. The default proposed schedule when adding the application is to run every day at 12:00 AM. Change the start time to 11:00 PM or another suitable time during quiet hours.


Enable Multithreading

When the Archive Maintenance application is to process a large number of files, it can be executed multi-threaded. To enable multithreading, set the number of threads to execute file deletion in the server configuration parameter FileArchiving.DeleteFiles.ProcessingThreads. The default value is 1.


Increasing the number of threads increases the load on the storage on which the application operates. The number of threads should not exceed 16.


Set maximum run time

Occasionally, if the Archive Maintenance application is processing a large number of files, it may not be able to finish until the next scheduled occurrence. In this case, it may be advisable to specify the maximum time (in minutes) that you expect the application to run in the server configuration parameter FileArchiving.DeleteFiles.MaximumProcessingTime. The default value is 0, which means the application continues to run until it completes.


The Archive Maintenance application is not configured in new installations of SecureTransport. The mentioned parameters are applicable only if the application is created and configured.


FileArchiving.DeleteFiles.ProcessingThreads=4
FileArchiving.DeleteFiles.MaximumProcessingTime=0




Accounts Maintenance

The Accounts Maintenance application can disable, delete, or delete and purge accounts based on account inactivity or age. Make sure that the AccountMaintenanceApp rule package from the Transaction Manager settings is enabled. More details for the configuration are available in Administrator Guide -> Account Maintenance application. The default schedule is not set. Set the schedule to every day at 02:00 AM or another suitable time during quiet hours.




Unlicensed Accounts Maintenance

The Unlicensed Accounts Maintenance application deletes unlicensed user accounts that have been inactive for a specified period of time (60 days by default). Make sure that the UnlicensedAccountMaintApp rule package from the Transaction Manager settings is enabled. More details for the configuration are available in Administrator Guide -> Unlicensed Account Maintenance application. The default schedule is not set. Set the schedule to every day at 03:00 AM or another suitable time during quiet hours.




Login Threshold Maintenance

The Login Threshold Maintenance application unlocks accounts locked according to the selected "Lock account after N failed logins" option in the Account settings and sends a report to specified email contacts. Make sure that the LoginThresholdMaintenanceApp rule package from the Transaction Manager settings is enabled. The default schedule is not set. Set the schedule to every 30 minutes or another suitable time period.




File Maintenance

The File Maintenance application deletes files from the account home folders based on a specified retention or expiration period. You can schedule the maintenance and configure notifications to be sent to specific recipients before and/or after the deletion of files. Make sure that the FileMaintenanceApp rule package from the Transaction Manager settings is enabled. More details for the configuration are available in Administrator Guide -> File Maintenance application. The default schedule is not set. Set the schedule to every day at 04:00 AM or another suitable time during quiet hours.




Proposed schedule for maintenance applications

Maintenance application | Default schedule | Suggested schedule | Keep data for | Comments
Audit Log Maintenance | Every 1st day of the month at 12:00 am | Every 1st day of the month at 12:15 am | 6 months | Use smaller chunks if the TM goes OOM during execution.
LogEntry Maintenance | Every day at 12:00 am | Every day at 12:30 am | 5 days | Keep data for as short a time as possible.
Package Retention Maintenance | No schedule is defined | Every day at 12:45 am | Variable per package | Configure if you use Ad Hoc file transfers.
Sentinel Link Data Maintenance | Every first Tue of the month at 12:00 am | Every Sun at 01:00 am, or every Sun at 04:00 am | Existing files | In busy environments with lots of transfers it may need to run more often.
Transfer Log Maintenance | Every day at 12:00 am | Every day at 01:30 am | 30 days | In busy environments with lots of transfers it may be necessary to reduce the days to keep data.
Archive Maintenance | Every day at 12:00 am | Every day at 11:00 pm | 5 days | Make sure you use a separate mount point for archives with the async mount option.
Accounts Maintenance | No schedule is defined | Every day at 02:00 am | Your choice | Use on demand.
Unlicensed Accounts Maintenance | No schedule is defined | Every day at 03:00 am | 60 days | The keep-data period counts consecutive inactive days.
Login Threshold Maintenance | No schedule is defined | Every 30 minute(s) | - | Use on demand.
File Maintenance | No schedule is defined | Every day at 04:00 am | 30 days or 5 days | Use on demand.




12. Shared Storage Tuning


The Standard Cluster distributes the load between the servers in milliseconds. The various stages of file processing for server-initiated transfers can be handled by any node in the cluster at any time. This requires similarly fast access to the storage with minimum latency. Synchronization of data on the storage presented to the ST servers is critical, especially for small files. In addition, ST keeps some metadata in the STFS directory structure as a subfolder in each subscription folder and account home folder. This metadata consists of very small files accessed many times for read and write during file processing. See STFS attribute files and caching for details. So, the shared storage greatly affects the performance of ST and needs special attention.


Network latency and jitter

The most important parameter is network latency. Over the years a practical limit of 10 ms was accepted, meaning that network latency above 10 ms causes a huge (noticeable) performance degradation of ST transfers and is considered unsupported. In fact, with all the features now available in ST 5.5, even 10 ms is a huge delay which drastically reduces the capacity of the ST Standard Cluster (and the Enterprise Cluster for that matter). The network latency is usually constant in LAN segments, but it can vary in complex virtual networks with multiple paths to the destination and when congestion is present. This variation is called network jitter or packet delay variation (RFC 3393), and it is very bad for storage performance and ST clusters in general. Both network latency and jitter can be addressed for large file transfers with appropriate buffering, but that is not the case for small files and single operations like checking if a file exists, opening a file, writing content, closing a file, changing ownership and permissions, etc. Overall performance requires the storage to be as close as possible to the ST cluster (same rack, same switch) so that latency is as low and as constant as possible. Consequently, if the storage is clustered, a distributed design is not supported.
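

As a quick first check, standard tools can approximate both values from an ST node toward the storage; the host name nas01 below is a placeholder:


ping -c 100 -i 0.2 nas01


In the rtt summary line, avg approximates the latency and mdev approximates the jitter; both should stay well below the 10 ms limit discussed above.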


Access to storage

During file processing, ST must have exclusive access to the shared storage. Other storage clients can affect storage synchronization and interfere with the file processing, either directly by performing actions on the files and folders or via security tools (especially antivirus or antimalware software) scanning files and folders. Some vendor storages support multi-protocol sharing. This mode affects the synchronization capabilities of the storage and provides easy access for other systems. For error-free operation this mode is unsupported.


Security tools

Any security tools automatically scanning files directly on the shared storage are not supported, because they interfere with ST during file processing. This includes antivirus or antimalware applications with real-time scanning enabled, running on ST servers or on other systems with access to the shared storage, where the ST accounts' home folders are not excluded from scanning. If you need to scan files arriving via ST, use the ICAP interface with ST - ICAP settings.


This article will cover only the most popular and widely used protocols for access to network shared storage - NFS and CIFS (SMB).




ST on Linux

Linux has native support for the Network File System (NFS) protocol. ST 5.5 supports NFS versions NFSv3 and NFSv4. This is the most widely used protocol with ST.


  • On-premise installations usually use a high-end Storage Area Network (SAN) or Network Attached Storage (NAS) device from popular vendors like NetApp, Dell EMC, IBM, HPE, Veritas, Synology, etc. These are configured as an NFS server or, rarely, as a CIFS share. Another approach is to use a Linux machine configured as an NFS server. Supported NFS versions are NFS v3.0, v4.0, v4.1, and v4.2.
  • In Amazon cloud ST supports "Amazon EFS over NFS v4.0 and v4.1" and "Amazon FSx for OpenZFS over NFS v3.0 and v4.0".
  • In Azure cloud ST supports "Azure NetApp Files (ANF) over NFS v3.0".
  • In Google cloud ST supports "Google Filestore over NFS v3.0".


The NFS server must export the file system with the sync and no_wdelay options. It is also recommended to add the export option no_subtree_check.
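

For illustration, a hypothetical /etc/exports line on a Linux NFS server combining these options (the export path and client subnet are placeholders):


/st_share 10.0.0.0/24(rw,sync,no_wdelay,no_subtree_check)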


NFS client mount options

There are two working modes (sync and async). Originally it was required to use the sync mount option, which means that any system call writing data to files on that mount point causes the data to be flushed to the server before the system call returns control to the user space. This provides greater data cache coherence among clients, but at a performance cost. The cost can be significant when uploading large files in cloud environments. Recently Axway validated a combination of mount options that works in async mode (using the Linux cache), made possible by the mount option lookupcache=positive. The best approach is to try async mode first and, if it does not work satisfactorily, to fall back to sync mode.


async mode (generic)


async,actimeo=3,lookupcache=positive,nfsvers={VERSION},rsize={NUM},wsize={NUM},hard,timeo=600,retrans=2


sync mode (generic)


sync,actimeo=1,nfsvers={VERSION},rsize={NUM},wsize={NUM},hard,timeo=600,retrans=2
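

For reference, the generic async profile could look like the following /etc/fstab entry; the server name nas01, export /st_share, mount point /mnt/st_share, and the chosen nfsvers, rsize, and wsize are placeholders to be replaced with validated values:


nas01:/st_share /mnt/st_share nfs async,actimeo=3,lookupcache=positive,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0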


Amazon EFS


async,actimeo=3,lookupcache=positive,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport


Amazon FSx for OpenZFS


async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev


Azure NetApp Files, NetApp ONTAP 9.8 (NFS in NetApp ONTAP)


nconnect=8,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=262144,wsize=262144,hard,timeo=600,retrans=2


Google Filestore (Filestore instance performance)


nconnect=2,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=524288,wsize=524288,hard,timeo=600,retrans=3,resvport


For High scale SSD


nconnect=7,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=524288,wsize=524288,hard,timeo=600,retrans=3,resvport


The nconnect mount option requires support from both the NFS server and the NFS client. Linux provides support for nconnect in kernel version 5.3 and higher. The nconnect works with NFSv4.1, NFSv4.2, and NFSv3.


async - The NFS client delays sending application writes to the server and puts the data in the disk cache. In other words, under normal circumstances, data written by an application may not immediately appear on the server that hosts the file. This can trigger retries in ST on another node (another NFS client) because it does not yet see the file when it begins processing it. The retry mechanism is not applicable or implemented for every kind of IO operation performed by ST. Use async mode together with lookupcache=positive to orchestrate the flow of events in such a way as to avoid processing failures and minimize the retries.


Do not use noac or actimeo=0 together with async mode because this can corrupt stfs attribute files.


sync - The NFS client flushes the application writes to the server before the system call returns control to the user space. In other words, data written by an application is already present on the server that hosts the file, and it is available for access by other NFS clients. Processing is smoother but slower for small files. On a high-load system with predominantly small files, sync mode works better than async mode, where the ST application would be executing retry cycles.


actimeo - This option sets the 4 mount options acregmin, acregmax, acdirmin, and acdirmax to the same value. These mount options control the NFS client cache for the filesystem attributes of regular files and directories. The option "actimeo=3" means that the NFS client caches the attributes for 3 seconds before requesting fresh attribute information from the NFS server. The ST application expects consistent filesystem attribute information at any time on any node, so the ideal would be no filesystem attribute cache at all. However, turning off caching with the noac mount option kills performance in modern virtual environments and is not advisable. Use cache times as short as possible for sync mode. One second is usually enough for modern storages but, depending on the hardware, you may need to increase it to two or even three seconds. For async mode three seconds are usually fine, but you may need to reduce it to two seconds or even one second, because of its effect on lookupcache=positive. To find the right value, create an account in ST with 30 Subscriptions and put a thousand files in the account home folder. Log in and out many times and measure the login times. If there is no significant difference with the cache set to one, two, or three seconds, choose the smallest value. If the login time is better with two seconds, use "actimeo=2"; and finally, if login times are better with three seconds, use "actimeo=3".


lookupcache - If pos or positive is specified, the client assumes positive entries are valid until their parent directory's cached attributes expire, but always revalidates negative entries before an application can use them. Always use this mount option with the positive value for async mode, in combination with the actimeo option described above.


nfsvers - The NFS protocol version number used to contact the server's NFS service. If the server does not support the requested version, the mount request fails. Sometimes there is no choice, when only a particular version is supported (see above). When you have a choice, consult the storage vendor about a preferred version. Under perfect conditions it does not matter which one is selected; one version could be better in some environments and the other in others, and this can be identified by a load test with the desired traffic pattern. Note that NFSv3 can also use UDP transport, which could make a difference.


rsize - The maximum number of bytes in each network READ request that the NFS client can receive when reading data from a file on an NFS server. The actual data payload size of each NFS READ request is equal to or smaller than the rsize setting. The largest read payload supported by the Linux NFS client is 1,048,576 bytes (one megabyte). The client and server negotiate the largest rsize value that they can both support. Usually, the same value is used for both rsize and wsize. Check with your storage vendor for optimal values. In general, the bigger the buffer, the better the throughput when there are no network or storage constraints.


wsize - The maximum number of bytes per network WRITE request that the NFS client can send when writing data to a file on an NFS server. The actual data payload size of each NFS WRITE request is equal to or smaller than the wsize setting. The largest write payload supported by the Linux NFS client is 1,048,576 bytes (one megabyte). The client and server negotiate the largest wsize value that they can both support. Usually, the same value is used for both wsize and rsize. Check with your storage vendor for optimal values. In general, the bigger the buffer, the better the throughput when there are no network or storage constraints.


hard - Determines the recovery behavior of the NFS client after an NFS request times out. With the hard option NFS requests are retried indefinitely. For ST data integrity is more important than NFS client responsiveness. That is why it is not recommended to use a soft mount option, which can cause silent data corruption in certain cases.


timeo - The time in deciseconds (tenths of a second) the NFS client waits for a response before it retries an NFS request. The NFS client over TCP performs linear backoff: After each retransmission, the timeout is increased by timeo up to the maximum of 600 seconds. However, for NFS over UDP, the client uses an adaptive algorithm to estimate an appropriate timeout value for frequently used request types (such as READ and WRITE requests) but uses the timeo setting for infrequently used request types (such as FSINFO requests).


retrans - The number of times the NFS client retries a request before it attempts further recovery action. If the retrans option is not specified, the NFS client tries each request three times. The NFS client generates a "server not responding" message after retrans retries, then attempts further recovery (depending on whether the hard mount option is in effect).


noresvport - Specifies that the NFS client should use a non-privileged source port when communicating with an NFS server for this mount point. Using non-privileged source ports helps increase the maximum number of NFS mount points allowed on a client, but NFS servers must be configured to allow clients to connect via non-privileged source ports.


The exact range of privileged source ports that can be chosen is set by a pair of sysctls to avoid choosing a well-known port, such as the port used by SSH. This means the number of source ports available for the NFS client, and therefore the number of socket connections that can be used at the same time, is practically limited to only a few hundred.


The traditional default NFS authentication scheme, known as AUTH_SYS, relies on sending local UID and GID numbers to identify users making NFS requests. An NFS server assumes that if a connection comes from a privileged port, the UID and GID numbers in the NFS requests on this connection have been verified by the client's kernel or some other local authority. This is an easy system to spoof, but on a trusted physical network between trusted hosts, it is entirely adequate. Using non-privileged source ports may compromise server security somewhat, since any user on AUTH_SYS mount points can now pretend to be any other when making NFS requests. Thus, NFS servers do not support this by default; they must explicitly allow it, usually via an export option.


nconnect - The purpose of nconnect is to provide multiple TCP connections to the NFS server, which can increase performance and throughput. The current limit of client-server connections opened by nconnect is 16.


It's not recommended to use nconnect and sec=krb5* mount options together. Using these options together can cause performance degradation.


_netdev - This is not really an NFS client mount option; it forces systemd to treat the mount unit as a network mount, so that systemd mounts it only after the network is available. Usually automatic detection works fine and this mount option is not needed; using it overrides the detection and specifies that the mount requires the network.
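

After mounting, it is worth confirming which options the kernel actually negotiated, since the server can silently cap values such as rsize and wsize. Standard commands for this are:


nfsstat -m


or


mount -t nfs,nfs4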


CIFS (SMB) client mount options

CIFS, or the Common Internet File System, is a dialect of the Server Message Block (SMB) protocol. The SMB3 protocol is the successor to the CIFS (SMB) protocol and is supported by most Windows servers, Azure (cloud storage), Macs and many other commercial servers and Network Attached Storage appliances as well as by the popular Open Source server Samba.


CIFS is not a separate protocol but rather a specific implementation or version of the SMB protocol. Modern systems and Microsoft itself now recommend against using CIFS (which corresponds to SMB 1.0) in favor of newer, more secure, and better-performing SMB versions, such as SMB 3.0 and above. Modern SMB versions offer significantly more functionality and efficiency, which CIFS lacks.


This section is currently under development. More information about CIFS will be added in the near future.


Check if the storage is running in sync or async mode

The Linux dd command can bypass the Linux cache (oflag=dsync) and write data to the storage synchronously. If the two commands below produce similar results, the NFS client is using sync mode. In async mode the second command returns a significantly higher transfer rate (see the example below). The chosen block size of 1460 bytes fills one network packet.


dd if=/dev/zero of={shared storage mount point}/test.small bs=1460 count=1000 oflag=dsync


dd if=/dev/zero of={shared storage mount point}/test.small bs=1460 count=1000


Example results of the dd commands writing to the root local filesystem, which definitely uses the Linux page cache:


[root@RHEL84-axwg3 ~]# dd if=/dev/zero of=/root/test.small bs=1460 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
1460000 bytes (1.5 MB, 1.4 MiB) copied, 3.56844 s, 409 kB/s
[root@RHEL84-axwg3 ~]# dd if=/dev/zero of=/root/test.small bs=1460 count=1000
1000+0 records in
1000+0 records out
1460000 bytes (1.5 MB, 1.4 MiB) copied, 0.00243722 s, 599 MB/s
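

Independently of the dd test, the mount options actually in effect on the client can be inspected with nfsstat, which lists each NFS mount point together with the negotiated options (assuming the nfs-utils package is installed):


nfsstat -m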




Retries in ST for important IO operations

ST provides a retry mechanism for some important IO operations on a shared storage. The retry mechanism works in essentially the same way for all types of retries described below. When a retry is triggered, ST calculates a backoff time by multiplying the retry number by retryTime (also called pauseTime). After the backoff time expires, ST retries the failed operation.


For example, with retryTime=100, the 5th retry will be executed after 5 * 100 = 500 milliseconds (0.5 seconds). The total time to exhaust 10 retries and permanently fail the operation is 1 * 100 + 2 * 100 + ... + 10 * 100 = 5500 milliseconds (5.5 seconds). A common mistake when increasing retries is to double both the number of retries and retryTime: for this example, the total time becomes 1 * 200 + 2 * 200 + ... + 20 * 200 = 42000 milliseconds (42 seconds), which is too long a retry cycle with long backoff times. A better approach is to increase the number of retries and slightly reduce retryTime, or leave retryTime at its default. For example, with retries=20 and retryTime=90 the total time is 1 * 90 + 2 * 90 + ... + 20 * 90 = 18900 milliseconds (18.9 seconds).
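

The total retry cycle follows directly from the arithmetic series retryTime * (1 + 2 + ... + retries) = retryTime * retries * (retries + 1) / 2. A minimal shell sketch of this calculation (the variable names are illustrative and are not parameters read by ST):


# Total time (in ms) to exhaust all retries with linear backoff:
# retry N waits N * retryTime before executing
retries=20
retryTime=90
total_ms=$(( retryTime * retries * (retries + 1) / 2 ))
echo "Retry cycle: ${total_ms} ms"    # prints: Retry cycle: 18900 ms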


NFS Support

This is the general retry mechanism to get access to existing files. It was originally implemented for NAS using NFS in async mode, but it is effectively applicable to any kind of shared storage. The retry triggers on exceptions thrown by the Java methods java.io.File.canRead() and java.io.File.exists(). By default, this mechanism is disabled. To enable it with the recommended values, add the lines below to the STStartScriptsConfig script.


TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.enabled=true $TM_JAVA_OPTS" 
TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.retryCount=5 $TM_JAVA_OPTS" 
TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.pauseTime=200 $TM_JAVA_OPTS"


STFS retries

This retry mechanism applies to read and write operations on STFS attribute files. Originally it covered read operations only, but it was later extended to write operations. It is enabled by default with 10 retries and a retryTime of 100 milliseconds. To increase the retry cycle, add the lines below to the STStartScriptsConfig script. The new values are 20 retries and a retryTime of 90 milliseconds (retry cycle of 18.9 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retries=20 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retryTime=90 $TM_JAVA_OPTS"


For a further increase, change the values to 30 retries and a retryTime of 60 milliseconds (retry cycle of 27.9 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retries=30 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retryTime=60 $TM_JAVA_OPTS"


AR sandbox retries

This retry mechanism triggers on a NoSuchFileException while ST is copying a file from a subscription folder to the sandbox in order to execute a Route. It is enabled by default with 10 retries and a retryTime of 100 milliseconds. The calculation of the backoff time is slightly different here; the formula is (retry number - 1) * retryTime = backoff time, so the first retry happens immediately. To increase the retry cycle, add the lines below to the STStartScriptsConfig script. The new values are 20 retries and a retryTime of 90 milliseconds (retry cycle of 17.1 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retries=20 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retryTime=90 $TM_JAVA_OPTS"


For a further increase, change the values to 30 retries and a retryTime of 60 milliseconds (retry cycle of 26.1 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retries=30 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retryTime=60 $TM_JAVA_OPTS"




Return to table of contents

Created: August 2025 by Evgeni Evangelov