KB Article #191062

SecureTransport 5.5 Tuning - Standard Cluster with PostgreSQL database

Problem

There are many configuration parameters that can be adjusted in SecureTransport and they are spread among many files as well as stored in the DB.


This article aims to help with tuning SecureTransport 5.5 Standard Cluster with PostgreSQL database and finding the necessary place to apply a configuration change.


It's important to keep in mind that tuning is a constantly evolving process in which you establish a set of baselines and optimal settings through repeated testing and evaluation. There is no definitive guide or magic set of options; you are responsible for evaluating performance, making incremental changes, and re-evaluating until you reach your goals.






1. Memory Tuning


All protocol daemons have a minimum and a maximum Heap Size value defined by the JAVA_MEM_MIN and JAVA_MEM_MAX parameters. The configuration options are available in the startup scripts start_* located in the $FILEDRIVEHOME/bin folder.


The startup scripts contain default values and must not be edited directly. Instead, use the global configuration file STStartScriptsConfig, located in the $FILEDRIVEHOME/conf folder, which lets you set the JAVA_MEM_MIN, JAVA_MEM_MAX, and JAVA_OPTS parameters so that the changes survive a SecureTransport upgrade. Additional details on this file and its configuration can be found in the Advanced protocol server configuration section of the Admin Guide, available in our Docs portal.


It's important to keep in mind that values provided in the "Advanced service configuration and memory allocation" section of SecureTransport 5.5 Administrator Guide must be treated as example values and not as recommended values.


The Standard Cluster with PostgreSQL poses additional challenges for memory tuning. The PostgreSQL database needs more RAM because all nodes work with the primary database, which is then replicated to the databases on the secondary nodes. To simplify the process, we will split the system RAM in two: one half is used for the ST application Java processes, and the other half remains for the OS and the DB. After tuning, the average daily RAM usage must be between 60% and 80% of the system memory, and during busy hours the average RAM usage must not exceed 90% of the system memory.
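The split and the usage targets from the paragraph above can be sketched as a few shell helpers. This is only an illustration: the function names and the sample figures are hypothetical and are not part of SecureTransport; the thresholds are the ones quoted above.

```shell
#!/bin/sh
# Sketch only: helper names and sample numbers are illustrative.

# Print the RAM (in MB) reserved for the ST Java processes: half of the system RAM.
# $1 = total system RAM in MB
java_budget_mb() {
    echo $(( $1 / 2 ))
}

# Classify an average daily RAM usage figure against the 60%-80% target.
# $1 = used MB, $2 = total MB
daily_usage_status() {
    pct=$(( $1 * 100 / $2 ))
    if [ "$pct" -lt 60 ]; then
        echo "below target (${pct}%)"
    elif [ "$pct" -le 80 ]; then
        echo "within target (${pct}%)"
    else
        echo "above target (${pct}%)"
    fi
}

# Succeed when busy-hour usage stays at or below 90% of system memory.
# $1 = used MB, $2 = total MB
busy_hours_ok() {
    [ $(( $1 * 100 / $2 )) -le 90 ]
}

java_budget_mb 16384            # 16GB server
daily_usage_status 11469 16384  # roughly 11.2GB used on average
if busy_hours_ok 15500 16384; then
    echo "busy-hour usage OK"
else
    echo "busy-hour usage too high"
fi
```

Feeding the helpers with figures collected by your own monitoring gives a quick pass/fail check after each tuning iteration.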


WARNING: The actual memory usage of a given daemon can exceed the value defined for Max Heap Size. This is due to the way a JVM works (see off-heap memory), thus one must be cautious not to exhaust the RAM memory available on a given server.


Example values for protocol daemons that would cover most use cases:


JAVA_MEM_MIN="1G"
JAVA_MEM_MAX="2G"


The following table shows example values when all protocol daemons in ST are configured and running. It takes into account the constraints for RAM memory available on the server, based on minimum hardware requirements (System requirements), and a typical configuration with increased RAM and commonly used protocols.


Each cell shows JAVA_MEM_MIN / JAVA_MEM_MAX; "-" means the component does not run on that server type.

Component | Core - 16GB RAM | Edge - 8GB RAM | Core - 24+GB RAM | Edge - 16+GB RAM
Admin | 512M / 1G | 256M / 512M | 1G / 2G | 512M / 1G
AS2d | 256M / 512M | 256M / 512M | 512M / 1G | 512M / 1G
FTPd | 256M / 512M | 256M / 512M | 512M / 1G | 512M / 1G
HTTPd | 512M / 1G | 256M / 512M | 512M / 1G | 512M / 1G
PeSITd | 256M / 512M | 256M / 512M | 512M / 1G | 512M / 1G
SSHd | 1G / 2G | 512M / 1G | 1G / 3G | 1G / 3G
TM | 2G / 4G | - | 4G / 8G | -
monitord | 256M / 512M | 256M / 512M | 256M / 512M | 256M / 512M
socks | - | 256M / 512M | - | 512M / 1G


WARNING: Given the nature of SecureTransport, one cannot easily determine how much memory will be needed on a given environment. After performing an initial tuning, it is recommended to monitor the actual usage of any protocol of interest and then adjust accordingly.


Additional notes:


  • The golden rule for allocating memory to Java processes is to set JAVA_MEM_MIN to half of JAVA_MEM_MAX.
  • TM is the brain of ST. TM requires more memory, and its allocation is more dynamic. In some configurations it is better to set JAVA_MEM_MIN closer to JAVA_MEM_MAX, for example: 6G/8G. It is recommended to enable GC logging for TM in the STStartScriptsConfig script (see below). To enable timestamps in custom Garbage Collector logs with Java 11, see article KB 182225.
  • When tuning the memory for the Admin Service on Core servers, take into consideration how many administrators will be using the service at a given time and whether they are Full or Delegated administrators. Delegated administrators consume more memory when doing File Tracking searches (one of the most memory-consuming operations).
  • To improve the speed of CIT SSH transfers over high-bandwidth, high-latency networks, increase the receive and send buffers and the sliding window size (see below).
  • The command line tools for import/export of account information in XML format create a separate JVM for the duration of execution (see below).
  • Make sure you have at least 30% to 40% free memory on the OS when ST is running normally (not loaded). The memory is needed for the OS cache and JVM off-heap memory (see below).
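As a minimal sketch of the golden rule from the first note, the helper below derives JAVA_MEM_MIN from a chosen JAVA_MEM_MAX. The function names are hypothetical, and only the G and M suffixes used in this article are handled.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of SecureTransport.

# Convert a heap size such as "2G" or "512M" to megabytes.
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Golden rule: JAVA_MEM_MIN is half of JAVA_MEM_MAX (result printed in MB).
min_from_max() {
    max_mb=$(to_mb "$1") || return 1
    echo "$(( max_mb / 2 ))M"
}

min_from_max 4G     # TM on a 16GB Core server
min_from_max 512M   # monitord
```

For TM, where a minimum closer to the maximum may be preferable, the result is only a starting point.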




Generic configuration


Here is what a generic STStartScriptsConfig configuration looks like, based on the minimum hardware requirements and the additional notes above.


#Admin memory settings
ADMIN_JAVA_MEM_MIN="512M"
ADMIN_JAVA_MEM_MAX="1G"
#
#AS2d memory settings
AS2_JAVA_MEM_MIN="256M"
AS2_JAVA_MEM_MAX="512M"
#
#FTPD memory settings
FTP_JAVA_MEM_MIN="256M"
FTP_JAVA_MEM_MAX="512M"
#
#HTTPD memory settings
HTTP_JAVA_MEM_MIN="512M"
HTTP_JAVA_MEM_MAX="1G"
#
#PeSITD memory settings
PESIT_JAVA_MEM_MIN="256M"
PESIT_JAVA_MEM_MAX="512M"
#
#SSHD memory settings and buffers tuning 
SSH_JAVA_MEM_MIN="1G"
SSH_JAVA_MEM_MAX="2G"
SSH_JAVA_OPTS="-DrecvBufferSize=1048576 $SSH_JAVA_OPTS"
SSH_JAVA_OPTS="-DsendBufferSize=1048576 $SSH_JAVA_OPTS"
SSH_JAVA_OPTS="-Dssh.maxWindowSpace=12582912 $SSH_JAVA_OPTS"
#
#TM memory settings
TM_JAVA_MEM_MIN="2G"
TM_JAVA_MEM_MAX="4G"
#Off heap memory limit (The recommended limit is half of the max memory. Remove or comment if you have plenty of server memory.)
TM_JAVA_OPTS="-XX:MaxDirectMemorySize=2048M $TM_JAVA_OPTS"
#TM GC tuning and logging
TM_JAVA_OPTS="-XX:+ExplicitGCInvokesConcurrent $TM_JAVA_OPTS"
GC_LOGGING=true
NumberOfGCLogFiles=30
GCLogFileSize=5000K
#
#Monitord memory settings
MONITORD_JAVA_MEM_MIN="256M"
MONITORD_JAVA_MEM_MAX="512M"
#
#Socks memory settings
SOCKS_JAVA_MEM_MIN="256M"
SOCKS_JAVA_MEM_MAX="512M"
#
# xml_import and xml_export scripts
XML_JAVA_MEM_MIN="256M"
XML_JAVA_MEM_MAX="512M"


More information on monitoring JVM memory: KB 176359 and KB 180171.




Native (Off-Heap) Memory used by JVM


The memory allocated outside of the Java heap and used by the JVM is called native memory, also referred to as off-heap memory. From the operating system's perspective, these memory regions are contiguous sequences of bytes.

Within the JVM, the closest equivalent is a byte array. However, unlike native memory, a Java array is not guaranteed to be stored contiguously — the Garbage Collector may relocate it at any time. Additionally, the internal layout of array data can vary across JVM implementations, since arrays are objects in Java. Because of these differences, transferring data between JVM arrays and off-heap memory requires serialization and deserialization. As a result, performance is directly tied to the efficiency of this serialization process.

The JVM uses native memory to store several types of data that require direct interaction with the operating system or involve I/O operations, including:


  • Thread stacks
  • Internal JVM data structures
  • Memory-mapped files
  • Buffers which require interaction with OS native code or I/O operations (Direct memory)


Unlike heap memory, off-heap memory is not directly managed by the Garbage Collector. However, its release is still indirectly triggered by garbage collection - specifically, when all references to it are removed.

From the application's perspective, SecureTransport uses memory-mapped files and direct memory, both described below.


Memory-mapped files


Memory-mapped files are a method that allows a file or portion of a file to be mapped directly into a process's virtual memory address space (outside of Java heap) which is managed by the operating system. The limit is the available virtual memory on the operating system. In most operating systems, the memory region mapped is the kernel's page cache (file cache). Very large files can be mapped without consuming large amounts of memory to manipulate the data. The virtual memory subsystem of the operating system will perform intelligent caching of the pages, automatically managing memory according to system load.

The SecureTransport application uses memory-mapped files for file transformations, repository encryption, Coherence cluster Hibernate cache files, and the new Amazon S3-compatible file transfers introduced in SecureTransport version 5.5-20251030.


Make sure the OS has enough free memory for the file system cache, and avoid swap utilization.


Direct memory


Direct memory (aka direct buffer memory) is another type of off-heap memory used when the application needs to communicate directly with your computer's operating system, such as when:


  • Reading or writing files
  • Sending or receiving data over a network
  • Performing other input/output operations


The SecureTransport application creates non-direct byte buffers in the JVM heap, and Java creates temporary direct byte buffers in native memory. During I/O operations, the JVM copies the content of the non-direct byte buffers to the temporary direct byte buffers or vice versa. The maximum direct memory size is limited by the maximum heap size, unless explicitly specified with the JAVA_OPTS parameter MaxDirectMemorySize.


The recommended limit is half of the max memory (JAVA_MEM_MAX). If you have plenty of server memory, remove or comment out the line TM_JAVA_OPTS="-XX:MaxDirectMemorySize=2048M $TM_JAVA_OPTS" in the generic configuration above. The limit must not be less than 512MB. If you encounter the error "java.lang.OutOfMemoryError: Direct buffer memory", revise your buffer settings and, if needed, increase the direct memory limit. You may need to add more memory to the server or reduce the MAX memory setting for some services.
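The rule above (half of JAVA_MEM_MAX, never below 512MB) can be sketched as follows. The helper name is hypothetical; the printed line only mirrors the TM entry of the generic configuration and would need the matching prefix for any other daemon.

```shell
#!/bin/sh
# Sketch only: direct_mem_limit_mb is an illustrative helper name.

# $1 = JAVA_MEM_MAX in MB. Prints a MaxDirectMemorySize value in MB:
# half of the max heap, but never below the 512MB floor.
direct_mem_limit_mb() {
    half=$(( $1 / 2 ))
    if [ "$half" -lt 512 ]; then
        half=512
    fi
    echo "$half"
}

# Example for TM with JAVA_MEM_MAX="4G" (4096 MB).
limit=$(direct_mem_limit_mb 4096)
echo "TM_JAVA_OPTS=\"-XX:MaxDirectMemorySize=${limit}M \$TM_JAVA_OPTS\""
```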


On Linux, you can use the pmap command to find the memory used by a process, including native memory. The example below returns the value in kilobytes; replace <PID> with the actual process ID. Note that some of the native memory is shared with other processes, so the reported resident set size (RSS) can overestimate memory usage. Alternatives such as smem also exist.


pmap -x -p <PID> |grep total |awk '{print $4}'




2. Database Tuning


Connection pool size


All SecureTransport versions up to 5.5-20260226 use the connection-pooling library C3P0. With C3P0, each component maintains its own connection pool, even when pointing to the same database. C3P0 has a complex configuration with many parameters, suboptimal performance under high load, and inconsistent connection health checking, and the library is no longer actively maintained. Starting with SecureTransport update 5.5-20260326, C3P0 is replaced by HikariCP. HikariCP introduces a shared DataSource architecture with better connection reuse, and it is lighter and faster than C3P0. Details can be found in the ST documentation at ST HikariCP configuration and migration and in the HikariCP documentation at HikariCP.


c3p0 in configuration.xml


The configuration changes are to be made to the hibernate.c3p0.min_size and hibernate.c3p0.max_size parameters for each component.


The PostgreSQL database uses memory buffers (work_mem) per connection for query operations such as sorts (ORDER BY, DISTINCT, and merge joins) and hash tables (used in hash joins and hash-based aggregation) before writing to temporary disk files. These buffers help to speed up complex queries, but they can consume a lot of memory if you have many connections. The best approach is to use fewer connections on the ST side, each with reasonably sized buffers.

If the environment was upgraded from the legacy Standard Cluster with MariaDB and the original tuning guide (KB 178443) was used, the values of the hibernate.c3p0.min_size and hibernate.c3p0.max_size parameters must be decreased as per the following table.


Component c3p0.min_size c3p0.max_size
Database 2 50
Database_FTPDComponent 5 50
Database_HTTPDComponent 5 50
Database_TransactionManagerComponent 20 100
Database_AS2Component 5 50
Database_SSHDComponent 5 50
Database_ServerLogComponent 5 50
Database_InstallerComponent 2 20
Database_AdminComponent 20 50
Database_ToolsComponent 1 20
Database_PesitComponent 5 50
Database_SharedRuntimeComponent 5 32
Database_TransferLogComponent 5 50


The values from the table above can be used with the new HikariCP library. However, it is recommended to use the reduced values from the table below because HikariCP has better connection reuse.




HikariCP in configuration.xml


The configuration changes are to be made to the hibernate.hikari.minimumIdle and hibernate.hikari.maximumPoolSize parameters for each component.


Component hikari.minimumIdle hikari.maximumPoolSize(Normal Load) hikari.maximumPoolSize(High Load)
Database 2 10 20
Database_FTPDComponent 2 10 20
Database_HTTPDComponent 2 10 20
Database_TransactionManagerComponent 5 32 64
Database_AS2Component 2 10 20
Database_SSHDComponent 2 10 20
Database_ServerLogComponent 2 10 20
Database_InstallerComponent 2 8 8
Database_AdminComponent 4 20 40
Database_ToolsComponent 1 5 10
Database_PesitComponent 2 10 20
Database_SharedRuntimeComponent 2 32 64
Database_TransferLogComponent 2 10 20


Please read the ST documentation for other recommended tuning at ST HikariCP configuration and migration. Note that upon upgrade ST keeps some C3P0 settings, which could be misleading.

SecureTransport still sets the connectionTestQuery parameter, which activates a legacy method of validating that the connection to the database is still alive. This causes idle connections to stay open, exhausting resources on the PostgreSQL database. Please remove all occurrences of hibernate.hikari.connectionTestQuery="SELECT 1" from configuration.xml.




TCP Keepalive


Tuning the TCP keepalive parameters is recommended for a better PostgreSQL experience. With the new HikariCP library, TCP keepalive must be configured to avoid a rare condition where the pool drops to zero connections and does not recover. The default PostgreSQL configuration uses the operating system settings, and both Linux and Windows have a long default idle time of 2 hours. It is recommended to tune the TCP keepalive settings in every possible place: the operating system, the database server, and the database client.


On Linux


This is done by editing the /etc/sysctl.conf file:


#To detect dead connections after 80 seconds.
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3


To activate the settings without rebooting the machine, run sysctl -p.
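How the three values combine can be sketched with simple arithmetic. The helper names are hypothetical, and the calculation assumes the usual Linux behavior: probing starts after tcp_keepalive_time seconds of idle time, probes repeat every tcp_keepalive_intvl seconds, and the connection is dropped one interval after the last unanswered probe.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.
# $1 = tcp_keepalive_time (s), $2 = tcp_keepalive_intvl (s), $3 = tcp_keepalive_probes

# Seconds of idle time until the last keepalive probe is sent.
last_probe_after() {
    echo $(( $1 + $2 * ($3 - 1) ))
}

# Seconds of idle time until the kernel gives up on the peer.
dead_after() {
    echo $(( $1 + $2 * $3 ))
}

echo "tuned:    last probe at $(last_probe_after 60 10 3)s, dropped at $(dead_after 60 10 3)s"
echo "defaults: last probe at $(last_probe_after 7200 75 9)s, dropped at $(dead_after 7200 75 9)s"
```

With the Linux defaults (7200 / 75 / 9) a dead peer goes unnoticed for more than two hours, which is why the tuned values matter for pooled database connections.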


On Windows


Change the TCP keepalive settings by adding these registry values:


HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime DWORD 60000
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval DWORD 2000


The values are in milliseconds and require a Windows restart to take effect. The keepalive retry count is not configurable on Windows; the default value is 10 retries.


On DB client


Add the option tcpKeepAlive=true to the PostgreSQL jdbcUrl connection string in configuration.xml.


jdbcUrl="jdbc:postgresql://${host}:${port}/${databaseName}?tcpKeepAlive=true"


On DB server


See the TCP SETTINGS section of the embedded PostgreSQL database tuning below.




Embedded PostgreSQL database

Changes are to be made to $FILEDRIVEHOME/var/db/postgresql/data/postgresql.conf


Max Connections

The maximum number of concurrent connections the database server will allow. In a Standard Cluster with PostgreSQL, all nodes will access the configuration and file tracking data in the primary database and each node will write the server logs in its own local database. Based on the recommended number of connections from the table above for c3p0 min and max size, here are the recommended values for different types of deployments.


Deployment | max_connections (Core servers) | max_connections (Edge servers)
Standalone | 1000 | 1000
2-node cluster | 2000 | 1500
3-node cluster | 2500 | 2000


A generic approach is to configure max_connections = 2500 for Core servers and max_connections = 2000 for Edge servers.


Note that Axway has validated successful deployments with up to 4 SecureTransport Edges in synchronization (cluster).


If you use HikariCP with the reduced values for the hibernate.hikari.minimumIdle and hibernate.hikari.maximumPoolSize parameters in configuration.xml, configure max_connections = 1500 for Core servers and max_connections = 1000 for Edge servers. Note that the max_connections parameter is just a limit, so you can safely use the higher values from the table above.
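To see how the pool sizes relate to max_connections, the sketch below adds up the hikari.maximumPoolSize values from the HikariCP table earlier in this article. It is a rough upper bound only: it assumes every component on every node fills its own pool against the same database, which the shared DataSource architecture and the per-node server log databases make unlikely in practice. The variable and function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: names are illustrative; values are copied from the HikariCP table,
# in table order (Database ... Database_TransferLogComponent).
NORMAL_LOAD="10 10 10 32 10 10 10 8 20 5 10 32 10"
HIGH_LOAD="20 20 20 64 20 20 20 8 40 10 20 64 20"

# Add up a space-separated list of pool sizes.
sum_pools() {
    total=0
    for size in $1; do
        total=$(( total + size ))
    done
    echo "$total"
}

# Worst case for a cluster: $1 = list of pool sizes, $2 = number of nodes.
worst_case() {
    echo $(( $(sum_pools "$1") * $2 ))
}

echo "one node, normal load:  $(sum_pools "$NORMAL_LOAD")"
echo "one node, high load:    $(sum_pools "$HIGH_LOAD")"
echo "three nodes, high load: $(worst_case "$HIGH_LOAD" 3)"
```

Comparing the last figure with the configured max_connections shows how much headroom remains for replication, maintenance, and administrative sessions.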


Max open files

The maximum number of simultaneously open files allowed for each server subprocess. The default is one thousand files. To be on the safe side, with partitioned tables for file tracking and server logs it is recommended to increase the value to five thousand files. This requires increasing the system-wide ulimit for the PostgreSQL user on Linux installations to 131072.


max_files_per_process = 5000




RESOURCE USAGE (except WAL)

Memory

  • shared_buffers - The amount of memory the database server uses for shared memory buffers. The recommended size is 15% to 25% of the machine's total RAM for a standalone DB server. The PostgreSQL default is 128 megabytes (128MB). A value that would cover most use cases is from 1GB to 4GB (see table below).
  • huge_pages - Controls whether huge pages are requested for the main shared memory area. With huge_pages set to try (the default), the server will try to request huge pages, but fall back to the default if that fails. Huge pages are known as large pages on Windows. Some internet resources recommend turning them off, especially on Windows. The PostgreSQL recommendation is to use them when available, because they reduce CPU usage and memory fragmentation.
  • temp_buffers - These are session-local buffers used only for access to temporary tables. The PostgreSQL default is eight megabytes (8MB). Use the default.
  • work_mem - The base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. The formula to calculate the size for a standalone DB server is: Total RAM * 0.25 / max_connections. The PostgreSQL default value is four megabytes (4MB). A value that would cover most use cases is 8MB. Higher values carry a risk of OS performance degradation and memory fragmentation.
  • maintenance_work_mem - The maximum amount of memory to be used by maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. A maximum of 2GB is enough for all use cases, even when you have a lot of available RAM. On Windows the value must be less than 2GB, so generally the maximum value is maintenance_work_mem = 2047MB.


Core servers

Parameter | 16GB RAM | 24GB RAM | 32GB RAM | 48GB RAM | 64GB RAM
shared_buffers | 1GB | 1GB or 2GB | 2GB or 3GB | 4GB | 6GB
huge_pages | try | try | try | try | try
temp_buffers | 8MB | 8MB | 8MB | 8MB | 8MB
work_mem | 4MB | 8MB | 8MB | 8MB | 8MB
maintenance_work_mem | 384MB | 512MB | 1GB | 1536MB | 2047MB

Edge servers

Parameter | 8GB RAM | 16GB RAM | 24GB RAM | 32GB RAM | 48GB RAM
shared_buffers | 512MB | 1GB | 1GB or 2GB | 2GB or 3GB | 4GB
huge_pages | off | try | try | try | try
temp_buffers | 8MB | 8MB | 8MB | 8MB | 8MB
work_mem | 2MB | 4MB | 8MB | 8MB | 8MB
maintenance_work_mem | 320MB | 384MB | 512MB | 1GB | 1536MB
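The work_mem formula quoted above (Total RAM * 0.25 / max_connections) can be evaluated with integer shell arithmetic. The helper name is hypothetical, the result is rounded down to whole megabytes, and the formula applies to a standalone DB server, so treat the output as a starting point rather than a final value.

```shell
#!/bin/sh
# Sketch only: work_mem_mb is an illustrative helper name.

# work_mem (MB) = Total RAM * 0.25 / max_connections, rounded down.
# $1 = total RAM in MB, $2 = max_connections
work_mem_mb() {
    echo $(( $1 * 25 / 100 / $2 ))
}

echo "work_mem = $(work_mem_mb 16384 1000)MB"   # 16GB RAM, max_connections = 1000
echo "work_mem = $(work_mem_mb 32768 1000)MB"   # 32GB RAM, max_connections = 1000
```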


Asynchronous Behavior

  • effective_io_concurrency - The number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. The default is 1 on supported systems. Currently, this setting only affects bitmap heap scans. It requires the posix_fadvise function, which is not available on Windows. On Linux with SSD disks, the recommended value is 200. On Windows, use 0 or leave it commented out.
  • maintenance_io_concurrency - Similar to effective_io_concurrency, but used for maintenance work that is done on behalf of many client sessions. The default is 10 on supported systems. On Linux, use the default value of 10 or leave it commented out. On Windows, use 0 or leave it commented out.
  • max_worker_processes - The maximum number of background processes that the system can support. Set it equal to the number of CPUs, with a minimum of 4 and a maximum of 16.
  • max_parallel_workers_per_gather - The maximum number of workers that can be started by a single Gather or Gather Merge node. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Set to half of max_worker_processes.
  • max_parallel_maintenance_workers - The maximum number of parallel workers that can be started by a single utility command. Currently, the parallel utility commands that support the use of parallel workers are CREATE INDEX only when building a B-tree index, and VACUUM without FULL option. Parallel workers are taken from the pool of processes established by max_worker_processes, limited by max_parallel_workers. Set to half of max_worker_processes.
  • max_parallel_workers - The maximum number of workers that the system can support for parallel operations. Taken from the pool of worker processes. Set to the same number as max_worker_processes.
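The worker rules in the list above can be sketched as a helper that prints the four postgresql.conf lines for a given CPU count. The function name is hypothetical, and the output follows the bullet text literally (CPU count clamped to 4..16, then halved for the per-operation limits).

```shell
#!/bin/sh
# Sketch only: worker_settings is an illustrative helper name.

# $1 = number of CPUs on the database host
worker_settings() {
    workers=$1
    if [ "$workers" -lt 4 ]; then
        workers=4
    elif [ "$workers" -gt 16 ]; then
        workers=16
    fi
    half=$(( workers / 2 ))
    echo "max_worker_processes = $workers"
    echo "max_parallel_workers_per_gather = $half"
    echo "max_parallel_maintenance_workers = $half"
    echo "max_parallel_workers = $workers"
}

worker_settings 8
```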




WRITE-AHEAD LOG

  • wal_level - Determines how much information is written to the WAL. The default value is replica, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server. For a Standard Cluster, set it to logical.
  • wal_buffers - The amount of shared memory used for WAL data that has not yet been written to disk. When the value is -1, it is calculated as shared_buffers/32. Set the value to 16MB.
  • max_wal_size - Maximum size to let the WAL grow during automatic checkpoints. This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command or archive_library, or a high wal_keep_size setting. Set the value to 4GB.
  • min_wal_size - Minimum size to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. Set the value to 1GB.
  • wal_writer_delay - Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by wal_writer_delay, unless woken up sooner by an asynchronously committing transaction. The default value is 200ms, which may cause discrepancies in event replication during a failover scenario. Set the value to 50ms.
  • wal_writer_flush_after - Specifies how often the WAL writer flushes WAL, in volume terms. If the last flush happened less than wal_writer_delay ago and less than wal_writer_flush_after worth of WAL has been produced since, then WAL is only written to the operating system, not flushed to disk. The default value is 1MB, which may cause discrepancies in event replication during a failover scenario. Set the value to 64kB.
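The WAL recommendations above differ between a standalone server and a cluster, which the following sketch makes explicit by printing a postgresql.conf fragment for either case. The function name is hypothetical; the standalone branch assumes the PostgreSQL defaults quoted in the bullets are kept, since the cluster-specific values address replication and failover.

```shell
#!/bin/sh
# Sketch only: wal_settings is an illustrative helper name.

# $1 = "standalone" or "cluster"
wal_settings() {
    case "$1" in
        cluster)
            echo "wal_level = logical"
            echo "wal_writer_delay = 50ms"
            echo "wal_writer_flush_after = 64kB"
            ;;
        standalone)
            # PostgreSQL defaults, as quoted in the list above.
            echo "wal_level = replica"
            echo "wal_writer_delay = 200ms"
            echo "wal_writer_flush_after = 1MB"
            ;;
        *)
            echo "usage: wal_settings standalone|cluster" >&2
            return 1
            ;;
    esac
    # Recommended for both deployment types.
    echo "wal_buffers = 16MB"
    echo "max_wal_size = 4GB"
    echo "min_wal_size = 1GB"
}

wal_settings cluster
```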




QUERY TUNING

  • random_page_cost - Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. Storage with a low random read cost relative to sequential reads (such as SSD) is better modeled with a lower value: random_page_cost = 1.1.
  • effective_cache_size - Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. The default is 4 gigabytes (4GB). A value that would cover most use cases is from 4GB to 8GB.




REPLICATION

These settings are applicable only for cluster environments. For standalone server installations please ignore.


  • max_wal_senders - Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. A logical replication subscription needs one connection. One or more is needed for table synchronization. ST core servers create 3 subscriptions in each direction. Set the value to 20.
  • max_replication_slots - Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. A logical replication subscription needs one slot. One or more is needed for table synchronization. Set the value to 20.
  • wal_keep_size - Specifies the minimum size of past WAL files kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. If a standby server connected to the sending server falls behind by more than wal_keep_size megabytes, the sending server might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. The recommended size is 2048 MB. This value keeps replication data for a few hours or more when there is no connection to the standbys. Longer outages need a manual data restore (Perform manual data restore).
  • max_slot_wal_keep_size - Specifies the maximum size of WAL files that replication slots are allowed to retain in the pg_wal directory at checkpoint time. The recommended size is 1024 MB. This value keeps replication data for a few hours or more when there is no connection to the standbys. Longer outages need a manual data restore (Perform manual data restore). If the value is not set, the database uses the default value of -1, which means replication slots may retain an unlimited amount of WAL files.
  • max_logical_replication_workers - Specifies maximum number of logical replication workers. This includes leader apply workers, parallel apply workers, and table synchronization workers. Logical replication workers are taken from the pool defined by max_worker_processes. Set to the same number as max_worker_processes.
  • max_sync_workers_per_subscription - Maximum number of synchronization workers per subscription. This parameter controls the amount of parallelism of the initial data copy during the subscription initialization or when new tables are added. Currently, there can be only one synchronization worker per table. The synchronization workers are taken from the pool defined by max_logical_replication_workers. Set to half of max_logical_replication_workers.
  • max_parallel_apply_workers_per_subscription - Maximum number of parallel apply workers per subscription. This parameter controls the amount of parallelism for streaming of in-progress transactions. The parallel apply workers are taken from the pool defined by max_logical_replication_workers. Set to half of max_logical_replication_workers.
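The worker-related replication settings above all derive from max_worker_processes, as the following sketch shows. The function name is hypothetical; the fixed values of 20 for max_wal_senders and max_replication_slots are left out because they do not depend on the CPU count.

```shell
#!/bin/sh
# Sketch only: replication_workers is an illustrative helper name.

# $1 = max_worker_processes configured on the node
replication_workers() {
    lrw=$1                 # same number as max_worker_processes
    half=$(( lrw / 2 ))    # half of max_logical_replication_workers
    echo "max_logical_replication_workers = $lrw"
    echo "max_sync_workers_per_subscription = $half"
    echo "max_parallel_apply_workers_per_subscription = $half"
}

replication_workers 4   # minimum hardware
replication_workers 8   # 8 CPUs
```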




REPORTING AND LOGGING

  • log_min_duration_statement - Causes the duration of each completed statement to be logged if the statement ran for at least the specified amount of time. Enabling this parameter can be helpful in tracking down unoptimized SQL queries. Recommended value is 10000 (10 seconds).




TCP SETTINGS

  • tcp_keepalives_idle - Specifies the amount of time with no network activity after which the operating system should send a TCP keepalive message to the client. If this value is specified without units, it is taken as seconds. A value of 0 (the default) selects the operating system's default. Linux and Windows default is 7200 seconds (2 hours). On Windows, setting a value of 0 will set this parameter to 2 hours, since Windows does not provide a way to read the system default value. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. Recommended value is 60 seconds.
  • tcp_keepalives_interval - Specifies the amount of time after which a TCP keepalive message that has not been acknowledged by the client should be retransmitted. If this value is specified without units, it is taken as seconds. A value of 0 (the default) selects the operating system's default. Linux default is 75 seconds. Windows default is 1 second. On Windows, setting a value of 0 will set this parameter to 1 second, since Windows does not provide a way to read the system default value. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. Recommended value is 10 seconds.
  • tcp_keepalives_count - Specifies the number of TCP keepalive messages that can be lost before the server's connection to the client is considered dead. A value of 0 (the default) selects the operating system's default. Linux default is 9. Windows default is 10. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. Recommended value is 3.
  • tcp_user_timeout - Specifies the amount of time that transmitted data may remain unacknowledged before the TCP connection is forcibly closed. If this value is specified without units, it is taken as milliseconds. A value of 0 (the default) selects the operating system's default, which is 0 on Linux. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. Recommended value is 0.
  • client_connection_check_interval - Sets the time interval between optional checks that the client is still connected, while running queries. The check is performed by polling the socket, and allows long running queries to be aborted sooner if the kernel reports that the connection is closed. If the value is specified without units, it is taken as milliseconds. The default value is 0, which disables connection checks. Without connection checks, the server will detect the loss of the connection only at the next interaction with the socket, when it waits for, receives or sends data. Recommended value is 15000 (15 seconds).




Putting it all together, based on server size and installation type

Note that Axway has validated successful deployments with up to 4 SecureTransport Edges in synchronization (cluster).


Parameter Core servers Edge Servers
Minimum hardware 4CPU/16GB RAM From 8CPU/24GB RAM to 24CPU/64GB RAM Minimum hardware 2CPU/8GB RAM From 4CPU/16GB RAM to 24CPU/32GB RAM
Standalone 2-node cluster 3-node cluster Standalone 2-node cluster 3-node cluster Standalone 2-node cluster 3-node cluster Standalone 2-node cluster 3-node cluster
max_connections 1000 2000 2500 1000 2000 2500 1000 1500 2000 1000 1500 2000
max_files_per_process 5000 5000 5000 5000 5000 5000 5000 5000 5000 5000 5000 5000
# RESOURCE USAGE (except WAL)
# - Memory -
shared_buffers 1GB 1GB 1GB 1GB - 6GB 1GB - 6GB 1GB - 6GB 512MB 512MB 512MB 1GB - 4GB 1GB - 4GB 1GB - 4GB
huge_pages try/off try/off try/off try/off try/off try/off off off off try/off try/off try/off
temp_buffers 8MB 8MB 8MB 8MB 8MB 8MB 8MB 8MB 8MB 8MB 8MB 8MB
work_mem 4MB/8MB 4MB/8MB 4MB/8MB 8MB 8MB 8MB 2MB 2MB 2MB 4MB/8MB 4MB/8MB 4MB/8MB
maintenance_work_mem 384MB 384MB 384MB 512MB - 2047MB 512MB - 2047MB 512MB - 2047MB 320MB 320MB 320MB 384MB - 1536MB 384MB - 1536MB 384MB - 1536MB
# - Asynchronous Behavior -
effective_io_concurrency (Linux/Windows) 200/0 200/0 200/0 200/0 200/0 200/0 200/0 200/0 200/0 200/0 200/0 200/0
maintenance_io_concurrency (Linux/Windows) 10/0 10/0 10/0 10/0 10/0 10/0 10/0 10/0 10/0 10/0 10/0 10/0
max_worker_processes 4 4 4 num of CPUs (8 - 16) num of CPUs (8 - 16) num of CPUs (8 - 16) 4 4 4 num of CPUs (4 - 16) num of CPUs (4 - 16) num of CPUs (4 - 16)
max_parallel_workers_per_gather 2 2 2 4 4 4 2 2 2 2/4 2/4 2/4
max_parallel_maintenance_workers 2 2 2 4 4 4 2 2 2 2/4 2/4 2/4
max_parallel_workers 4 4 4 num of CPUs (8 - 16) num of CPUs (8 - 16) num of CPUs (8 - 16) 4 4 4 num of CPUs (4 - 16) num of CPUs (4 - 16) num of CPUs (4 - 16)
# WRITE-AHEAD LOG
wal_level replica logical logical replica logical logical replica logical logical replica logical logical
wal_buffers 16MB 16MB 16MB 16MB 16MB 16MB 16MB 16MB 16MB 16MB 16MB 16MB
max_wal_size 4GB 4GB 4GB 4GB 4GB 4GB 4GB 4GB 4GB 4GB 4GB 4GB
min_wal_size 1GB 1GB 1GB 1GB 1GB 1GB 1GB 1GB 1GB 1GB 1GB 1GB
wal_writer_delay 200ms 50ms 50ms 200ms 50ms 50ms 200ms 50ms 50ms 200ms 50ms 50ms
wal_writer_flush_after 1MB 64kB 64kB 1MB 64kB 64kB 1MB 64kB 64kB 1MB 64kB 64kB
# QUERY TUNING
random_page_cost 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1
effective_cache_size 4GB 4GB 4GB 4GB - 12GB 4GB - 12GB 4GB - 12GB 3GB 3GB 3GB 4GB - 8GB 4GB - 8GB 4GB - 8GB
# REPLICATION
max_wal_senders - 20 20 - 20 20 - 20 20 - 20 20
max_replication_slots - 20 20 - 20 20 - 20 20 - 20 20
wal_keep_size - 2048 MB 2048 MB - 2048 MB 2048 MB - 2048 MB 2048 MB - 2048 MB 2048 MB
max_slot_wal_keep_size - 1024 MB 1024 MB - 1024 MB 1024 MB - 1024 MB 1024 MB - 1024 MB 1024 MB
max_logical_replication_workers - 4 4 - 8 8 - 4 4 - 4/8 4/8
max_sync_workers_per_subscription - 2 2 - 4 4 - 2 2 - 2/4 2/4
max_parallel_apply_workers_per_subscription - 2 2 - 4 4 - 2 2 - 2/4 2/4
# REPORTING AND LOGGING
log_min_duration_statement 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
# TCP SETTINGS
tcp_keepalives_idle 60 60 60 60 60 60 60 60 60 60 60 60
tcp_keepalives_interval 10 10 10 10 10 10 10 10 10 10 10 10
tcp_keepalives_count 3 3 3 3 3 3 3 3 3 3 3 3
tcp_user_timeout 0 0 0 0 0 0 0 0 0 0 0 0
client_connection_check_interval 15000 15000 15000 15000 15000 15000 15000 15000 15000 15000 15000 15000


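To reload the PostgreSQL configuration at runtime, log in to the database and execute: SELECT pg_reload_conf(); Parameters that can only be set at server start, such as shared_buffers, max_connections, and wal_level, take effect only after a database restart.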
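
As a rough illustration of how to read the table, the sketch below renders the topology-dependent rows for a Core server on minimum hardware (4CPU/16GB RAM) as postgresql.conf lines. The dictionary layout and the function name render_conf are hypothetical; the values are copied from the table above, and rows marked "-" for standalone installations are omitted.

```python
# Values copied from the table above for a Core server on minimum hardware
# (4CPU/16GB RAM). The structure and names are illustrative only.
REPLICATED = {
    "wal_level": "logical",
    "wal_writer_delay": "50ms",
    "wal_writer_flush_after": "64kB",
    "max_wal_senders": "20",
    "max_replication_slots": "20",
}

CORE_MINIMUM = {
    "standalone": {
        "max_connections": "1000",
        "wal_level": "replica",
        "wal_writer_delay": "200ms",
        "wal_writer_flush_after": "1MB",
    },
    "2-node": {"max_connections": "2000", **REPLICATED},
    "3-node": {"max_connections": "2500", **REPLICATED},
}


def render_conf(topology):
    """Return postgresql.conf lines for the given topology."""
    settings = CORE_MINIMUM[topology]
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


print(render_conf("2-node"))
```

The same pattern can be extended with the remaining rows once the column that matches your hardware and server role is chosen.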




3. Cluster Tuning


SecureTransport cluster basics:


  • Under the hood, the ST Standard Cluster with PostgreSQL is using Oracle Coherence for clustering and forms a "Coherence cluster", just like the ST Enterprise Cluster. It can have two or three nodes (servers).
  • Each server in a Standard Cluster has a local embedded PostgreSQL database and all nodes are working with the primary database. The secondary nodes are replicating all data except the server logs from the primary database.
  • The Edge cluster is simpler, and some deployments do not cluster Edge servers at all. Only the Admin service from each Edge node is a member of the Coherence cluster. The off-heap cache files are not used. In an Edge cluster only small configuration data and the administrator accounts are replicated across the nodes.
  • The Core cluster has two members from each server in the Coherence cluster - the TM and the Admin services. The TM must be the owner of off-heap cache files (ST handles this internally).
  • The principle for forming and monitoring the cluster is the same for Edge servers and Core servers.


Forming a cluster

Coherence uses the database to allow members to join the cluster and to monitor the cluster nodes. The clusternode table contains one row for each server. Two columns must match the information from a server:


  • The column configurationid must be equal to LocalConfigurationsId from configuration.xml.
  • The column descriptor must be equal to the IP address of the server.


The default installation uses a multicast discovery mechanism over UDP. In a complex network or for servers with multiple network interfaces, you need to switch to unicast discovery over TCP following the procedure described in KB 178019. The servers must be able to ping each other over ICMP, and TCP and UDP ports 8088 to 8093, 7574, and 7 must be open on the firewall between the nodes (if any). For standalone installations the clusternode table can be left empty. Changing the IP address of a node requires the use of the options-overwrite.conf file to let ST update the clusternode table upon Admin start, which allows the server to join the cluster with its new IP address.
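
The port requirements above can be checked from one node to another with a short script. This is a minimal sketch that tests TCP reachability only (UDP and ICMP need other tools); the function names and the 2-second timeout are assumptions, not part of SecureTransport.

```python
import socket


def cluster_ports():
    """TCP ports listed above: 8088 to 8093, plus 7574 and 7."""
    return list(range(8088, 8094)) + [7574, 7]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host):
    """Return the cluster ports on the peer node that could not be reached over TCP."""
    return [port for port in cluster_ports() if not tcp_port_open(host, port)]
```

Calling unreachable_ports with the IP address of the peer node returns the ports that did not accept a connection. A port can also be reported as unreachable simply because no service is listening on it at that moment, so run the check while ST is started on the peer.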




Cluster communication

Oracle Coherence supports TLS to secure communication between cluster nodes. ST Standard Cluster with PostgreSQL enables the encryption by default using the certificate referenced by the admind alias. As of SecureTransport version 5.5-20251218, a dedicated certificate with the alias cluster can be used for TLS-encrypted communication between cluster nodes. This encryption has some performance impact on the Coherence cluster. It is recommended to turn off this encryption for Core servers if possible, according to network design and internal policies.


Cluster.enable.SSL=false
OR (see note above)
Cluster.enable.SSL=true




Monitoring the cluster

Every node sends status messages according to the server configuration parameter Cluster.Status.heartbeatInterval (every 5 seconds) and updates the timestamp in column lastheartbeat in the clusternode table. If this does not happen within the period defined in the server configuration parameter Cluster.Status.heartbeatTimeout, the node is considered unresponsive and is removed from the cluster. The node itself is automatically restarted by Coherence to bring it back in sync. The default value of 15 seconds for the heartbeat timeout makes the cluster very sensitive to any kind of temporary issue and could lead to unnecessary TM restarts by Coherence. The recommended value is 60 seconds and it can be increased further if needed, up to 120 seconds.


Cluster - Status.heartbeatTimeout

How long after the last heartbeat a node is considered unresponsive and is removed from the cluster (in seconds). Requires Admin and TM restart if changed.


Cluster.Status.heartbeatTimeout=60
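
To see why the default timeout is sensitive, it helps to count how many heartbeat intervals fit into it. The sketch below is plain arithmetic on the values quoted above plus a simplified model of the timeout check; the function names are hypothetical and the real check is performed inside ST.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL_S = 5  # Cluster.Status.heartbeatInterval, as quoted above


def intervals_within_timeout(timeout_s, interval_s=HEARTBEAT_INTERVAL_S):
    """Number of whole heartbeat intervals that fit into the timeout."""
    return timeout_s // interval_s


def is_unresponsive(last_heartbeat, now, timeout_s):
    """Simplified model: the node is unresponsive once the timeout has elapsed."""
    return now - last_heartbeat > timedelta(seconds=timeout_s)


# Example: a node whose last heartbeat is 40 seconds old after a temporary stall.
last_heartbeat = datetime(2025, 1, 1, 12, 0, 0)
now = last_heartbeat + timedelta(seconds=40)
for timeout_s in (15, 60, 120):
    print(timeout_s, intervals_within_timeout(timeout_s),
          is_unresponsive(last_heartbeat, now, timeout_s))
```

In this example a 40-second stall is enough to remove the node with the 15-second default, but not with the recommended 60 seconds.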


Cluster - nodeListRefreshTime

How often (in seconds) the cluster checks for new or removed nodes. Requires a restart for the new value to take effect.


Cluster.nodeListRefreshTime=10




4. Transaction Manager Tuning


New installations of ST have increased (default) values for some server configuration parameters like max threads, and new features are usually enabled by default. For upgraded environments existing values are preserved and new features are usually turned off to preserve existing behavior.


Disk I/O

For each file transfer ST creates a buffer with the size specified in TransactionManager.fileIOBufferSizeInKB and flushes the buffer to the disk based on the setting of the parameter TransactionManager.syncFileToDiskEveryKB. ST can process many transfers in parallel, but the parallelism can be limited by the parameter TransactionManager.concurrentFileIOMax.


TransactionManager.fileIOBufferSizeInKB sets the size of the buffer for file transfer. A small buffer size causes more I/O operations and eventually decreases performance, especially when the storage works in sync mode. The optimal buffer provides the best performance for the environment with reasonable memory usage. Increasing the buffer too much does not provide further improvements in performance, but uses more physical memory (RAM). The right value depends on the underlying hardware and especially on the shared storage and mount options. The default buffer size is 128 KB, which is a good value for modern virtual environments. A buffer size greater than 1 MB usually does not bring further improvements.


A bigger buffer increases off-heap memory (direct memory) usage!


To tune the buffer size, prepare a test with small files, with large files, and with a mix of both. Start with the default value.


TransactionManager.fileIOBufferSizeInKB=128


Increase the buffer by doubling it.


TransactionManager.fileIOBufferSizeInKB=256


If no significant improvement in transfer times is observed, return to the previous value and stop testing. If there is an improvement, continue increasing the buffer size in a few iterations until you find the optimal value for your environment.


TransactionManager.syncFileToDiskEveryKB specifies when to flush the buffer content to the disk. The default value 0 (zero) means to flush data when the buffer is full. The configuration TransactionManager.syncFileToDiskEveryKB = TransactionManager.fileIOBufferSizeInKB has the same behavior as the zero value. Values lower than the buffer size might be helpful for some specific storages in non-virtual environments. For modern virtual environments use a zero value.
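
The doubling procedure can be sketched as a small loop. This is only an outline under stated assumptions: measure_seconds is a hypothetical callback that applies the buffer size, runs your test transfers, and returns the elapsed time; the 5% threshold for a "significant" improvement and the 1024 KB upper bound are illustrative choices, not product defaults.

```python
def tune_buffer_size_kb(measure_seconds, start_kb=128, max_kb=1024, min_gain=0.05):
    """Double the buffer until the transfer time stops improving noticeably.

    measure_seconds(kb) must run the test transfers with the given buffer
    size and return the elapsed time in seconds.
    """
    best_kb = start_kb
    best_time = measure_seconds(best_kb)
    candidate = best_kb * 2
    while candidate <= max_kb:
        elapsed = measure_seconds(candidate)
        if elapsed > best_time * (1 - min_gain):
            break  # no significant improvement: keep the previous value
        best_kb, best_time = candidate, elapsed
        candidate *= 2
    return best_kb


# Usage with made-up timings in place of a real test run.
fake_timings = {128: 100.0, 256: 80.0, 512: 78.0, 1024: 77.0}
print(tune_buffer_size_kb(fake_timings.__getitem__))
```

With the made-up timings above the loop accepts 256 KB (20% faster than 128 KB) and stops at 512 KB, because the further gain is below the threshold.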


TransactionManager.syncFileToDiskEveryKB=0


TransactionManager.concurrentFileIOMax specifies the maximum number of files which can be processed in parallel. This is applicable not only for SIT and CIT transfers, but also for STFS attributes. It is similar to the max_open_files on Linux and is useful when the storage has a limitation applicable for all ST cluster nodes.


A low value of TransactionManager.concurrentFileIOMax could cause a bottleneck and huge performance degradation. In normal circumstances leave the value empty, which means no limit.


TransactionManager.concurrentFileIOMax= (leave empty)




Thread Pools

SecureTransport has 5 thread pools for handling user and server actions. Each pool has a parameter with the suffix minThreads, which sets the number of threads kept available for this pool at any time after the pool is created by the TM. Creating and destroying threads consumes CPU, so increasing this value for environments with high load saves some resources. Each pool has a parameter with the suffix maxThreads, which sets the capacity of the pool. The TM cannot create more than the specified number of maxThreads, and further requests go into the relevant queue. The current default value for all thread pools is 768. The thread pools handling subscription SIT transfers have an additional parameter with the suffix maxThreadsPerGroup. Each subscription defines 3 groups: for incoming transfers, for outgoing Basic Application transfers, and for outgoing Advanced Routing transfers. The TM will not assign more than the specified number of threads from the thread pool to a group. This is a protection mechanism to avoid exhausting the assigned TM threads on transfers for a problematic subscription when the environment is not too busy with other transfers.


Parameter Normal Load High Load
# General pool for processing transfers
EventQueue.ThreadPools.ThreadPool.minThreads 32 64
EventQueue.ThreadPools.ThreadPool.maxThreads 768 1024
EventQueue.ThreadPools.ThreadPool.maxThreadsPerGroup 64 128
# Thread pool for AR routes execution
EventQueue.ThreadPools.AdvancedRouting.minThreads 32 64
EventQueue.ThreadPools.AdvancedRouting.maxThreads 768 1024
EventQueue.ThreadPools.AdvancedRouting.maxThreadsPerGroup 64 128
# Thread pool for processing PeSIT transfers
EventQueue.ThreadPools.PESIT.minThreads 32 64
EventQueue.ThreadPools.PESIT.maxThreads 768 1024
EventQueue.ThreadPools.PESIT.maxThreadsPerGroup 64 128
# Thread pool for handling Streamed InProcess Events for SIT outbound transfers excluding AR transfers
TransactionManager.ThreadPools.ThreadPool.ServerTransfer.minThreads 32 64
TransactionManager.ThreadPools.ThreadPool.ServerTransfer.maxThreads 768 1024
# Thread pool for handling concurrent users (CIT) over streaming channels
TransactionManager.ThreadPools.ThreadPool.EventMonitor.minThreads 32 64
TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxThreads 768 1024


Some configurations work better with increased values for minThreads. Consider doubling or even tripling the values for minThreads in the table above.
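
The interaction between maxThreads and maxThreadsPerGroup can be pictured with a simplified model. The function below is a hypothetical illustration of the two limits described above, not the TM's actual scheduling code; the defaults are the Normal Load values from the table.

```python
def threads_granted(requested, group_in_use, pool_in_use,
                    max_threads=768, max_threads_per_group=64):
    """Threads a group can receive, limited by the pool and the per-group cap."""
    pool_free = max(0, max_threads - pool_in_use)
    group_free = max(0, max_threads_per_group - group_in_use)
    return min(requested, pool_free, group_free)


# A subscription group asks for 200 threads while 100 pool threads are busy elsewhere.
print(threads_granted(requested=200, group_in_use=0, pool_in_use=100))
```

In this model the group receives 64 threads and the remaining work waits in the queue, which leaves the rest of the pool available to other subscriptions.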


Thread Pools - Rule Engines

RuleEngines are used to evaluate the agent chain for a given event. This setting defines the number of rule engines ST can use concurrently.


TransactionManager.RuleEngine.pool=64




Event Queue

EventQueue - SIT persisted events in the database

Server-initiated Transfers (SIT) in ST are coordinated by events in the Event table in the database (persisted events). Client Initiated Transfers (CITs) do not insert any events in the event queue in the database except start and end events for a transfer, but they are cleared very fast and cannot be observed during the normal processing.


Any CIT becomes a SIT when the file arrives in the subscription folder. An event is inserted in the database with status 0 - Ready. When a thread is assigned to the event for processing, the status changes to 1 - Active. An event processor distributes the events (representing workload tasks) to the SecureTransport servers in the cluster based on the server configuration parameter EventQueue.DispatchPolicy.name. The default policy cacheBasedPolicy directs all events associated with an account to the same server to improve performance. More information about event distribution can be found at: Administrator Guide -> Direct cluster workload.


During the event processing ST may generate new persisted events for the same transfer. Once the event is processed it is removed from the database. At the end upon successful or failed transfer all events related to it are removed from the database. When ST does not have anything to process the event queue in the database is empty. In some abnormal situations ST may leave events which cannot be processed (stuck events) and they can block the subscriptions' scheduled pull execution. Abnormal situations can be the result of TM crashes, out of memory errors, lack of OS resources like file handles, unhandled sequences in ST, null pointer exceptions, specific wildcard pull errors, etc. A stuck event is an active event, but without a corresponding thread which is processing it. ST has an automatic monitoring mechanism for the event queue with an option to delete leftover events. The event queue is very dynamic, and it can be tracked from Admin UI -> Operations -> Event Queue or via the REST API 2.0 resource /events, or with SQL queries described in KB 183116.


EventQueue - CIT persisted events queue size in TM memory

SIT persisted events are not limited by size. CIT persisted events (start and end events) and active SIT events are placed in the EventQueue in the TM's memory. This queue has a limit, which is enabled by default and controlled by the server configuration parameter EventQueue.SizeLimit.enable. The default size is 5120.


EventQueue.SizeLimit.enable=true
EventQueue.SizeLimit.maxQueueSize=10240


Further increase of the EventQueue max queue size could cause a deeper issue in abnormal circumstances. Make sure that all other possible optimizations and tuning are in place before increasing the max queue size.



EventQueue - SIT persisted events monitoring

EventQueue.Heartbeat.Interval specifies the events' heartbeat update frequency in seconds. Each thread will attempt to update its heartbeat timestamp in the Event table every X seconds. The default value of 5 seconds is very aggressive for busy environments.

EventQueue.Heartbeat.Timeout specifies the number of seconds, above which SecureTransport will consider a particular event as staying in the queue abnormally long. If an event has not updated its heartbeat timestamp for more than X seconds, the event will be considered stuck and will be reported in the server log with Possible stuck events with expired heartbeat timeout detected. If the event recovery is enabled the event recovery process will be triggered. The default value of 60 seconds is very aggressive for busy environments.

EventQueue.Heartbeat.Recovery.Enabled turns on or off the recovery process which will delete stuck events. The default value is false.


Turn it on if analysis shows that there really are stuck events which can be safely deleted.


EventQueue.Heartbeat.Interval=15
EventQueue.Heartbeat.Timeout=600
EventQueue.Heartbeat.Recovery.Enabled=false
OR (see note above)
EventQueue.Heartbeat.Recovery.Enabled=true
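
The effect of EventQueue.Heartbeat.Interval on the database can be estimated with simple arithmetic, since each processing thread updates its heartbeat once per interval. The sketch below assumes, purely for illustration, 768 events being processed at the same time (the default maxThreads of a single pool); the function name is hypothetical.

```python
def heartbeat_updates_per_second(active_events, interval_s):
    """Approximate rate of heartbeat updates written to the Event table."""
    return active_events / interval_s


for interval_s in (5, 15):
    rate = heartbeat_updates_per_second(768, interval_s)
    print(f"interval={interval_s}s -> {rate:.1f} updates/s")
```

Under this assumption, raising the interval from 5 to 15 seconds cuts the heartbeat write rate from about 154 to about 51 updates per second.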


Note that resubmitting a transfer creates an event with the original timestamps. If you use external monitoring of the event queue and the resubmitted transfer stays longer in the system, this may trigger a false alarm.



EventQueue - CIT non-persisted events in TM memory

Client-initiated transfers (CIT) in ST are coordinated by EventMonitorService which receives events and callable requests sent from the protocol daemons over streaming channels. These requests arrive in the Event Monitor Queue. Once the Event Monitor finds an available request executor it assigns the request to it and deletes the request from the queue. The Event Monitor Queue has a usage reporting mechanism described below.


TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize - Specifies the maximum size of the client-initiated events in the Event Monitor queue waiting for an executor. The default value is 1024.

TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize.usageAlertsLogging - Controls the logging of warning messages for changes in the queue size. When enabled, warnings are logged each time the queue size increases or decreases by 10%, given that the queue is over 50% full. The default value is disabled.


TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize=10240
TransactionManager.ThreadPools.ThreadPool.EventMonitor.maxQueueSize.usageAlertsLogging=enabled


Further increase of the Event Monitor max queue size could cause a deeper issue in abnormal circumstances. Make sure that all other possible optimizations and tuning are in place before increasing max queue size.




Maximum simultaneous connections to a remote host

Maximum number of concurrent sessions established to any given partner for Server-Initiated Transfers (SITs), that are not triggered by an Advanced Route. The default value is 100.


OutboundConnections.maxConnectionsPerHost=1000





Protocol commands batch size

Maximum size of the protocol commands accumulated in memory before they are persisted into the database. The default value is 100. Check the server logs for the message: Value of 'Server.ProtocolCommands.batchSize' is too low and may lead to performance degradation. If you see such a message, double the value and monitor for the same message. If the message is still present, increase the size again, and so on until the messages disappear.


Server.ProtocolCommands.batchSize=100
OR
Server.ProtocolCommands.batchSize=200
OR
Server.ProtocolCommands.batchSize=400
and so on.





Skip creating AR sandbox if no transformation steps

If a Route includes only Publish To Account and/or Send To Partner steps and does not transform files, SecureTransport has an option to skip the creation of the sandbox folder and the copying of the file from the subscription folder to the sandbox folder. Instead, it will directly transmit the original file from the subscription folder to speed up the route execution. The default value is true (skip sandbox if possible).


Even without transformations, a sandbox folder is created when any of the following is configured in a Send To Partner step: 1) Post Routing Action -> Delete files after step is complete; 2) Send Trigger File is enabled; 3) Configure Advanced PeSIT Settings is enabled.


AdvancedRouting.DontCopyPayload=true





AR redirect sandbox to local disk

The route execution consists of creating a sandbox, copying the file from the subscription folder to the sandbox, executing the route steps, purging the sandbox, and executing the post routing actions. The sandbox is a subfolder structure under the objects folder created in the account's .stfs metadata folder:


<shared storage>/<account home folder>/.stfs/objects/<some identifier 1>/<some identifier 2>/<some identifier 3>/<some identifier 4>


The entire route execution process consists of many I/O operations on the storage, and copy operations for large files take time. The sandbox redirection aims to reduce the I/O operations on shared storage (by moving them to a local disk) and to speed up the copy operation (copying from remote shared storage to a local disk is faster). The redirection is done by converting the objects folder to a symbolic link pointing to a path on a local disk.


In Windows environments, following symbolic links from a remote disk to a local disk is usually disabled. To enable following links in all directions, use the command fsutil behavior set SymlinkEvaluation L2L:1 R2R:1 L2R:1 R2L:1. More information is available in the fsutil behavior documentation.


AdvancedRouting.sandboxFolderLocation=<absolute path to local disk location>





New functionality introduced in SecureTransport version 5.5-20250424 offers the option to use hard links in the Publish To Account step instead of copying files when publishing them to the receiver's account. A hard link is a reference to the original file that behaves like a separate file. It points to the same data blocks without duplicating the content or using extra disk space. This significantly reduces storage usage and improves performance, especially with large volumes of files.


Requirements for the hard links to work: 1) The source file and the target destination must be on the same file system; 2) Sender and recipient accounts must use the same encryption mode; 3) The Append to existing file option (under Collision settings) must be disabled; 4) On Linux root installations, the sender and recipient accounts must have the same UID.


AdvancedRouting.PublishToAccount.UseHardLinks=true
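
For readers unfamiliar with hard links, the following sketch shows the file system behavior that the option relies on, using temporary folders in place of real account homes. It illustrates only the operating system mechanism and the same-file-system requirement; the function names are hypothetical and this is not how ST implements the step.

```python
import os
import tempfile


def same_file_system(path_a, path_b):
    """Compare device IDs; hard links cannot span file systems."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def publish_with_hard_link(source_file, target_dir):
    """Create a second name for source_file inside target_dir and return it."""
    if not same_file_system(source_file, target_dir):
        raise OSError("source and target are on different file systems")
    target = os.path.join(target_dir, os.path.basename(source_file))
    os.link(source_file, target)  # adds a directory entry; no data is copied
    return target


def demo():
    with tempfile.TemporaryDirectory() as root:
        sender = os.path.join(root, "sender")
        receiver = os.path.join(root, "receiver")
        os.mkdir(sender)
        os.mkdir(receiver)
        source = os.path.join(sender, "report.csv")
        with open(source, "w") as handle:
            handle.write("payload")
        target = publish_with_hard_link(source, receiver)
        with open(target) as handle:
            content = handle.read()
        # Same underlying file, link count of the data, content read via the new name.
        return os.path.samefile(source, target), os.stat(source).st_nlink, content


print(demo())
```

Both names refer to the same data, so the published file is available to the receiver without a second copy being written to the storage.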





Proxy blacklisting

The Transaction Manager has a blacklisting mechanism for the SOCKS proxy. More information can be found in KB 181585.


Some errors from partner servers can trigger the blacklisting mechanism and temporarily block all Edge servers in the zone. In such a case, turn off blacklisting if the TM is not allowed to connect to the partner server directly, or turn on direct connection if possible.


Proxy.Blacklisting.Enabled=false

OR

Direct.Connection.When.Proxy.Down=true





SSH connection reuse

For SSH transfer sites, SecureTransport has a connection pooling mechanism, introduced in SecureTransport version 5.5-20230126 for server-initiated transfers, which allows SecureTransport to reuse recently established connections to the remote SSH server. A separate connection pool is created for each SSH transfer site. If set, the Maximum parallel transfers property defines the maximum number of connections that can be opened by the connection pool per node. The SSH connection pool is off by default.


Ssh.SIT.ConnectionPool.Enabled=true
Ssh.SIT.ConnectionPool.MinEvictableIdleDuration=30
Ssh.SIT.ConnectionPool.TimeBetweenEvictionRuns=15





Partitions

On PostgreSQL, the Log Entry and Transfer Log maintenance applications do not create partitions. Instead, a dedicated create partition service is executed on startup of the Transaction Manager and protocol daemons. The Partition.DaysToPrebuild server configuration option can be used to specify the number of days for which partitions are created in advance. If you leave it empty (default), partitions are created for 3 days ahead. The service does not create new partitions if they have already been created for the specified interval.


By default, the daily partitions are created every day at 00:00. To change the partition creation time, update the value of the PartitionManagement.Create.triggerTime server configuration option. The format is HH:MM, with hours in the range 0–23.


A new installation of SecureTransport sets all maintenance applications to start at midnight (00:00); see Maintenance Applications Tuning below. The statistics summary for usage reporting is also triggered at midnight and there is currently no option to change its trigger time. This may affect the process of creating partitions for some tables because the tables might be locked. If there are frequent failures during execution of the partition creation service, change the trigger time. Check the following article for a known issue with file tracking replication in SecureTransport versions prior to 5.5-20250731: KB 191320.


Partition.DaysToPrebuild=7
PartitionManagement.Create.triggerTime=00:00
OR
PartitionManagement.Create.triggerTime=01:10




PeSIT enhancements

PeSIT IDs - SecureTransport represents an entity of a PeSIT partner by the combination of an Account and a Transfer Site. New functionality introduced in SecureTransport version 5.5-20211216 eliminates the need for both parties to use unique names in their configurations. When enabled via the server configuration parameter Pesit.UsePesitIds, the PeSIT partnership is formed based on the PeSIT ID properties specified in the account and the Transfer Site settings. PeSIT ID is not a mandatory field, and if left empty, SecureTransport defaults to using the name property. The default value is false.

CFT Extensions - As of SecureTransport version 5.5-20230330, SecureTransport complies with PeSIT CFT extensions and can handle the PI 99 usage. A server configuration option Pesit.CftExtensions.Enabled enables the usage of PeSIT extensions. The default value is true.

Connection Pool Max Wait Time - For server-initiated PeSIT outbound transfers via Transfer Sites with a limit on simultaneous transfers, SecureTransport may experience issues finding available connections in the connection pool. The connection pool is created for the destination server host and port once it is used and is destroyed if no new files are available to push. You may have multiple Transfer Sites to the same destination server, but with different limits. If two Transfer Sites with different limits start transferring simultaneously, the first one creates the connection pool and all transfers use the limit from this Transfer Site. The server configuration parameter ConnectionPool.maxWaitTime sets the timeout for how long ST may attempt to get a connection from the pool before giving up. The default value is 86400 seconds (24 hours). It is recommended to reduce the timeout to a feasible value such as 2 minutes.


Pesit.UsePesitIds=true
Pesit.CftExtensions.Enabled=true
ConnectionPool.maxWaitTime=120




Optimize the account initialization process

For client-initiated transfers, during the user login process SecureTransport checks all configured subscription folders for their presence and permissions. With a large number of subscription folders and frequent logins this introduces a significant slowdown for the login, and if the file system is overwhelmed, the slowdown is even worse. As of SecureTransport version 5.5-20220331, SecureTransport provides a monitoring service for ST accounts (non-template accounts). After a successful account initialization on first login, the account is registered with the service to be monitored (along with the subscription folders). An info message appears in the server logs: Account with ID 'XXXXXX' is successfully registered for monitoring by the Directory Structure service. Subsequent logins skip the account initialization if there are no changes in the subscription folders. When a subscription folder is deleted (regardless of the source of the deletion), the account is removed from the service, and the next login initializes the account and creates or checks all subscription folders. The default value is true.


DirectoryStructureServiceEnabled=true




Retry cycles for Basic Application, Advanced Routing - Pull From Partner step, and REST API calls

When a server-initiated transfer fails, SecureTransport can automatically retry the transfer. The time in seconds that SecureTransport waits after a transfer fails before retrying it is calculated by the formula Retry number * EventQueue.retryDelayInterval. After the retry count reaches the value of EventQueue.maxRetryCount, the retry cycle stops and ST fails the transfer permanently. With the default values of 5 retries and a 120-second delay interval, the retry cycle takes 30 minutes to complete. When the maximum number of simultaneous connections to a remote host is reached, a transfer is retried internally with no limit on retries, and the wait time is calculated by the formula Retry number * EventQueue.internalRetryDelayInterval.


The default values


EventQueue.internalRetryDelayInterval=120
EventQueue.maxRetryCount=5
EventQueue.retryDelayInterval=120


Values to match the default Advanced Routing retries and for intensive REST API usage.


EventQueue.internalRetryDelayInterval=2
EventQueue.maxRetryCount=5
EventQueue.retryDelayInterval=2
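
The duration of a full retry cycle follows directly from the formula Retry number * EventQueue.retryDelayInterval. The small helper below (a hypothetical name, not an ST utility) adds up the waits for both sets of values; it counts only the waiting time between attempts, not the time the transfer attempts themselves take.

```python
def retry_cycle_seconds(max_retry_count, retry_delay_interval):
    """Total wait time: the sum of (retry number * delay) over all retries."""
    return sum(n * retry_delay_interval for n in range(1, max_retry_count + 1))


print(retry_cycle_seconds(5, 120))  # default values
print(retry_cycle_seconds(5, 2))    # values matching the Advanced Routing retries
```

With the defaults the waits are 120 + 240 + 360 + 480 + 600 = 1800 seconds (30 minutes); with a 2-second delay interval the whole cycle waits 30 seconds.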




STFS attribute files and caching

During transfer processing, SecureTransport uses the so-called "stfs" files to store metadata attributes for each transferred file. These are serialized files located in a hidden directory under the user’s home: ~/.stfs/attrs/.


For every uploaded file, a corresponding metadata file is created: ~/.stfs/attrs/<filename>. When a file is moved or deleted, SecureTransport also updates or removes the associated metadata file. The stfs files include both transfer-specific and context-related data, such as Repository Encryption information, decrypted file size (for Repository Encrypted files), transfer status and TransferStatusId, CoreId, startTime, and more.


In some scenarios, it stores additional attributes that can be related to the file, such as the Flow Attributes and PeSIT context attributes. This metadata is read and written multiple times during processing and is critical for SecureTransport's core functionality. These operations may impact performance, especially when there is latency to the shared file system.


A caching mechanism for these STFS attributes is added in SecureTransport version 5.5-20250424, greatly improving the performance for scenarios with large numbers of small files, reducing the filesystem load. This feature is enabled by default meaning that SecureTransport reads the attributes from memory instead of the filesystem.


Stfs.attributes.coherence.cache.enabled=true




Repository encryption

Repository encryption increases SecureTransport's security by avoiding storing unencrypted files to shared storage. When repository encryption is enabled SecureTransport encrypts each file that it pulls from a partner site or that a client pushes to ST. When SecureTransport pushes a file to a partner site or a client pulls a file from ST, SecureTransport decrypts the file. SecureTransport encrypts and decrypts each file dynamically in memory as it receives and sends it, so the files never exist unencrypted in the storage of the host system.


If repository encryption is enabled it is recommended to set server configuration parameter TM.preferBouncyCastleProvider to false. For details refer to the BouncyCastle Security Provider section below.


When Repository Encryption is disabled after having been enabled, previously encrypted files are not decrypted and their transfers fail. To ensure they can be decrypted and processed, set the server configuration parameter DecryptOnDisabledEncryption to true.


If you enable repository encryption, the following SecureTransport functions are not supported: resume PeSIT transfers and pause and resume transfers when SecureTransport is the server.


Stfs.Encryption.CertAlias - Setting this value will enable the repository encryption. Use any certificate alias from the Local Certificates store. Leaving it empty disables repository encryption.

Stfs.Encryption.ListDecryptedSize - Determines which file size, the original file size or the encrypted one, to be reported for repository encrypted files when performing directory or file listing. When set to false, the encrypted file size is reported. When set to true, the original, unencrypted file size (taken from the STFS metadata) is reported. In this case performance degradation is observed when listing directories with lots of files. If it is not required to read the actual file size keep the value of this parameter to false.

Stfs.Encryption.ReadBufferSize - Specifies the buffer size for read operations when Repository Encryption is enabled. Affects download speed when Repository Encryption is enabled. The default value is 32768. Larger buffer may not bring improvement but uses more physical memory - RAM. The optimal value depends on the underlying hardware and other configurations. The value must be less than or equal to TransactionManager.fileIOBufferSizeInKB. Use default value of 32K as minimum starting point for optimization. Refer to the Disk I/O section for information how to optimize this buffer.

Stfs.Encryption.WriteBufferSize - Specifies the buffer size for write operations when Repository Encryption is enabled, which affects upload speed. The default value is 32768. A larger buffer may not improve throughput but uses more physical memory (RAM). The optimal value depends on the underlying hardware and other configuration settings. The value must be less than or equal to TransactionManager.fileIOBufferSizeInKB. Use the default value of 32K as the minimum starting point for optimization. Refer to the Disk I/O section for information on how to optimize this buffer.

Stfs.Hash.HashOnUpload - Controls the on-the-fly (dynamically as the file is uploaded) hashing (computing the MD5 checksum) of all incoming transfers. When the value is false, SecureTransport computes the MD5 checksum after the file transfer has completed. This is applicable on both scenarios when repository encryption is enabled and when repository encryption is disabled. To minimize the delay on finalizing upload of large files set the value to true.


Stfs.Encryption.CertAlias= (leave empty)
Stfs.Encryption.ListDecryptedSize=false
Stfs.Encryption.ReadBufferSize=32768
Stfs.Encryption.WriteBufferSize=32768
DecryptOnDisabledEncryption=true
Stfs.Hash.HashOnUpload=true
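
The buffer constraint described above can be checked before a change is applied. The following is a minimal Python sketch and not part of SecureTransport. It assumes that the two Stfs.Encryption buffer values are expressed in bytes and that TransactionManager.fileIOBufferSizeInKB is expressed in kilobytes (1 KB = 1024 bytes); the function name is hypothetical.

```python
def check_encryption_buffers(read_bytes, write_bytes, file_io_kb):
    """Return a list of problems; an empty list means the values are consistent."""
    # Assumption: the Transaction Manager buffer is configured in KB.
    limit_bytes = file_io_kb * 1024
    problems = []
    for name, value in (
        ("Stfs.Encryption.ReadBufferSize", read_bytes),
        ("Stfs.Encryption.WriteBufferSize", write_bytes),
    ):
        if value < 32768:
            problems.append(f"{name}={value} is below the 32K starting point")
        if value > limit_bytes:
            problems.append(
                f"{name}={value} exceeds fileIOBufferSizeInKB ({limit_bytes} bytes)"
            )
    return problems
```

An empty result for the values you plan to apply suggests they respect the documented limit; it does not say anything about whether they are optimal for your hardware.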




File Archiving

The File Archiving feature enables the archiving and retrieval of files for resubmit purposes at the global, business unit, and account levels. As of SecureTransport version 5.5-20241031, the Rotate archive folder option allows storing archived files in timestamped subfolders, which are created automatically daily or hourly, depending on the retention period. Consider enabling this option to reduce the archive maintenance time if you expect a high volume of files. See Maintenance Applications Tuning below.


The file archive folder and user home folders should reside on separate storage devices. When the archive folder is on the same storage device as the user home folders, performance suffers because the data is written twice to the same device.




Audit Log

The Audit Log contains entries that SecureTransport records when any change is made to the SecureTransport configuration. Audit logging is enabled by default except for changes made by the Transaction Manager. Audit log records can capture details for the object after modification in a text form called collection. Depending on the object type and size this collection could be huge (a few megabytes) and its generation consumes time and resources. The collection can be compared with the collections from the previous records for the same object to identify the exact changes. If this information is not necessary, it is better to turn off collections by setting the value to false for the server configuration parameter AuditLog.Enabled.CollectionLog.


Accounts import is much slower when audit logging is enabled and there is a separate server configuration parameter AuditLog.Enabled.Import which can disable or enable it. The server log contains some information for accounts import actions, so it is better to turn off audit logging during account import.


Audit Log Maintenance

The Audit Log Maintenance application deletes and exports log entries in chunks with a default size of 1000 entries. If collections are enabled, to avoid out of memory errors, decrease the chunk size using the server configuration parameter AuditLog.ChunkSize introduced in SecureTransport version 5.5-20231130. The parameter accepts values between 1 and 1000. Values outside of this range are considered invalid and result in Server Log warnings upon application execution. If the configuration option has an invalid value, the default will be used. See below Maintenance Applications Tuning.


AuditLog.ChunkSize=10
AuditLog.Enabled.Admin=true
AuditLog.Enabled.TM=false
AuditLog.Enabled.Import=false
AuditLog.Enabled.CollectionLog=false
OR
AuditLog.Enabled.CollectionLog=true
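
The validation rule for AuditLog.ChunkSize described above (values from 1 to 1000, otherwise the default is used) can be illustrated with a short Python sketch. This is only a model of the documented behavior, not SecureTransport code, and the function name is hypothetical.

```python
DEFAULT_CHUNK_SIZE = 1000  # documented default chunk size


def effective_chunk_size(raw_value):
    """Return the chunk size that would apply for a configured value."""
    try:
        value = int(raw_value)
    except (TypeError, ValueError):
        # Non-numeric or missing values are treated as invalid here (assumption).
        return DEFAULT_CHUNK_SIZE
    if 1 <= value <= 1000:
        return value
    # Out-of-range values fall back to the default.
    return DEFAULT_CHUNK_SIZE
```

For example, a configured value of 10 is used as is, while 0 or 5000 falls back to 1000.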




Folder Monitor

Folder Monitor is a TM service scanning a designated folder (and optionally its subfolders) for specific files based on the Transfer Site configuration. Once the Transfer Site is used in a Subscription, SecureTransport starts monitoring the folder at fixed intervals as defined in the server configuration parameter FolderMonitor.pollInterval. The default value is 5 seconds. Depending on the number of monitored locations and the performance of the shared storage, FM may not be able to scan all configured folders within the poll interval. Use small steps to increase the poll interval (10, 15, 20, etc.) until an optimal value is found.


The Folder Monitor service runs on one server of the cluster. If that server fails, the Folder Monitor service automatically fails over to the server in the cluster with the Transaction Manager that has been running the longest. The Folder Monitor service updates its heartbeat according to the server configuration parameter FolderMonitor.heartbeatInterval and fails over upon expiring FolderMonitor.heartbeatTimeout.


The FM picks up only files that have not been modified for FolderMonitor.fileDelayInterval. The process is: 1. renaming the source file by adding suffix FolderMonitor.filePostfix; 2. copying the file to the subscription folder; and 3. deleting the source file. Then the target subscription is triggered. Use the defaults shown below and change the mentioned parameters only if necessary.


FolderMonitor.enable=true
FolderMonitor.fileDelayInterval=5
FolderMonitor.filePostfix=__@PROCESS@
FolderMonitor.heartbeatInterval=5
FolderMonitor.heartbeatTimeout=60
FolderMonitor.maxCachedSites=10000
FolderMonitor.pollInterval=5
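
The pickup rule and the first processing step described above can be sketched in Python as follows. This is an illustration only, not the Folder Monitor implementation. It assumes that FolderMonitor.fileDelayInterval is expressed in seconds and that a file becomes eligible once it has been unmodified for at least that long; both function names are hypothetical.

```python
def is_ready_for_pickup(last_modified, now, file_delay_interval=5):
    """True when the file has been unmodified for at least the delay interval.

    last_modified and now are timestamps in seconds (for example from time.time()).
    """
    return (now - last_modified) >= file_delay_interval


def in_process_name(file_name, postfix="__@PROCESS@"):
    """Name of the source file after step 1, when the suffix has been added."""
    return file_name + postfix
```

With the defaults, a file last modified 4 seconds ago is skipped in the current scan and picked up in a later one, which is why a slow writer combined with a short delay interval does not cause partial files to be processed.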




Scheduler

The Scheduler is a TM service that fires subscription pulls from a Transfer Site according to a Subscription's schedule. It is similar to the Linux crontab or the Windows Task Scheduler. The Scheduler manages all scheduled tasks centrally from the oldest member of the cluster. When schedules trigger events for scheduled tasks, one consolidated queue for all events is maintained across the cluster. This queue is shared and replicated across all the servers in the cluster so that they share the load by taking events from the queue one item at a time and performing the actual transfers or other tasks. If the node where the Scheduler is running fails, another node takes over. You can schedule jobs in two ways, either per subscription or per application.


SecureTransport uses the third-party Quartz Scheduler library, which has a separate configuration file located at $FILEDRIVEHOME/conf/scheduler.properties. The default configuration uses 1 thread with 5 connections to the DB and low priority. For environments with many scheduled tasks, set the recommended values below in the configuration file. The last parameter, org.quartz.jobStore.acquireTriggersWithinLock=true, aims to prevent triggering the same job on two cluster nodes.


The Scheduler cannot be used for AS2 Transfer Sites. Before queueing a new task, the server checks whether a previous instance of the same periodic task is still pending. If so, the new task is not scheduled and the following warning message appears in the Server Log: The task "<<SubscriptionID>>_subscription_PARTNER-IN" of account with name: "<<ST account>>" with subscription folder: "<<Subscription folder>>" is still in progress. Skipping the next scheduled occurrence of this task.


Scheduler.enable
Cluster.Service.Scheduler.File.configurationFile.path=/conf/scheduler.properties
#
$FILEDRIVEHOME/conf/scheduler.properties (file)
org.quartz.threadPool.threadPriority=5
org.quartz.jobStore.misfireThreshold=120000
org.quartz.dataSource.DS.maxConnections=25
org.quartz.threadPool.threadCount=20
org.quartz.jobStore.acquireTriggersWithinLock=true
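
To see which of the recommended values above are not yet applied on a node, the contents of scheduler.properties can be compared against them. The following Python sketch is a hypothetical helper, not a SecureTransport tool. It assumes the file uses simple key=value lines with # comments and does not handle line continuations or escapes.

```python
# Recommended values taken from the listing above.
RECOMMENDED = {
    "org.quartz.threadPool.threadPriority": "5",
    "org.quartz.jobStore.misfireThreshold": "120000",
    "org.quartz.dataSource.DS.maxConnections": "25",
    "org.quartz.threadPool.threadCount": "20",
    "org.quartz.jobStore.acquireTriggersWithinLock": "true",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_from_recommended(text):
    """Return {key: (current, recommended)} for values that differ or are missing."""
    current = parse_properties(text)
    return {
        key: (current.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if current.get(key) != wanted
    }
```

Pass the text of the file, for example open(path).read(), to diff_from_recommended; a missing key is reported with None as its current value.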


From SecureTransport update 5.5-20260326, the C3P0 library was replaced by HikariCP. HikariCP introduces a shared DataSource architecture, and DB connections are configured in configuration.xml only. The parameter org.quartz.dataSource.DS.maxConnections is no longer used.




Status Checker - Load balancer health checks

A classic approach to load balancing is to continuously monitor the service's ports for availability. This is not sufficient for ST because a streaming connection from the TM is an additional condition for considering a service healthy. More complex health-checking mechanisms, such as logging in with a user account, consume resources and might not be possible in some load balancers. ST 5.5 provides a liveness status check mechanism via an HTTP service executed by monitord. This service is not configured or enabled by default. Choose any available port on the operating system and set it in the server configuration parameter StatusChecker.port. In the example below, the chosen port is 5555. Enable the service by setting the server configuration parameter StatusChecker.enabled to true, then restart the monitord service.


Liveness status can be requested individually for each protocol daemon with URL:


http://<Server IP address>:5555/healthCheck?daemon=<daemon>


where <daemon> is one of the following: ADMIN, FTPD, HTTPD, AS2D, SSHD, PESITD, or SOCKS. The expected responses are:


  • 200 (OK) - Indicates that the service is healthy with functional streaming connections
  • 503 (Service Unavailable) - Indicates that the service is NOT healthy. The service is stopped or no streaming connection has been established.


StatusChecker.enabled=true
StatusChecker.heartbeatInterval=20
StatusChecker.port=5555 (just example)
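
The health check request described above can be scripted for testing outside the load balancer. The Python sketch below uses only the standard library. The URL format, daemon names, and response codes come from this section; the function names and the 5-second timeout are assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

DAEMONS = {"ADMIN", "FTPD", "HTTPD", "AS2D", "SSHD", "PESITD", "SOCKS"}


def health_url(host, port, daemon):
    """Build the liveness URL for one protocol daemon."""
    if daemon not in DAEMONS:
        raise ValueError(f"unknown daemon: {daemon}")
    return f"http://{host}:{port}/healthCheck?daemon={daemon}"


def is_healthy(host, port, daemon, timeout=5):
    """True only on HTTP 200; 503, other codes, and connection errors count as unhealthy."""
    try:
        with urlopen(health_url(host, port, daemon), timeout=timeout) as response:
            return response.status == 200
    except (HTTPError, URLError, OSError):
        # urlopen raises HTTPError for 503, so that case lands here as well.
        return False
```

For example, is_healthy("10.0.0.1", 5555, "SSHD") queries the SSHD status on a node, where the address is a placeholder for the node's IP address.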


The health check mechanism depends on the entries in the database table componentstatus. If some entries for a node do not match the component the response code is always 503. The value in the column configurationid must be equal to the LocalConfigurationsId from configuration.xml and the column host must be equal to the IP address of the node. If there is a discrepancy delete all problematic entries from the componentstatus table and ST will rebuild them upon restart of the relevant node.


Limitation: The health check only assesses the default listener!




File renaming for client-initiated uploads

SecureTransport features a file locking mechanism that blocks access to files while they are being uploaded or processed. This mechanism prevents partial uploads and conflicts with post-processing actions, but it may create issues with clients that upload files with temporary names and attempt to rename them after completing the transfer, as SecureTransport may still have the file locked. In such cases, rather than failing immediately when a rename command is issued for a locked file, SecureTransport can be configured to wait for a predetermined period for the lock to be released. This wait-and-retry mechanism can be customized using the following server configuration parameters:


RenameLockedFiles - Controls whether SecureTransport allows renaming attempts on locked files (associated with an .m_inproc file). By default, it is set to disabled, causing any rename command issued while the file is locked to fail immediately. When it is set to enabled, SecureTransport will periodically check if the file is still locked. If all retries are exhausted and the file is still locked, the renaming operation will ultimately fail. You can further adjust the check count and interval.

CIT.Upload.RenameAfterUnlocked.RetryCount - Specifies how many times to check if an uploaded file is unlocked before renaming it. This option only works if RenameLockedFiles is enabled. Default value is 10.

CIT.Upload.RenameAfterUnlocked.RetryDelayInterval - Specifies the delay interval, in milliseconds, between checking if an uploaded file is unlocked before renaming it. The total time (including all checks) must not exceed the maximum idle time interval configured in the client used for upload. This option only works if RenameLockedFiles is enabled. Default value is 1000.


RenameLockedFiles=enabled
CIT.Upload.RenameAfterUnlocked.RetryCount=10
CIT.Upload.RenameAfterUnlocked.RetryDelayInterval=1000
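
The requirement that the total retry time stays within the client's idle timeout can be checked with simple arithmetic. The Python sketch below approximates the total wait as RetryCount multiplied by RetryDelayInterval and ignores the time each lock check itself takes, which is an assumption. The function names are hypothetical.

```python
def max_rename_wait_ms(retry_count=10, retry_delay_ms=1000):
    """Approximate upper bound on the time spent waiting for the lock to be released."""
    return retry_count * retry_delay_ms


def fits_client_idle_timeout(retry_count, retry_delay_ms, client_idle_timeout_ms):
    """True when all retries finish before the client would time out."""
    return max_rename_wait_ms(retry_count, retry_delay_ms) <= client_idle_timeout_ms
```

With the defaults the wait is about 10 seconds, so a client with a 30-second idle timeout is safe, while raising RetryCount to 60 would not be.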




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter TM.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


If you do not need FIPS, you can set the value to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


TM.preferBouncyCastleProvider=true
OR
TM.preferBouncyCastleProvider=false


If the Cluster communication is encrypted, it is recommended to set Admin.preferBouncyCastleProvider to the same value chosen for TM.preferBouncyCastleProvider. Note that the parameter Admin.preferBouncyCastleProvider is available from SecureTransport version 5.5-20251030 and above.





Graceful Shutdown

Graceful shutdown is a feature that allows you to perform a planned Transaction Manager stop without abrupt cancellation of current server-initiated transfers (SITs), post-routing, post-transformation, and post-processing actions, and Advanced Routing actions (all routes and their respective route steps). Once the graceful shutdown is initiated, the TM waits for the existing tasks to finish and does not accept new tasks. The maximum time that the TM waits before stopping is set in the server configuration parameter TransactionManager.GracefulShutdownTimeout. The default value is 86400 seconds (24 hours). If there are leftover stuck events, the TM waits for the timer to expire. It is recommended to reduce the timeout to a feasible value such as 5 minutes (300 seconds).


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


TransactionManager.GracefulShutdownTimeout=300




Allow Expired Certificates

SecureTransport provides server configuration parameters to control (allow or disallow) the use of expired certificates.


SIT.allowExpiredCertificates - Controls the usage of expired X509 certificates for server-initiated transfers over the FTPS, HTTPS, PeSIT protocols.

SSH.SIT.allowExpiredCertificates - Controls the usage of SSH keys contained in expired X509 certificates for server-initiated transfers over SSH protocol.

SSH.CIT.allowExpiredCertificates - Controls the usage of SSH keys contained in expired X509 certificates for client-initiated transfers over SSH protocol.


SIT.allowExpiredCertificates=true
OR
SIT.allowExpiredCertificates=false
SSH.SIT.allowExpiredCertificates=true
SSH.CIT.allowExpiredCertificates=true




SSH change the permissions of a file

The server configuration parameter Ssh.UpdateFilePermissionsWithChmodCommand determines whether a chmod or a umask command is used to change the permissions of a file. This setting can be overridden at the transfer site level. When the value is true, the file permissions are set with chmod after the transfer ends. When the value is false, the file handle is opened with the specified permissions and the file permissions are modified with umask.


Ssh.UpdateFilePermissionsWithChmodCommand=true
OR
Ssh.UpdateFilePermissionsWithChmodCommand=false




DNS Lookups

SecureTransport provides server configuration parameters to control DNS resolution.


Dmz.Edge.proxyDnsResolutionCheck - Domain Name System resolution will be performed on the Edge before the server-initiated transfer takes place when the value is set to true. Requires active streaming between the Core server and the Edge server. This option will apply only if Use the Edge DNS configuration is enabled in the Network Zone configuration.

SIT.ReverseDNSLookups - Controls the DNS Reverse Lookup for server-initiated transfers. To prevent delays due to DNS lookups set the value to false.


Dmz.Edge.proxyDnsResolutionCheck=false
SIT.ReverseDNSLookups=false




SSL Logging

The server configuration parameter Plugins.TransferSites.SSLLogging.Level specifies the log level of the TLS security information log messages for Pluggable Transfer Sites. These include Generic-HTTP(S), S3, Azure Blob Storage, Azure File Storage, Google Cloud Storage, Google Drive, OneDrive, SharePoint, etc. The default value OFF suppresses the message with security information. The value INFO prints the message with security information at info level. The value DEBUG prints the same message at debug level instead of info level. A sample DEBUG message in the Server Log looks like this:


User with login name "user1", associated with account "user1", had initiated a connection over HTTP-GENERIC. Remote address: www.google.com. Connection security parameters: cipher suite: TLS_AES_128_GCM_SHA256, TLS/SSL protocol: TLSv1.3.


Plugins.TransferSites.SSLLogging.Level=DEBUG





5. FTP Server Tuning


Buffers

DataBufferSize - FTP data connection buffer size. Allocated on every transfer.

ReadBufferSize - FTP read buffer size. Parameter is increased to avoid excessive streaming traffic due to fragmentation.

ReceiveBufferSize - FTP receive buffer size.


Ftp.DataBufferSize=131072
Ftp.ReadBufferSize=131072
Ftp.ReceiveBufferSize=131072




FTPS compliance

Ftp.Ssl.requireCloseNotify - If the FTP client does not send a close_notify message when uploading files to SecureTransport via FTPS, set to false to prevent failing the transfer. If set to false, the server would be susceptible to TLS truncation attacks! The recommended value is true unless absolutely necessary.

Ftp.Ssl.StrictRfc2228 - Controls strict RFC2228 compliance of the FTPD upon reply to the AUTH TLS command from the clients. Recommended value is true.

Ftp.Ssl.StrictRfc2228CertAuth - Controls strict RFC2228 compliance for certificate authentication of the FTPD. Recommended value is false.


Ftp.Ssl.requireCloseNotify=true
Ftp.Ssl.StrictRfc2228=true
Ftp.Ssl.StrictRfc2228CertAuth=false




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




DataTimeout

The number of seconds the server waits to read a block of data from the client or to write a block of data to the client. If not specified, the server waits indefinitely.


Ftp.DataTimeout= (leave empty)




ListenBacklog

Set the size of the sockets backlog.


Ftp.ListenBacklog=1024




LoginFailureDelay

Specifies the time, in milliseconds, by which the next login is delayed after an invalid login attempt. Increasing the value can slow down brute force attacks or rogue clients.


Ftp.LoginFailureDelay=500




MaxClients

Set maximum number of concurrent connections. 0 means unlimited.


Ftp.MaxClients=500




WorkerThreads.maxThreads

The maximum number of worker threads in the FTP daemon used for the processing of the requests.


Ftp.WorkerThreads.maxThreads=1024




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Ftp.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Ftp.preferBouncyCastleProvider=true
OR
Ftp.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, FTPD waits for the timeout period specified in the server configuration parameter Ftp.GracefulShutdownTimeout before stopping the FTP service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, FTPD waits for the timer to expire. It is recommended to reduce the timeout to a feasible value such as 5 minutes (300 seconds).


Graceful shutdown logging interval - The Server Log displays information about the active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with the graceful shutdown, you must stop the Monitor Server.


Ftp.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The FTP daemon can print SSL/TLS security parameters (TLS version and Cipher Suite) about newly successfully established connections when the server configuration parameter SSLLogging.Ftp is set to true. The sample INFO message in Server Log looks like below:


Establishing FTPS connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Ftp=true
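
When reviewing which TLS versions and cipher suites clients actually negotiate, the log message shown above can be parsed with a regular expression. The Python sketch below is based only on the sample message in this section; the exact message format in other SecureTransport versions is an assumption, and the function name is hypothetical.

```python
import re

# Pattern derived from the sample Server Log message above.
TLS_LOG_PATTERN = re.compile(
    r"Establishing (?P<service>\w+) connection with host (?P<host>\S+), "
    r"using cipher suite: (?P<cipher>\S+) "
    r"and TLS/SSL protocol: (?P<protocol>[\w.]+?)\.?$"
)


def parse_tls_log(line):
    """Return the service, host, cipher suite, and protocol, or None if no match."""
    match = TLS_LOG_PATTERN.search(line.strip())
    return match.groupdict() if match else None
```

Counting the parsed protocol and cipher values over a day of logs gives a quick view of whether legacy clients are still connecting before you restrict the allowed suites.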




6. HTTP Server Tuning


ThreadPool

ThreadPool MinThreads - HTTP server request thread pool minimum threads. The default value is 32.

ThreadPool MaxThreads - HTTP server request thread pool maximum threads. The default value is 256.

ThreadPool ThreadsIdleTimeMillis - How much time (in milliseconds) a thread from the thread pool should stay idle before it's stopped. The default value is 60000.


Http.ThreadPool.MinThreads=128
Http.ThreadPool.MaxThreads=1024
Http.ThreadPool.ThreadsIdleTimeMillis=60000




Connections

MaxSimultaneousTransfers - Maximum simultaneous transfers per client. The default value is 20.

Connection MaxIdleTime - The maximum idle time (in milliseconds) for a connection. The default value is 300000 (5 minutes).

AcceptQueueSize - The number of connection requests that can be queued up before the operating system starts to send rejections. The default value is 10000.


Http.MaxSimultaneousTransfers=25
Http.Connection.MaxIdleTime=300000
Http.AcceptQueueSize=10000




Request monitor service

Request MinBandwidth - Sets the minimum processing bandwidth for incoming HTTP requests. If an incoming request drops below the specified minimum bandwidth more than a specified number of times (see Http.Monitor.IterationCount), the connection is reset. Possible values: <number of bytes per second> | 0. Default value is 0. If the value is set to 0 - the request monitor service is disabled.

Monitor IterationCount - Sets the maximum number of times an HTTP request can drop below the specified minimum bandwidth (see Http.Request.MinBandwidth). If a request drops below the minimum bandwidth more than this number of times, the connection is reset. Default value: 10. Cannot be set to 0. This option is ignored if the HTTP request monitor service is disabled.


Http.Request.MinBandwidth=0
Http.Monitor.IterationCount=10
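
The interaction between Http.Request.MinBandwidth and Http.Monitor.IterationCount can be modeled as a simple counter. The Python sketch below is an interpretation of the description above, not the actual monitor implementation. How and how often SecureTransport samples the bandwidth is not stated here, so the list of per-interval samples (in bytes per second) is an assumption.

```python
def should_reset(bandwidth_samples, min_bandwidth=0, iteration_count=10):
    """Return True when the request dropped below the minimum too many times."""
    if min_bandwidth == 0:
        # A value of 0 disables the request monitor service.
        return False
    drops = sum(1 for sample in bandwidth_samples if sample < min_bandwidth)
    # The connection is reset only when the drop count exceeds the limit.
    return drops > iteration_count
```

With the default values nothing is ever reset; with a minimum of 100 bytes per second and an iteration count of 10, the eleventh slow sample triggers the reset.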




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Http.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.

In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Http.preferBouncyCastleProvider=true
OR
Http.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, HTTPD waits for the timeout period specified in the server configuration parameter Http.GracefulShutdownTimeout before stopping the HTTPD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). In case there are leftover fake sessions HTTPD will wait for the timer to expire. It is recommended to reduce the timeout to a feasible value like 5 minutes.

Graceful shutdown logging interval - The Server Log displays information about the active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Http.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The HTTP daemon can print SSL/TLS security parameters (TLS version and Cipher Suite) for newly established connections when the server configuration parameter SSLLogging.Http is set to true. A sample INFO message in the Server Log looks like this:


Establishing HTTPS connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Http=true




7. SSH Server Tuning


Note that the SSH protocol for server-initiated transfers has additional tuning parameters in each SSH Transfer Site! Use larger buffers for higher transfer rates over high bandwidth high latency networks. The Sftp Message Block Size can be increased up to 262000 bytes if the server supports it.


max.pta.wait

Specifies the maximum time, in milliseconds, that the SSH server waits before returning a response while the file is still being processed.


Ssh.max.pta.wait=2000




maxChannels

Maximum channels per client. A single SSH connection may contain multiple channels, all run simultaneously over that connection.


Each channel, in turn, represents the processing of a single service. When you invoke a process on the remote host with Net::SSH, a channel is opened for that invocation, and all input and output relevant to that process is sent through that channel. The connection itself simply manages the packets of all of the channels that it has open.


Ssh.maxChannels=30




maxConnections

Maximum allowed connections to SSHD. Configurable in the SSH Settings page.


Ssh.maxConnections=100




DNS Lookups

Server.Dnslookups - This parameter controls whether Server DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is false.

Server.ReverseDNSLookups - This parameter controls whether Server reverse DNS Lookups are enabled. It applies for HTTP, FTP and SSH daemons only. Recommended value is off.


Server.Dnslookups=false
Server.ReverseDNSLookups=off




Diffie-Hellman Group Exchange Key Size


From SecureTransport version 5.5-20240125, to mitigate the security vulnerability known as the "Passive SSH Key Compromise", ST introduced a new server configuration option, Ssh.maxDiffieHellmanGroupExchangeKeySize. The default value is 8192. Note that opting for higher security may lead to performance degradation in transfers over SSH. If system performance is of higher importance for a specific setup, set both Ssh.minDiffieHellmanGroupExchangeKeySize and Ssh.maxDiffieHellmanGroupExchangeKeySize to 1024.


Ssh.minDiffieHellmanGroupExchangeKeySize=1024
Ssh.maxDiffieHellmanGroupExchangeKeySize=1024




SSH Ciphers


The secure SSH ciphers aes256-gcm@openssh.com, aes128-gcm@openssh.com, and chacha20-poly1305@openssh.com may significantly limit the throughput in some environments. To get the highest transfer rates, use only CTR ciphers.


Ssh.Ciphers=aes128-ctr,aes192-ctr,aes256-ctr
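
If you start from an existing cipher list, the CTR-only value can be derived by filtering it. The Python sketch below is a hypothetical helper that assumes CTR ciphers can be recognized by the -ctr suffix in their names, which holds for the ciphers named in this section.

```python
def ctr_only(ciphers):
    """Keep only CTR-mode ciphers, preserving their original order."""
    return [cipher for cipher in ciphers if cipher.endswith("-ctr")]


def to_config_value(ciphers):
    """Format the filtered list as an Ssh.Ciphers configuration line."""
    return "Ssh.Ciphers=" + ",".join(ctr_only(ciphers))
```

Before applying the result, confirm that your partners' clients support at least one of the remaining ciphers; otherwise the key exchange fails.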




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Ssh.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


In a case where you do not need FIPS, you can set the server configuration option for a particular service to false to speed up system performance. By doing so, Sun becomes the preferred provider, and BouncyCastle is used as a fallback. As Sun is not FIPS-compliant, FIPS mode must first be disabled in order to change the preferred provider from BouncyCastle to Sun.


Ssh.preferBouncyCastleProvider=true
OR
Ssh.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abrupt cancellation of the currently ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, SSHD waits for the timeout period specified in the server configuration parameter Ssh.GracefulShutdownTimeout before stopping the SSHD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts for file operations are rejected. This includes not only file uploads and downloads but also directory listing, deleting or renaming files, as well as deleting or creating directories. The default value is 86400 seconds (24 hours). In case there are leftover fake sessions SSHD will wait for the timer to expire. It is recommended to reduce the timeout to a feasible value like 5 minutes.


Graceful shutdown logging interval - The Server Log displays information about active connections during an initiated graceful shutdown upon intervals specified in server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Ssh.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The SSH daemon can print negotiated security parameters (KEX, Ciphers and MACs) for newly established connections when the server configuration parameter SSLLogging.Ssh is set to true. Sample INFO messages in the Server Log look like this:


Establishing SSH connection with host 0:0:0:0:0:0:0:1, using the following properties: key exchange: curve25519-sha256 client-server cipher: aes256-gcm@openssh.com, server-client cipher: aes256-gcm@openssh.com, server-client MAC: <implicit>, client-server MAC: <implicit>.


For some ciphers, integrity is not provided by a separate MAC but is part of the cipher itself. In such cases the negotiated MAC is shown as <implicit>, as in the message above. A message with an explicit MAC looks like this:


Establishing SSH connection with host 127.0.0.1, using the following properties: key exchange: diffie-hellman-group-exchange-sha256 client-server cipher: aes128-ctr, server-client cipher: aes128-ctr, server-client MAC: hmac-sha2-256, client-server MAC: hmac-sha2-256.


SSLLogging.Ssh=true
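
When many connections are logged, it can help to pull the negotiated parameters out of these messages programmatically, for example to spot clients that still negotiate weak algorithms. The following is a minimal sketch only. The regular expression is modeled on the two sample messages above, so it assumes the wording is the same in your version; the function name and dictionary keys are hypothetical.

```python
import re

# Pattern modeled on the two sample Server Log messages above.
# Assumption: other SecureTransport builds use the same wording.
SSH_LOG_PATTERN = re.compile(
    r"Establishing SSH connection with host (?P<host>\S+), "
    r"using the following properties: "
    r"key exchange: (?P<kex>\S+) "
    r"client-server cipher: (?P<c2s_cipher>[^,]+), "
    r"server-client cipher: (?P<s2c_cipher>[^,]+), "
    r"server-client MAC: (?P<s2c_mac>[^,]+), "
    r"client-server MAC: (?P<c2s_mac>.+?)\.$"
)


def parse_ssh_negotiation(message):
    """Return the negotiated parameters as a dict, or None if no match."""
    match = SSH_LOG_PATTERN.search(message.strip())
    if match is None:
        return None
    params = match.groupdict()
    # Ciphers with built-in integrity report the MAC as <implicit>.
    params["implicit_mac"] = params["c2s_mac"] == "<implicit>"
    return params
```

Feeding it the second sample message would yield the key exchange diffie-hellman-group-exchange-sha256, the cipher aes128-ctr, and the MAC hmac-sha2-256, with implicit_mac set to False.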




8. AS2 Server Tuning


Receiver.maxContentLength

Maximum file size for receiving. The default maximum file size is 50 megabytes; a value of 0 means unlimited. Configurable in the AS2 Settings page in the Admin UI.


As2.Receiver.maxContentLength=200




Sender.maxContentLength

Maximum file size for sending. The default maximum file size is 50 megabytes; a value of 0 means unlimited. Configurable in the AS2 Settings page in the Admin UI.


As2.Sender.maxContentLength=200




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter As2.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


If you do not need FIPS, you can set the server configuration option for a particular service to false to improve performance. Sun then becomes the preferred provider, and BouncyCastle is used as a fallback. Because Sun is not FIPS-compliant, FIPS mode must be disabled before you change the preferred provider from BouncyCastle to Sun.


As2.preferBouncyCastleProvider=true
OR
As2.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abruptly cancelling ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, AS2D waits for the timeout period specified in the server configuration parameter As2.GracefulShutdownTimeout before stopping the AS2D service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts at file operations are rejected. This includes not only file uploads and downloads but also listing directories, deleting or renaming files, and deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, AS2D waits for the timer to expire, so it is recommended to reduce the timeout to a practical value such as 300 seconds (5 minutes).


Graceful shutdown logging interval - During a graceful shutdown, the Server Log displays information about active connections at the interval specified in the server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


As2.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The AS2 daemon can log the SSL/TLS security parameters (TLS version and cipher suite) for each newly established connection when the server configuration parameter SSLLogging.As2 is set to true. A sample INFO message in the Server Log looks like this:


Establishing AS2 SSL connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.As2=true




9. PeSIT Server Tuning


Note that the PeSIT protocol for server-initiated transfers has additional tuning parameters in each PeSIT Transfer Site!


Pesit.ASCII.recordsInfo.bulk.size

When transferring files over PeSIT in ASCII mode, SecureTransport counts the number of characters on each line and stores the counts in memory. When the transfer is finished, this data is stored on the file system. This parameter limits the number of line counters stored in memory (each counter is 4 bytes) before the data is flushed to file. Increasing this parameter can improve performance but increases the memory usage of the TM and the PeSIT daemon. Allowed values are greater than or equal to 1024. The default value is 32768.


Pesit.ASCII.recordsInfo.bulk.size=32768
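
To judge the memory impact of a larger value, the arithmetic from the description above (4 bytes per line counter) can be sketched as follows. This is an illustration, not a statement of how SecureTransport allocates memory: the function name is hypothetical, and the idea that each concurrent ASCII-mode transfer holds its own counter buffer is an assumption.

```python
COUNTER_SIZE_BYTES = 4  # each line counter is 4 bytes (from the description above)
MIN_BULK_SIZE = 1024    # smallest allowed parameter value


def recordsinfo_buffer_bytes(bulk_size, concurrent_ascii_transfers=1):
    """Estimate the memory held by line counters before a flush.

    Assumption: one counter buffer per concurrent ASCII-mode transfer.
    """
    if bulk_size < MIN_BULK_SIZE:
        raise ValueError("bulk size must be greater than or equal to 1024")
    return bulk_size * COUNTER_SIZE_BYTES * concurrent_ascii_transfers


# Default value: 32768 counters * 4 bytes = 131072 bytes (128 KiB).
default_bytes = recordsinfo_buffer_bytes(32768)
```

Under that assumption, 200 simultaneous ASCII transfers at the default value would hold about 25 MiB of counters in memory.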




Timeouts

Create and Select Timeout - PeSIT CREATE/SELECT timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 300 seconds.

Inactivity Timeout - PeSIT Protocol inactivity timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 60 seconds.

Connection Release Timeout - PeSIT Connection release timeout. Configurable in the PeSIT Settings page in the Admin UI. Default value is 60 seconds.


Pesit.CreateSelect.Timeout=300
Pesit.Inactivity.Timeout=60
Pesit.Connection.Release.Timeout=60




Pesit.MaxConnections

PeSIT maximum number of opened connections. The "Maximum Connections Number" parameter determines how many TCP connections can be initiated, regardless of the number of transfers. Configurable in the PeSIT Settings page in the Admin UI.


More information: KB 177257


Pesit.MaxConnections=200




Pesit.MaxSessions

PeSIT maximum number of sessions. The "Maximum Sessions Number" parameter determines how many separate PeSIT transfers to your server can run simultaneously. Configurable in the PeSIT Settings page in the Admin UI.


More information: KB 177257


Pesit.MaxSessions=200




Pesit.Server.pTCP.Buffer.Size

PeSIT server pTCP buffer size in bytes: the size of the buffer that collects data from multiple pTCP connections into one. Changing it does not require a restart of the PeSIT servers. The change takes effect for new transfers.


Set an extra-large value, larger than the file size. For example, 100 MB = 104857600 bytes.


Pesit.Server.pTCP.Buffer.Size=104857600




Pesit.Server.Socket.Buffer.Size

Socket send/receive buffer size in bytes for PeSIT servers. Corresponds to SO_SNDBUF/SO_RCVBUF settings of TCP layer. Requires restart of PeSIT servers when changed.


Set Receive Buffer size to zero to eliminate socket buffering.


Pesit.Server.Socket.Buffer.Size=0




BouncyCastle Security Provider

The default cryptographic provider in SecureTransport is BouncyCastle. This is determined by the server configuration parameter Pesit.preferBouncyCastleProvider, where the default value is true. The BouncyCastle cryptographic library is FIPS-certified and contains more algorithms and cipher suites than the Sun library. For maximum security, we recommend using the default settings.


If you do not need FIPS, you can set the server configuration option for a particular service to false to improve performance. Sun then becomes the preferred provider, and BouncyCastle is used as a fallback. Because Sun is not FIPS-compliant, FIPS mode must be disabled before you change the preferred provider from BouncyCastle to Sun.


Pesit.preferBouncyCastleProvider=true
OR
Pesit.preferBouncyCastleProvider=false




Graceful Shutdown

Graceful Shutdown is an option to initiate a shutdown of any or all protocol services without abruptly cancelling ongoing client-initiated transfer (CIT) sessions. Once the graceful shutdown is initiated, PESITD waits for the timeout period specified in the server configuration parameter Pesit.GracefulShutdownTimeout before stopping the PESITD service. Existing CITs are allowed to complete within the specified timeout period. Any new attempts at file operations are rejected. This includes not only file uploads and downloads but also listing directories, deleting or renaming files, and deleting or creating directories. The default value is 86400 seconds (24 hours). If there are leftover fake sessions, PESITD waits for the timer to expire, so it is recommended to reduce the timeout to a practical value such as 300 seconds (5 minutes).


Graceful shutdown logging interval - During a graceful shutdown, the Server Log displays information about active connections at the interval specified in the server configuration parameter GracefulShutdown.Logging.Interval. The default value is 60 seconds.


Before you proceed with a graceful shutdown, you must stop the Monitor Server.


Pesit.GracefulShutdownTimeout=300
GracefulShutdown.Logging.Interval=60




SSLLogging

The PeSIT daemon can log the SSL/TLS security parameters (TLS version and cipher suite) for each newly established connection when the server configuration parameter SSLLogging.Pesit is set to true. A sample INFO message in the Server Log looks like this:


Establishing PeSIT SSL connection with host 127.0.0.1, using cipher suite: TLS_AES_256_GCM_SHA384 and TLS/SSL protocol: TLSv1.3.


SSLLogging.Pesit=true
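
The TLS messages shown for the AS2 and PeSIT daemons share one shape, so a single pattern can extract the cipher suite and TLS version from both. The sketch below is modeled only on the two sample messages in this article. Whether other daemons or versions use the same wording is an assumption, and the function name and dictionary keys are hypothetical.

```python
import re

# Pattern modeled on the AS2 and PeSIT sample messages in this article.
# Assumption: the wording is stable across SecureTransport builds.
TLS_LOG_PATTERN = re.compile(
    r"Establishing (?P<protocol>\S+) SSL connection with host (?P<host>\S+), "
    r"using cipher suite: (?P<cipher_suite>\S+) "
    r"and TLS/SSL protocol: (?P<tls_version>\S+?)\.$"
)


def parse_tls_negotiation(message):
    """Return protocol, host, cipher suite, and TLS version, or None."""
    match = TLS_LOG_PATTERN.search(message.strip())
    return match.groupdict() if match else None
```

Counting the distinct tls_version values over a day of logs is one way to check whether any partner still connects with an older TLS version before you disable it.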




10. SOCKS Proxy Tuning


Socks.Idle.Timeout

If server-initiated transfers using FTP(S) pass through the SOCKS5 proxy, increase the value of the Socks.Idle.Timeout server configuration parameter on the SecureTransport Edge from 600000 milliseconds (10 minutes) to 7200000 milliseconds (2 hours).


Socks.Idle.Timeout=7200000




Server IP (interface)

Specifies the server host for the proxy server. The default value is 0.0.0.0 (all interfaces). If the system has multiple interfaces, configure the one that faces internally (toward the backends).


OutboundConnections.Proxy.serverHost




Client IP (interface)

Specifies the source address or hostname for outgoing connections established from the Proxy service. This is only useful on systems with more than one address, where you configure the interface that faces externally (toward the internet).


OutboundConnections.Proxy.clientHost




11. Maintenance Applications Tuning


Default maintenance applications

Upon new installation the following maintenance applications are enabled to execute at midnight (12:00 am): Audit Log Maintenance, LogEntry Maintenance, Package Retention Maintenance, Sentinel Link Data Maintenance, and Transfer Log Maintenance. The partition creation service is also configured to execute at midnight. Finally, Statistics summary for usage reporting is also triggered at midnight and there is currently no option to change the execution time. All of these work with the database. To reduce the pressure on the database and the shared storage, and to avoid collisions it is better to rearrange the execution times.


The proposed schedule assumes that quiet hours begin after midnight. This is not valid for all environments. Choose the execution times based on analysis of file transfers and client login patterns.


Audit Log Maintenance - The default configuration deletes audit log records older than 6 months, in chunks, from the auditlog table. See the Audit Log section in the Transaction Manager Tuning chapter above. The optional export of deleted records to a CSV file is enabled by default. The default schedule is to run every 1st day of the month at 12:00 AM. Change the start time to 12:15 AM or another suitable time during quiet hours.


LogEntry Maintenance - This application maintains Server Logs by dropping partitions for the tables logging_event, logging_event_exception, and logging_event_property. See the Partitions section in the Transaction Manager Tuning chapter above. The default configuration keeps 1 day of server logs. Depending on the configuration, user activity, and the load, increase the days to keep to 3, 5, 7 (1 week), ... up to 14 (2 weeks). Usually, 5 days are enough for troubleshooting. To keep logs for a longer time, consider configuring logs to write to two appenders, one in the database and a second one to a flat file. Optionally, before dropping the partitions, ST can export them to a PostgreSQL custom-format archive file; this export is enabled by default. The default schedule is to run every day at 12:00 AM. Change the start time to 12:30 AM or another suitable time during quiet hours.


Package Retention Maintenance - This application deletes expired file packages from Ad Hoc file transfers. Make sure that the PackageRetentionMaintApp rule package from the Transaction Manager settings is enabled. The default schedule is not set. Set the schedule to every day at 12:45 AM or another suitable time during quiet hours.


Sentinel Link Data Maintenance - This application removes all SentinelLinkData table entries for files that no longer exist. The table SentinelLinkData is populated only if Send Events to Axway Sentinel or Decision Insight Server is enabled. The default schedule is to run every first Tue of the month at 12:00 AM. Change the start time to 01:00 AM or another suitable time during quiet hours. If Ad Hoc file transfers are in use, change the start time to 04:00 AM.


Transfer Log Maintenance - This application maintains File Tracking by dropping partitions for the tables subtransmissionstatus, transferdata, transferdetails, transferprotocolcommands, and transferresubmitdata. See the Partitions section in the Transaction Manager Tuning chapter above. The default configuration keeps 30 days of File Tracking. Depending on the configuration and the load, decrease the days to keep to 14 (2 weeks), 10, or 7 (1 week). Usually, 30 days can be handled by ST. Optionally, before dropping the partitions, ST can export them to a PostgreSQL custom-format archive file; this export is enabled by default. The default schedule is to run every day at 12:00 AM. Change the start time to 01:30 AM or another suitable time during quiet hours.




Archive Maintenance

The Archive Maintenance application automatically deletes files based on a schedule. See the File Archiving section in the Transaction Manager Tuning chapter above. The default schedule proposed when you add the application is to run every day at 12:00 AM. Change the start time to 11:00 PM or another suitable time during quiet hours.


Enable Multithreading

When the Archive Maintenance application is to process a large number of files, it can be executed multi-threaded. To enable multithreading, set the number of threads to execute file deletion in the server configuration parameter FileArchiving.DeleteFiles.ProcessingThreads. The default value is 1.


Increasing the number of threads increases the load on the storage on which the application operates. The number of threads should not exceed 16.


Set maximum run time

Occasionally, if the Archive Maintenance application is processing a large number of files, it may not be able to finish until the next scheduled occurrence. In this case, it may be advisable to specify the maximum time (in minutes) that you expect the application to run in the server configuration parameter FileArchiving.DeleteFiles.MaximumProcessingTime. The default value is 0, which means the application continues to run until it completes.


The Archive Maintenance application is not configured in new installations of SecureTransport. The mentioned parameters are applicable only if the application is created and configured.


FileArchiving.DeleteFiles.ProcessingThreads=4
FileArchiving.DeleteFiles.MaximumProcessingTime=0




Accounts Maintenance

The Accounts Maintenance application can disable, delete, or delete and purge accounts based on account inactivity or age. Make sure that the AccountMaintenanceApp rule package from the Transaction Manager settings is enabled. More details on the configuration are available in Administrator Guide -> Account Maintenance application. The default schedule is not set. Set the schedule to every day at 02:00 AM or another suitable time during quiet hours.




Unlicensed Accounts Maintenance

The Unlicensed Accounts Maintenance application deletes unlicensed user accounts that have been inactive for a specified period of time (60 days by default). Make sure that the UnlicensedAccountMaintApp rule package from the Transaction Manager settings is enabled. More details on the configuration are available in Administrator Guide -> Unlicensed Account Maintenance application. The default schedule is not set. Set the schedule to every day at 03:00 AM or another suitable time during quiet hours.




Login Threshold Maintenance

The Login Threshold Maintenance application unlocks accounts locked by the "Lock account after N successful logins" option in the Account settings and sends a report to the specified email contacts. Make sure that the LoginThresholdMaintenanceApp rule package from the Transaction Manager settings is enabled. The default schedule is not set. Set the schedule to every 30 minutes or another suitable time period.




File Maintenance

The File Maintenance application deletes files from the account home folders based on a specified retention or expiration period. You can schedule the maintenance and configure notifications to be sent to specific recipients before and/or after the deletion of files. Make sure that the FileMaintenanceApp rule package from the Transaction Manager settings is enabled. More details on the configuration are available in Administrator Guide -> File Maintenance application. The default schedule is not set. Set the schedule to every day at 04:00 AM or another suitable time during quiet hours.




Proposed schedule for maintenance applications

Maintenance application | Default schedule | Suggested schedule | Keep data for | Comments
Audit Log Maintenance | Every 1st day of the month at 12:00 am. | Every 1st day of the month at 12:15 am. | 6 months | Use smaller chunks if TM goes OOM during execution.
LogEntry Maintenance | Everyday at 12:00 am. | Everyday at 12:30 am. | 5 days | Keep data for as short a time as possible.
Package Retention Maintenance | No schedule is defined. | Everyday at 12:45 am. | variable per package | Configure if you use ad hoc file transfers.
Sentinel Link Data Maintenance | Every first Tue of the month at 12:00 am. | Every Sun at 01:00 am. or Every Sun at 04:00 am. | existing files | Busy environments with lots of transfers may need to schedule it more often.
Transfer Log Maintenance | Everyday at 12:00 am. | Everyday at 01:30 am. | 30 days | Busy environments with lots of transfers may need to reduce the days to keep data.
Archive Maintenance | Everyday at 12:00 am. | Everyday at 11:00 pm. | 5 days | Make sure you use a separate mount point for archives with the async mount option.
Accounts Maintenance | No schedule is defined. | Everyday at 02:00 am. | your choice | Use on demand.
Unlicensed Accounts Maintenance | No schedule is defined. | Everyday at 03:00 am. | 60 days | "Keep data for" counts consecutive inactive days.
Login Threshold Maintenance | No schedule is defined. | Every 30 minute(s). | - | Use on demand.
File Maintenance | No schedule is defined. | Everyday at 04:00 am. | 30 days or 5 days | Use on demand.
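
Before applying a schedule, it can be useful to check that no two applications start too close together. The sketch below does this for the suggested start times from the table above. It is an illustration only: it compares start times within a day and ignores the day-of-week and day-of-month parts of each schedule, and the 15-minute minimum gap and all names are assumptions rather than product settings.

```python
from itertools import combinations

# Suggested start times from the table above (24-hour clock).
PROPOSED = {
    "Audit Log Maintenance": "00:15",
    "LogEntry Maintenance": "00:30",
    "Package Retention Maintenance": "00:45",
    "Sentinel Link Data Maintenance": "01:00",
    "Transfer Log Maintenance": "01:30",
    "Accounts Maintenance": "02:00",
    "Unlicensed Accounts Maintenance": "03:00",
    "File Maintenance": "04:00",
    "Archive Maintenance": "23:00",
}


def minutes_of_day(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def find_collisions(schedule, min_gap_minutes=15):
    """Return pairs of applications that start less than min_gap apart."""
    collisions = []
    for (app_a, time_a), (app_b, time_b) in combinations(sorted(schedule.items()), 2):
        diff = abs(minutes_of_day(time_a) - minutes_of_day(time_b))
        gap = min(diff, 1440 - diff)  # wrap around midnight
        if gap < min_gap_minutes:
            collisions.append((app_a, app_b))
    return collisions
```

With the suggested times the function returns an empty list, while a dictionary that places several applications at "00:00", as a new installation does, reports every pair as a collision.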




12. Shared Storage Tuning


The Standard Cluster distributes the load between the servers within milliseconds. The various stages of file processing for server-initiated transfers can be handled by any node in the cluster at any time. This requires similarly fast access to the storage from every node, with minimum latency. Synchronization of data on the storage presented to ST servers is critical, especially for small files. In addition, ST keeps some metadata in the STFS directory structure as a subfolder in each subscription folder and account home folder. This metadata consists of very small files that are read and written many times during file processing. See STFS attribute files and caching for details. The shared storage therefore greatly affects the performance of ST and needs special attention.


Network latency and jitter

The most important parameter is network latency. Over the years a practical limit of 10 ms has been accepted. This means that network latency above 10 ms causes a noticeable performance degradation of ST transfers and is considered unsupported. In fact, with all the features now available in ST 5.5, even 10 ms is a huge delay which drastically reduces the capacity of the ST Standard Cluster (and Enterprise Cluster for that matter). Network latency is usually constant in LAN segments, but it can vary in complex virtual networks with multiple paths to the destination and when congestion is present. The variation is called network jitter or packet delay variation (RFC 3393), and it is very bad for storage performance and ST clusters in general. Both network latency and jitter can be addressed for large file transfers with appropriate buffering, but that is not the case for small files and single operations like checking if a file exists, opening a file, writing content, closing a file, changing ownership and permissions, etc. Overall performance requires the storage to be as close as possible to the ST cluster (same rack, same switch) so that latency is constant and as low as possible. So, if the storage is clustered, a distributed design is not supported.
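
To compare a set of latency measurements against the 10 ms limit, a rough summary can be computed as in the sketch below. It is an illustration under stated assumptions: the samples are round-trip times in milliseconds that you collected yourself (for example with ping to the storage address), and jitter is simplified to the mean absolute difference between consecutive samples, which is not the full one-way metric defined in RFC 3393. The function and key names are hypothetical.

```python
from statistics import mean

LATENCY_LIMIT_MS = 10.0  # practical limit quoted in the text above


def summarize_latency(samples_ms):
    """Summarize latency samples (milliseconds) against the 10 ms limit."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate jitter")
    # Simplified jitter: mean absolute difference of consecutive samples.
    jitter = mean(abs(b - a) for a, b in zip(samples_ms, samples_ms[1:]))
    return {
        "mean_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "jitter_ms": jitter,
        "within_limit": max(samples_ms) <= LATENCY_LIMIT_MS,
    }
```

The check uses the worst sample rather than the mean, because a single slow operation on a small file is enough to delay processing.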


Access to storage

ST must have exclusive access to the shared storage during file processing. Other storage clients can affect storage synchronization and interfere with file processing, either directly by performing actions on the files and folders or via security tools (especially antivirus or antimalware software) that scan files and folders. Some storage vendors support multi-protocol sharing. This mode affects the synchronization capabilities of the storage and gives other systems easy access. For error-free operation this mode is unsupported.


Security tools

Security tools that automatically scan files directly on the shared storage are not supported because they interfere with ST during file processing. This applies to antivirus or antimalware applications with real-time scanning enabled that run on ST servers or on other systems with access to the shared storage, when the ST accounts' home folders are not excluded from scanning. If you need to scan files arriving via ST, use the ICAP interface with ST (see ICAP settings).


This article will cover only the most popular and widely used protocols for access to network shared storage - NFS and CIFS (SMB).




ST on Linux

Linux has native support for the Network File System (NFS) protocol. ST 5.5 supports NFS versions NFSv3 and NFSv4. This is the most widely used protocol with ST.


  • On-premises installations usually use high-end Storage Area Network (SAN) or Network Attached Storage (NAS) devices from popular vendors like NetApp, Dell EMC, IBM, HPE, Veritas, Synology, etc. They are configured as an NFS server or, rarely, as a CIFS share. Another approach is to use a Linux machine configured as an NFS server. Supported NFS versions are NFS v3.0, v4.0, v4.1, and v4.2.
  • In Amazon cloud ST supports "Amazon EFS over NFS v4.0 and v4.1" and "Amazon FSx for OpenZFS over NFS v3.0 and v4.0".
  • In Azure cloud ST supports "Azure NetApp Files (ANF) over NFS v3.0".
  • In Google cloud ST supports "Google Filestore over NFS v3.0".


The NFS server must export the file system with the sync and no_wdelay options. It is also recommended to add the export option no_subtree_check.


NFS client mount options

There are two working modes (sync and async). Originally it was required to use the sync mount option, which means that any system call that writes data to files on that mount point causes the data to be flushed to the server before the system call returns control to user space. This provides greater data cache coherence among clients, but at a performance cost. The performance cost can be significant when uploading large files in cloud environments. Recently Axway validated a combination of mount options that works in async mode (using the Linux cache). This is possible thanks to the mount option lookupcache=positive. The best approach is to try async mode first and, if it does not work satisfactorily, fall back to sync mode.


async mode (generic)


async,actimeo=3,lookupcache=positive,nfsvers=<VERSION>,rsize=<NUM>,wsize=<NUM>,hard,timeo=600,retrans=2


sync mode (generic)


sync,actimeo=1,nfsvers=<VERSION>,rsize=<NUM>,wsize=<NUM>,hard,timeo=600,retrans=2


Amazon EFS


async,actimeo=3,lookupcache=positive,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport


Amazon FSx for OpenZFS


async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev


Azure NetApp Files, NetApp ONTAP 9.8 (NFS in NetApp ONTAP)


nconnect=8,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=262144,wsize=262144,hard,timeo=600,retrans=2


Google Filestore (Filestore instance performance)


nconnect=2,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=524288,wsize=524288,hard,timeo=600,retrans=3,resvport


For High scale SSD


nconnect=7,async,actimeo=3,lookupcache=positive,nfsvers=3,rsize=524288,wsize=524288,hard,timeo=600,retrans=3,resvport


The nconnect mount option requires support from both the NFS server and the NFS client. Linux supports nconnect in kernel version 5.3 and higher. The nconnect option works with NFSv3, NFSv4.1, and NFSv4.2.
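
The generic async and sync option strings above differ only in their first few options, so they can be assembled from a handful of inputs. The sketch below shows one way to do that and to produce a matching /etc/fstab entry. The function names, the server name, export path, and mount point are hypothetical placeholders; confirm rsize, wsize, and the NFS version with your storage vendor before using the result.

```python
def build_nfs_mount_options(mode, nfsvers, rsize, wsize, extra=()):
    """Assemble the generic option string for async or sync mode."""
    if mode == "async":
        options = ["async", "actimeo=3", "lookupcache=positive"]
    elif mode == "sync":
        options = ["sync", "actimeo=1"]
    else:
        raise ValueError("mode must be 'async' or 'sync'")
    options += [
        f"nfsvers={nfsvers}",
        f"rsize={rsize}",
        f"wsize={wsize}",
        "hard",
        "timeo=600",
        "retrans=2",
    ]
    options += list(extra)  # vendor-specific additions such as noresvport
    return ",".join(options)


def fstab_line(device, mount_point, options):
    """Format an /etc/fstab entry: device, mount point, type, options, dump, pass."""
    return f"{device} {mount_point} nfs {options} 0 0"


# Hypothetical example values, not taken from a real environment.
efs_options = build_nfs_mount_options(
    "async", "4.1", 1048576, 1048576, extra=("noresvport",)
)
entry = fstab_line("nfs.example.com:/export/st", "/mnt/st", efs_options)
```

With these inputs, efs_options reproduces the Amazon EFS option string listed above.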


async - The NFS client delays sending application writes to the server and keeps the data in the Linux cache. In other words, under normal circumstances, data written by an application may not immediately appear on the server that hosts the file. This can trigger retries in ST on another node (another NFS client) because that node does not yet see the file when it begins processing it. The retry mechanism is not implemented for every kind of I/O operation performed by ST. Use async mode together with lookupcache=positive to orchestrate the flow of events so as to avoid processing failures and minimize retries.


Do not use noac or actimeo=0 together with async mode because this can corrupt STFS attribute files.


sync - The NFS client flushes application writes to the server before the system call returns control to user space. In other words, data written by an application is already present on the server that hosts the file, and it is available for access by other NFS clients. Processing is smoother but slower for small files. On a heavily loaded system with predominantly small files, sync mode works better than async mode, where the ST application would be executing retry cycles.


actimeo - This option sets the four mount options acregmin, acregmax, acdirmin, and acdirmax to the same value. These mount options control the NFS client cache for the filesystem attributes of regular files and directories. The option "actimeo=3" means that the NFS client caches the attributes for 3 seconds before requesting fresh attribute information from the NFS server. The ST application expects consistent filesystem attribute information at any time on any node, so in principle the best option is to have no filesystem attribute cache. However, turning off caching with the noac mount option severely degrades performance in modern virtual environments and is not advisable. Use cache times that are as short as possible for sync mode. One second is usually enough for modern storage but, depending on the hardware, you may need to increase it to two or even three seconds. For async mode three seconds is usually fine, but you may need to reduce it to two seconds or even one second because of its effect on lookupcache=positive. To find the right value, create an account in ST with 30 subscriptions and put a thousand files in the account home folder. Log in and log out many times and measure the login times. If there is no significant difference between cache times of one, two, and three seconds, choose the smallest value. If the login time is better with two seconds, use "actimeo=2"; if login times are better still with three seconds, use "actimeo=3".
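
The selection rule at the end of the actimeo description can be written down as a small decision function. This is a sketch under assumptions: the input is the average login time in seconds that you measured for each actimeo value, and "significant" is taken to mean an improvement of at least 10 percent, a threshold chosen here for illustration only. The function name is hypothetical.

```python
def choose_actimeo(login_seconds_by_actimeo, significance=0.10):
    """Pick the smallest actimeo unless a larger one is significantly faster.

    login_seconds_by_actimeo maps an actimeo value (seconds) to the
    average measured login time (seconds) with that setting.
    """
    candidates = sorted(login_seconds_by_actimeo)
    chosen = candidates[0]
    for value in candidates[1:]:
        current_best = login_seconds_by_actimeo[chosen]
        # Accept a longer cache time only for a significant improvement.
        if login_seconds_by_actimeo[value] < current_best * (1 - significance):
            chosen = value
    return chosen
```

For example, measurements of 3.0, 2.0, and 1.95 seconds for actimeo values 1, 2, and 3 would select 2, because the further gain at three seconds is below the threshold.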


lookupcache - If pos or positive is specified, the client assumes positive entries are valid until their parent directory's cached attributes expire, but always revalidates negative entries before an application can use them. Always set this mount option to positive for async mode, in combination with the actimeo option described above.


nfsvers - The NFS protocol version number used to contact the server's NFS service. If the server does not support the requested version, the mount request fails. Sometimes there is no choice because only a particular version is supported (see above). When you have a choice, ask the storage vendor whether there is a preference. In perfect conditions it does not matter which version is selected. One version can be better in some environments, while the other is better in others, and this can be identified by a load test with the desired traffic pattern. Note that NFSv3 can run over UDP as well as TCP, whereas NFSv4 requires TCP, which could make a difference.


rsize - The maximum number of bytes in each network READ request that the NFS client can receive when reading data from a file on an NFS server. The actual data payload size of each NFS READ request is equal to or smaller than the rsize setting. The largest read payload supported by the Linux NFS client is 1,048,576 bytes (one megabyte). The client and server negotiate the largest rsize value that they can both support. Usually, the same value is used for both rsize and wsize. Check with your storage vendor for optimal values. In general, the bigger the buffer, the better the throughput, provided there are no network or storage constraints.


wsize - The maximum number of bytes per network WRITE request that the NFS client can send when writing data to a file on an NFS server. The actual data payload size of each NFS WRITE request is equal to or smaller than the wsize setting. The largest write payload supported by the Linux NFS client is 1,048,576 bytes (one megabyte). The client and server negotiate the largest wsize value that they can both support. Usually, the same value is used for both wsize and rsize. Check with your storage vendor for optimal values. In general, the bigger the buffer, the better the throughput, provided there are no network or storage constraints.


hard - Determines the recovery behavior of the NFS client after an NFS request times out. With the hard option, NFS requests are retried indefinitely. For ST, data integrity is more important than NFS client responsiveness. That is why the soft mount option, which can cause silent data corruption in certain cases, is not recommended.


timeo - The time in deciseconds (tenths of a second) the NFS client waits for a response before it retries an NFS request. The NFS client over TCP performs linear backoff: After each retransmission, the timeout is increased by timeo up to the maximum of 600 seconds. However, for NFS over UDP, the client uses an adaptive algorithm to estimate an appropriate timeout value for frequently used request types (such as READ and WRITE requests) but uses the timeo setting for infrequently used request types (such as FSINFO requests).


retrans - The number of times the NFS client retries a request before it attempts further recovery action. If the retrans option is not specified, the NFS client tries each request three times. The NFS client generates a "server not responding" message after retrans retries, then attempts further recovery (depending on whether the hard mount option is in effect).


noresvport - Specifies that the NFS client should use a non-privileged source port when communicating with an NFS server for this mount point. Using non-privileged source ports helps increase the maximum number of NFS mount points allowed on a client, but NFS servers must be configured to allow clients to connect via non-privileged source ports. The exact range of privileged source ports that can be chosen is set by a pair of sysctls to avoid choosing a well-known port, such as the port used by SSH. Because privileged ports are restricted to this range, the number of source ports available for the NFS client, and therefore the number of socket connections that can be used at the same time, is practically limited to only a few hundred. The traditional default NFS authentication scheme, known as AUTH_SYS, relies on sending local UID and GID numbers to identify users making NFS requests. An NFS server assumes that if a connection comes from a privileged port, the UID and GID numbers in the NFS requests on this connection have been verified by the client's kernel or some other local authority. This is an easy system to spoof, but on a trusted physical network between trusted hosts, it is entirely adequate. Using non-privileged source ports may compromise server security somewhat, since any user on AUTH_SYS mount points can now pretend to be any other when making NFS requests. For this reason, NFS servers do not support non-privileged source ports by default; they must be explicitly allowed, usually via an export option.


nconnect - The purpose of nconnect is to provide multiple TCP connections to NFS server, which can increase performance and throughput. The current limit of client-server connections opened by nconnect is 16.


It's not recommended to use nconnect and sec=krb5* mount options together. Using these options together can cause performance degradation.


_netdev - This is not really an NFS client mount option. It forces systemd to treat the mount unit as a network mount, so systemd mounts it only after the network is available. Usually the automatic detection works fine and this mount option is not needed. Using this option overrides the detection and states explicitly that the mount requires the network.


CIFS (SMB) client mount options

CIFS, or the Common Internet File System, is a dialect of the Server Message Block (SMB) protocol. The SMB3 protocol is the successor to the CIFS (SMB) protocol and is supported by most Windows servers, Azure (cloud storage), Macs and many other commercial servers and Network Attached Storage appliances as well as by the popular Open Source server Samba.


CIFS is not a separate protocol but rather a specific implementation or version of the SMB protocol. Modern systems and Microsoft itself now recommend against using CIFS (which corresponds to SMB 1.0) in favor of newer, more secure, and better-performing SMB versions, such as SMB 3.0 and above. Modern SMB versions offer significantly more functionality and efficiency, which CIFS lacks.


This section is currently under development. More information about CIFS will be added in the near future.


Check if the storage is running in sync or async mode

The Linux dd command can bypass the Linux cache (oflag=dsync) and write data to the storage synchronously. If the two commands below produce similar results, the NFS client is using sync mode. In async mode the second command returns a significantly higher transfer rate (see the example below). The chosen block size of 1460 bytes fills one network packet (the TCP payload of a standard 1500-byte Ethernet MTU).


dd if=/dev/zero of=<shared storage mount point>/test.small bs=1460 count=1000 oflag=dsync


dd if=/dev/zero of=<shared storage mount point>/test.small bs=1460 count=1000


Example results of the dd command writing to the local root filesystem, which definitely uses the Linux cache:


[root@RHEL84-axwg3 ~]# dd if=/dev/zero of=/root/test.small bs=1460 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
1460000 bytes (1.5 MB, 1.4 MiB) copied, 3.56844 s, 409 kB/s
[root@RHEL84-axwg3 ~]# dd if=/dev/zero of=/root/test.small bs=1460 count=1000
1000+0 records in
1000+0 records out
1460000 bytes (1.5 MB, 1.4 MiB) copied, 0.00243722 s, 599 MB/s
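
The comparison above can be scripted. The following is a minimal sketch, assuming GNU coreutils dd; the function names are hypothetical and the factor of 10 used to decide between the two modes is an arbitrary assumption, not a value from SecureTransport or NFS documentation.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a dsync write with a cached write.

# dd_elapsed_seconds TARGET [extra dd options]
# Runs the same dd write as in the article and prints the elapsed seconds
# taken from the "copied" summary line (assumes GNU coreutils dd output).
dd_elapsed_seconds() (
    target=$1
    shift
    LC_ALL=C dd if=/dev/zero of="$target" bs=1460 count=1000 "$@" 2>&1 |
        awk '/copied/ { print $(NF-3) }'
)

# classify_write_mode DSYNC_SECONDS CACHED_SECONDS [FACTOR]
# Prints "async" when the dsync run is more than FACTOR times slower than
# the cached run, otherwise "sync". FACTOR defaults to 10 (an assumption).
classify_write_mode() (
    awk -v d="$1" -v c="$2" -v f="${3:-10}" \
        'BEGIN { if (d > c * f) print "async"; else print "sync" }'
)

# Usage on a real mount point (not executed here):
#   t1=$(dd_elapsed_seconds /mnt/shared/test.small oflag=dsync)
#   t2=$(dd_elapsed_seconds /mnt/shared/test.small)
#   classify_write_mode "$t1" "$t2"
```

With the local filesystem timings shown above (3.56844 s and 0.00243722 s) the classification is "async", which matches the expectation that the local write goes through the Linux cache.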




Retries in ST for important IO operations

ST provides a retry mechanism for some important IO operations on shared storage. The retry mechanism works the same way for all types of retries described below. When a retry is triggered, ST calculates a backoff time by multiplying the retry number by retryTime (pauseTime). After the backoff time expires, ST retries the failed operation.


For example, with retryTime=100 the 5th retry is executed after 5 * 100 = 500 milliseconds (0.5 seconds). The total time to exhaust 10 retries and permanently fail the operation is 1 * 100 + 2 * 100 + ... + 10 * 100 = 5500 milliseconds (5.5 seconds). A common mistake when increasing retries is to double both the number of retries and retryTime. With retries=20 and retryTime=200, the total time is 1 * 200 + 2 * 200 + ... + 20 * 200 = 42000 milliseconds (42 seconds), which is a very long retry cycle with long backoff times. A better approach is to increase the number of retries and slightly reduce retryTime, or leave retryTime at its default. For example, with retries=20 and retryTime=90 the total time is 1 * 90 + 2 * 90 + ... + 20 * 90 = 18900 milliseconds (18.9 seconds).
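
To evaluate other combinations before changing the configuration, the arithmetic above can be reproduced with a small shell function. This is only a sketch of the calculation described in this section; the function name total_backoff_ms is hypothetical and is not part of SecureTransport.

```shell
#!/bin/sh
# total_backoff_ms RETRIES RETRY_TIME_MS
# Sums retry_number * retryTime for retry numbers 1..RETRIES and prints
# the total retry cycle in milliseconds.
total_backoff_ms() (
    retries=$1
    retry_time=$2
    total=0
    n=1
    while [ "$n" -le "$retries" ]; do
        total=$((total + n * retry_time))
        n=$((n + 1))
    done
    echo "$total"
)

total_backoff_ms 10 100   # default values, prints 5500
total_backoff_ms 20 200   # both values doubled, prints 42000
total_backoff_ms 20 90    # more retries, shorter retryTime, prints 18900
```

Because the total grows with the square of the number of retries, small changes in the retry count have a larger effect on the cycle length than the same relative change in retryTime.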


NFS Support

This is the general retry mechanism for accessing existing files. It was originally implemented for NAS using NFS in async mode, but it is effectively applicable to any kind of shared storage. The retry triggers on exceptions thrown by the Java functions java.io.File.canRead() and java.io.File.exists(). By default, this mechanism is disabled. To enable it with the recommended values, add the lines below to the STStartScriptsConfig script.


TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.enabled=true $TM_JAVA_OPTS" 
TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.retryCount=5 $TM_JAVA_OPTS" 
TM_JAVA_OPTS="-Dcom.tumbleweed.tm.nfssupport.NFSSupportConfig.pauseTime=200 $TM_JAVA_OPTS"


STFS retries

This retry mechanism applies to read and write operations on STFS attribute files. It was originally created for read operations only and was later extended to write operations. It is enabled by default with 10 retries and a retryTime of 100 milliseconds. To increase the retry cycle, add the lines below to the STStartScriptsConfig script. The new values are 20 retries and a retryTime of 90 milliseconds (retry cycle 18.9 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retries=20 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retryTime=90 $TM_JAVA_OPTS"


For a further increase, change the values to 30 retries and a retryTime of 60 milliseconds (retry cycle 27.9 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retries=30 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.attributes.read.retryTime=60 $TM_JAVA_OPTS"


AR sandbox retries

This retry mechanism triggers on a NoSuchFileException while ST tries to copy a file from a subscription folder to the sandbox for executing a Route. The retry mechanism is enabled by default with 10 retries and a retryTime of 100 milliseconds. The calculation of the backoff time is slightly different here. The formula is: backoff time = (retry number - 1) * retryTime, so the first retry is executed without a delay. To increase the retry cycle, add the lines below to the STStartScriptsConfig script. The new values are 20 retries and a retryTime of 90 milliseconds (retry cycle 17.1 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retries=20 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retryTime=90 $TM_JAVA_OPTS"


For a further increase, change the values to 30 retries and a retryTime of 60 milliseconds (retry cycle 26.1 seconds).


TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retries=30 $TM_JAVA_OPTS"
TM_JAVA_OPTS="-Dcom.axway.st.server.fs.ar.file.processing.retryTime=60 $TM_JAVA_OPTS"
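
The AR sandbox retry cycles quoted above follow from the (retry number - 1) * retryTime formula. The sketch below reproduces that calculation; the function name ar_backoff_ms is hypothetical and only illustrates the arithmetic.

```shell
#!/bin/sh
# ar_backoff_ms RETRIES RETRY_TIME_MS
# Sums (retry_number - 1) * retryTime for retry numbers 1..RETRIES and
# prints the total AR sandbox retry cycle in milliseconds.
ar_backoff_ms() (
    retries=$1
    retry_time=$2
    total=0
    n=1
    while [ "$n" -le "$retries" ]; do
        total=$((total + (n - 1) * retry_time))
        n=$((n + 1))
    done
    echo "$total"
)

ar_backoff_ms 10 100   # default values, prints 4500
ar_backoff_ms 20 90    # prints 17100
ar_backoff_ms 30 60    # prints 26100
```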




13. Log4j2 Tuning


In a standard cluster, SecureTransport uses Apache Log4j2 to write server logs to the local database by default. A custom appender built into ST, STDBAppender, is used instead of the standard JDBCAppender. The appender connects to the DB via the localhost interface and populates three partitioned tables: logging_event, logging_event_exception, and logging_event_property. These tables are not replicated between databases, and the partitions for the current day must exist for server logs to be written to the DB (see Partitions).

A fall-back mechanism writes to the file serverlog-fallback.log when a partition for the current date does not exist or when the log rate is too high. The log rate is fine in normal situations (configurations), but if general debugging is enabled, or a defect produces very frequent exceptions with large stack traces, log4j2 cannot keep up with writing them to the DB and uses the fall-back log as well.

Old partitions with logs from previous days are dropped by the LogEntry Maintenance application on the backend (core) servers and by the rotate_db script on the frontend (edge) servers. By default SecureTransport keeps 1 old partition (1 day). Logs in the database are useful for investigating current issues, because the Admin UI correlates server logs with file tracking and provides many filtering options. It is recommended to keep no more than 15 days of server logs in the database.

Before dropping an old partition, SecureTransport exports it with the pg_dump utility to a compressed PostgreSQL custom-format archive suitable for input into pg_restore. The archive is not human-readable and requires extra effort to make it accessible for offline investigation. If the export fails, ST does not drop the partition. Because of these inconveniences, the following is recommended:


  • Configure log4j2 to write server logs in two places (database and daily rotated flat file)
  • Keep server logs in database from 3 to 7 days
  • Disable export in the LogEntry Maintenance application on backends. Currently not configurable on the edges



To add a second location for server logs in a flat file, add a DailyRollingFileAppender and reference it in the relevant loggers (classes). DailyRollingFileAppender is the only appender capable of printing the SessionID and TransferID, which are needed to track individual transfers in Transaction Manager. The SecureTransport documentation (Administrator Guide - Redirect log4j output from the database) only describes replacing the existing STDBAppender for "ServerLog" with DailyRollingFileAppender for debugging purposes, and does not mention the ability to print SessionID and TransferID. The sample configuration in the documentation also suggests replacing DailyRollingFileAppender with the RollingFile appender, or placing a non-blocking Async appender in front of DailyRollingFileAppender. In both cases you lose the ability to print SessionID and TransferID. DailyRollingFileAppender only rotates the log file daily; it cannot compress or delete old logs. You can use an OS script for that purpose.
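
A possible OS script for compressing and deleting rotated logs is sketched below. It assumes GNU find and gzip, and it assumes that rotated files keep the original file name followed by the date suffix (for example tm.log.2026-06-15) inside the directory configured as rotateDirectory. The function name, the file name pattern, and the retention values in the usage line are assumptions to adapt and test on a non-production system first.

```shell
#!/bin/sh
# compress_and_purge_logs DIR COMPRESS_AFTER_DAYS DELETE_AFTER_DAYS
# Hypothetical housekeeping helper for rotated flat-file server logs.
compress_and_purge_logs() (
    dir=$1
    compress_after=$2
    delete_after=$3

    # Compress rotated logs that are older than COMPRESS_AFTER_DAYS and are
    # not compressed yet. gzip keeps the modification time of the original.
    find "$dir" -maxdepth 1 -type f -name '*.log.*' ! -name '*.gz' \
        -mtime +"$compress_after" -exec gzip -f {} \;

    # Delete compressed logs that are older than DELETE_AFTER_DAYS.
    find "$dir" -maxdepth 1 -type f -name '*.log.*.gz' \
        -mtime +"$delete_after" -exec rm -f {} \;
)

# Example call (not executed here), e.g. from a daily cron job:
#   compress_and_purge_logs /opt/Axway/SecureTransport/var/db/hist/logs 1 30
```

Note that -mtime +N matches files whose age is more than N full 24-hour periods, so a value of 1 leaves the most recently rotated file uncompressed.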


For Transaction Manager you need to add two appenders. In the example below, the appender named "ServerLogFile" corresponds to "ServerLog" and the appender named "ARFileAppender" corresponds to "ImprovedRoutingAppender". Add a reference to "ServerLogFile" in every logger that references "ServerLog". For Admin and Tools, you need an additional appender, "AuditLogAppenderFile", corresponding to "AuditLogAppender"; it does not print SessionID and TransferID. Protocol daemons need only one appender, "ServerLogFile", which prints only the SessionID. Note that the SessionID on protocol daemons has a different value from the one on Transaction Manager.


Example for tm-log4j.xml


#------
#Add two appenders "ServerLogFile" and "ARFileAppender" in section <Appenders>.
#------

<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/tm.log" name="ServerLogFile" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c %equals{%x}{[]}{} SessionID-%X{sessionId} TransferID-%X{transferId} - %m%n%ex"/>
</DailyRollingFileAppender>
        
<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/tm_ar.log" name="ARFileAppender" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c SessionID-%X{sessionId} TransferID-%X{transferId} - %X{code}: [%X{accountName}] [%X{routeName}]  %m%n%ex"/>
</DailyRollingFileAppender>


#------
#Add reference to "ServerLogFile" in any class which has reference to "ServerLog" in section <Loggers> like in example below.
#------

<Logger name="AS2Server" level="INFO" additivity="false">
    <AppenderRef ref="ServerLog"/>
    <AppenderRef ref="ServerLogFile"/>
</Logger>


#------
#Add reference to "ARFileAppender" in the logger "com.axway.st.server.route" in section <Loggers>.
#------

<Logger name="com.axway.st.server.route" level="INFO" additivity="false">
    <AppenderRef ref="ImprovedRoutingAppender"/>
    <AppenderRef ref="ARFileAppender"/>
</Logger>
		


Example for admin-log4j.xml


#------
#Add three appenders "ServerLogFile", "ARFileAppender", and "AuditLogAppenderFile" in section <Appenders>.
#------

<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/admin/admin.log" name="ServerLogFile" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/admin/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c %equals{%x}{[]}{} SessionID-%X{sessionId} TransferID-%X{transferId} - %m%n%ex"/>
</DailyRollingFileAppender>
        
<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/admin/admin_ar.log" name="ARFileAppender" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/admin/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c %equals{%x}{[]}{} SessionID-%X{sessionId} TransferID-%X{transferId} - %X{code}: [%X{accountName}] [%X{routeName}]  %m%n%ex"/>
</DailyRollingFileAppender>

<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/admin/audit-menu.log" name="AuditLogAppenderFile" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/admin/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c %equals{%x}{[]}{} - %m%n%ex"/>
</DailyRollingFileAppender>

#------
#Add reference to "ServerLogFile" in any class which has reference to "ServerLog" in section <Loggers> like in example below.
#------

<Logger name="org.apache" level="WARN" additivity="false">
    <AppenderRef ref="ServerLog"/>
    <AppenderRef ref="ServerLogFile"/>
</Logger>


#------
#Add reference to "ARFileAppender" in the logger "com.axway.st.server.route" in section <Loggers>.
#------

<Logger name="com.axway.st.server.route" level="INFO" additivity="false">
    <AppenderRef ref="ImprovedRoutingAppender"/>
    <AppenderRef ref="ARFileAppender"/>
</Logger>


#------
#Add reference to "AuditLogAppenderFile" in "AUDIT" in section <Loggers>.
#------

<Logger name="AUDIT" level="ALL" additivity="false">
    <AppenderRef ref="AuditLogAppender"/>
    <AppenderRef ref="AuditLogAppenderFile"/>
</Logger>
		


Example for sshd-log4j.xml


#------
#Add appender "ServerLogFile" in section <Appenders>.
#------

<DailyRollingFileAppender append="true" datePattern="'.'yyyy-MM-dd" fileName="/opt/Axway/SecureTransport/var/logs/sshd.log" name="ServerLogFile" rotateDirectory="/opt/Axway/SecureTransport/var/db/hist/logs/">
    <PatternLayout pattern="%d{ISO8601} [%t] %p %c %equals{%x}{[]}{} SessionID-%X{sessionId} - %m%n%ex"/>
</DailyRollingFileAppender>


#------
#Add reference to "ServerLogFile" in any class which has reference to "ServerLog" in section <Loggers> like in example below.
#------

<Logger name="com.maverick" level="ERROR" additivity="false">
    <AppenderRef ref="ServerLog"/>
    <AppenderRef ref="ServerLogFile"/>
</Logger>
		




14. Entropy Tuning


Entropy is broadly defined as a measure of disorder, randomness, or uncertainty within a system. Low system entropy is a common issue where the OS exhausts its pool of random numbers. This causes severe system delays, blocked cryptography, or services hanging while waiting for the entropy pool to fill in. A low entropy issue in a Virtual Machine (VM) happens because virtualized environments lack physical hardware events (like mouse movements or disk seeks) to generate true randomness. Randomness requirements for security systems are defined in RFC 4086.

The SecureTransport application is only a consumer of random numbers provided by the operating system, and its operations rely on good entropy. The BouncyCastle Security Provider in SecureTransport constantly measures the entropy and reports warning messages for low entropy, such as the ones below:


Example of a low entropy warning in the Admin console file $FILEDRIVEHOME/tomcat/admin/logs/catalina.out


Jun 15, 2026 10:43:23 AM org.bouncycastle.crypto.fips.ContinuousTestingEntropySource getEntropy
WARNING: entropy source stuck


Example of a low entropy warning in the Server Logs for the HTTPD component.


Time, Level, Component, Thread, Message, Filename, Class, Method, Line, Account or Login, Stack Trace, Activity, Transferred File, Client Hostname, Edge Hostname, Server Hostname, Node Name, Session ID, Session Start Time, Transfer ID
"06/15/2026 14:54:57.592","WARN","HTTPD","STDataSourceServerLogComponent:connection-adder","entropy source stuck","N/A","org.bouncycastle.crypto.fips.ContinuousTestingEntropySource","getEntropy","'-1","UNKNOWN","UNKNOWN","UNKNOWN","UNKNOWN","UNKNOWN","UNKNOWN","UNKNOWN","10.20.30.40","UNKNOWN","unknown","UNKNOWN"


The "entropy source stuck" warning in the ST logs only indicates low entropy at the time it is reported. A sporadic single occurrence can be safely ignored. Regular occurrences need further investigation and corrective measures, especially in production environments. The rest of this section gives guidelines for potential solutions.
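
To tell sporadic warnings from regular ones, the occurrences can be counted per day. The sketch below assumes the catalina.out layout shown in the example above, where a header line starting with the date is followed by the WARNING line; the function name count_entropy_warnings is hypothetical.

```shell
#!/bin/sh
# count_entropy_warnings FILE
# Prints "<date>: <count>" for each day that has at least one
# "entropy source stuck" warning in a catalina.out style log.
count_entropy_warnings() (
    awk '
        # Header line such as "Jun 15, 2026 10:43:23 AM <class> <method>":
        # remember its date for the message line that follows.
        /^[A-Z][a-z][a-z] [0-9]+, [0-9][0-9][0-9][0-9] / {
            current_date = $1 " " $2 " " $3
            next
        }
        /entropy source stuck/ { count[current_date]++ }
        END { for (d in count) print d ": " count[d] }
    ' "$1" | sort
)

# Example call (not executed here):
#   count_entropy_warnings "$FILEDRIVEHOME/tomcat/admin/logs/catalina.out"
```

The output is sorted as plain text, not chronologically, which is sufficient for spotting days with many warnings.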


When entropy is extremely low, it affects TLS communications and SecureTransport processing, which can lead to a complete stop of ST on the affected server. In the SecureTransport server logs, various threads and classes report errors like the one below when getting the next random bytes from the SecureRandom generator:


java.lang.NoClassDefFoundError: Could not initialize class sun.security.provider.SecureRandom$SeederHolder


Entropy on Linux


The Linux kernel constantly measures entropy and reports it in /proc/sys/kernel/random/entropy_avail. The metric depends on the Linux kernel version. Consult your Linux distribution's documentation for the meaning of the entropy_avail value. In RHEL8 a healthy pool sits close to 4096. In RHEL9 the pool architecture changed with newer kernels, and a healthy pool sits close to 256.
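
A simple check of the reported value can be added to existing monitoring. The following is a minimal sketch; the function name entropy_status and the threshold in the usage line are assumptions, and the threshold must be chosen according to the kernel version as described above.

```shell
#!/bin/sh
# entropy_status THRESHOLD [FILE]
# Reads the available entropy (by default from the kernel's entropy_avail
# file) and prints OK or LOW. Returns 0 for OK, 1 for LOW, 2 on read error.
entropy_status() (
    threshold=$1
    file=${2:-/proc/sys/kernel/random/entropy_avail}
    avail=$(cat "$file") || exit 2
    if [ "$avail" -lt "$threshold" ]; then
        echo "LOW entropy_avail=$avail threshold=$threshold"
        exit 1
    fi
    echo "OK entropy_avail=$avail threshold=$threshold"
)

# Example call (not executed here); 1000 is only an illustrative threshold:
#   entropy_status 1000
```

The optional second argument exists so the function can be tried against a sample file before pointing it at the live kernel value.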


The easiest and most common way to resolve VM entropy starvation is to install haveged, a userspace entropy daemon that generates random numbers using CPU execution time jitter. In modern environments this is usually not enough. Sources Linux can use to gain entropy:


  • Hardware RNG - Utilizing CPU extensions like Intel RDRAND or the equivalent AMD instructions. Confirm the CPU flags inside RHEL with the command lscpu | grep -E "rdrand|rdseed". RHEL9 and RHEL8 automatically activate the rdrand feature at boot to fill the entropy pool and rarely need rng-tools.
  • TPM module - A Trusted Platform Module (TPM) features a built-in, hardware-based True Random Number Generator (TRNG). Note that configuring a software TPM module on the hypervisor without presence of real hardware module does not help to gain entropy.
  • Virtio RNG - A paravirtualized device that safely passes entropy (randomness) from the host machine to a guest Virtual Machine (VM).
  • rng-tools - The standard package for managing pass-through of hardware RNG, virtio RNG, or other sources in virtual environments and for feeding the available RNG source to /dev/random.
  • Custom Network Fetchers (EGD protocol) - If your infrastructure relies on a centralized Enterprise Random Number Generator (ERNG) appliance that serves random bits over the network (using the EGD/EGD-HS protocol), you can write a simple client script that pulls data from your central server and pipes it directly into the kernel, or you can configure rngd to read from an EGD socket.
  • SwiftRNG - The SwiftRNG is a general-purpose USB device that generates true (hardware) random numbers at a rate of 100 Mbits per second. Check the link SwiftRNG Software Kit for more information.


To summarize: use any available hardware-based RNG device and expose it to the virtual machine. If no such device is available, use a client-server approach to fetch random seeds from a remote server. Configure rng-tools to feed the available RNG source to /dev/random. If physical hardware access isn't an option, use rng-tools to feed the Linux internal urandom generator to /dev/random by running rngd -r /dev/urandom -o /dev/random, or use the haveged tool. The entropy sources in cloud instances depend heavily on the provider where the instance is running. Check the article Entropy in RHEL based cloud instances for more information.


Entropy on Windows


Unlike Linux, which uses an explicit entropy pool that can block system functions if depleted, Windows Server uses Cryptography API: Next Generation (CNG) and a background-seeded PRNG that continuously draws from multiple hardware and system sources. Windows automatically combines several sources (see below) when available to populate and continuously strengthen its internal randomness. Unfortunately, Windows has no metric for available system entropy. Some PowerShell scripts can measure file or data entropy using the Shannon entropy formula, and Microsoft’s Sysinternals suite includes a command-line tool called sigcheck, which provides detailed file properties including entropy; however, these tools measure the entropy of file contents rather than the available system entropy. Sources Windows can use to gain entropy:


  • Hardware RNG - Utilizing CPU extensions like Intel RDRAND or AMD equivalent instructions.
  • TPM module - A Trusted Platform Module (TPM) features a built-in, hardware-based True Random Number Generator (TRNG). Note that configuring a software TPM module on the hypervisor without presence of real hardware module does not help to gain entropy.
  • UEFI RNG - Firmware reads from UEFI random number generation protocols.
  • System Metrics - Pulls from CPU timing measurements (e.g., interrupt timings and clock cycle counters), current system time, Process IDs (PIDs), and thread behaviors.
  • Registry Seed - Maintains a persistent seed file at HKLM\SYSTEM\RNG\Seed to ensure randomness is immediately available during server boots.
  • SwiftRNG - The SwiftRNG is a general-purpose USB device that generates true (hardware) random numbers at a rate of 100 Mbits per second. An entropy-server.exe runs on the Windows Server, interfacing directly with the USB hardware. It exposes the random data via a duplex Named Pipe. Check the article Using entropy-server on Windows for accessing true random bytes generated by SwiftRNG for more information.
  • AlphaRNG - Similar to SwiftRNG: a general-purpose USB device that generates true (hardware) random numbers. Check the article Using entropy-server on Windows for accessing true random bytes generated by AlphaRNG for more information.


It may sound like Windows solves the issue, but it has a similar problem in virtual environments that lack physical hardware: entropy shortages result in slow boot times or delayed cryptographic operations. Ensure your virtual machines have access to a virtualized TPM (vTPM) and that the hypervisor exposes hardware-based random number generation to the guest OS. The available entropy on a physical server is divided among the running virtual machines, so sometimes you need to reduce the number of running virtual machines to gain enough entropy. Check also the following links: How to Configure ESXi Entropy and Entropy Broker.




Return to table of contents

Created: August 2025 by Evgeni Evangelov -> Updated: June 2026