KB Article #183116

Monitoring the Event Queue with SQL queries

Problem

Server initiated transfers in ST are coordinated by events in the Event table in the database. In the past most of the valuable information was hidden in two binary large object fields - data and sessionData - which are difficult to use with standard SQL queries in order to aggregate the events for monitoring purpose.


In SecureTransport 5.5-20220127 a new feature was introduced for events queue management, which brings changes to the Event table to help with the better aggregation of the events. The feature works only via the REST API 2.0 resource /events. In ST 5.5-20231130 the feature was further extended by creating a dedicated page in Admin UI. Originally designed to handle individual events, it has no ordering (sorting) or grouping capabilities yet, but only filter and delete capabilities. This works for the management of events, but is not sufficient for monitoring.


Resolution

For monitoring purposes an SQL query is the best option for now until the Events Queue Management is extended for monitoring. The SQL query has a different syntax for different supported database types (MariaDB, PostgreSQL, Oracle, MSSQL) due to the different way of converting the timestamps to a human readable format, and of extracting the UPLOAD_FOLDER from FULLTARGET. Here are versions of the SQL query for Linux paths.


Note that this article applies only for Server Initiated Transfers (SITs). Client Initiated Transfers (CITs) do not insert any events in the event queue. Any CIT became SIT when file arrive to subscription folder.


MariaDB

SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
left(fulltarget,char_length(fulltarget)-char_length(substring_index(fulltarget,'/',-1))) as targetFolder, 
Account.name as accountName, 
FROM_UNIXTIME(MIN(arrivalTime)/1000) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, left(fulltarget,char_length(fulltarget)-char_length(substring_index(fulltarget,'/',-1))), Account.name, Subscription.folder
ORDER BY 8 ASC, 2 DESC;


PostgreSQL

SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
LEFT(fulltarget, LENGTH(fulltarget) - POSITION('/' in REVERSE(fulltarget))) as targetFolder, 
Account.name as accountName, 
to_timestamp(TRUNC(MIN(event.arrivaltime)/1000)) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, LEFT(fulltarget, LENGTH(fulltarget) - POSITION('/' in REVERSE(fulltarget))), Account.name, Subscription.folder
ORDER BY 8 ASC, 2 DESC;


Oracle

SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
substr(fulltarget,1, instr(fulltarget,'/',-1)) as targetFolder, 
Account.name as accountName, 
TO_CHAR(DATE '1970-01-01' + ( 1 / 24 / 60 / 60 / 1000) * MIN(arrivaltime), 'YYYY-MM-DD HH24:MM:SS') as oldestArrivalTime, 
Subscription.folder as subscriptionFolder 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, substr(fulltarget,1, instr(fulltarget,'/',-1)), Account.name, Subscription.folder
ORDER BY 8 ASC, 2 DESC;


MSSQL

SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
LEFT(fulltarget, LEN(fulltarget) - CHARINDEX('/', REVERSE(fulltarget), 0)) as targetFolder, 
Account.name as accountName, 
DATEADD(second, cast(MIN(arrivalTime)/1000 as int), convert(datetime, '01/01/1970', 101)) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, LEFT(fulltarget, LEN(fulltarget) - CHARINDEX('/', REVERSE(fulltarget), 0)), Account.name, Subscription.folder
ORDER BY 8 ASC, 2 DESC;


Comments:

The clusternode field is populated on LEC cluster only and contains the IP address of the relevant server.


The status field can have the following values:


0 – Ready. The event is in the queue and waiting for some condition/resources to be executed.

1 – Active. The event has assigned thread, and it is in progress.

2 – Recovering. Upon TM restart events are recovered for reprocessing.


Under normal conditions only active (1) and ready (0) statuses are present.


The eventgroup field represents a subscription and can have a value in the following format:


For SIT pull <subscriptionId>_subscription_PARTNER-IN

For SIT push <subscriptionId>.

REST API events and some PeSIT events do not have eventgroup assigned.


The retryCount field is applicable only for the Basic Application retry mechanism.


The targetFolder represents the full system path where the file resides for pull or push. It could be any subscription folder or subfolder of subscription folder under the account home folder. It is extracted from fulltarget by removing the file name. For Linux the query uses "/" as path delimiter. For Windows this must be changed to "\\".


If you remove the extraction of filename and use the fulltarget field in the SELECT and GROUP BY clauses directly, this effectively removes the grouping and shows individual events.


The oldestArrivalTime is calculated from the time when events were generated and inserted in the Event table. This time never changes until the event is processed and removed from the Event table. The value represents the earliest (oldest) event in the grouped events list.


The SQL query sorts the result by oldestArrivalTime (oldest on top) and then by number of events (higher number of events first).


Sample output of the SQL query




Even better solution

The above SQL query shows all events in the Event table at the time of execution. The Event Queue, however, is very dynamic and events will be created and removed very quickly. When the events stay longer in the queue, they become a subject of interest. The above SQL query can be enhanced to filter only events older than a specific number of seconds (calculated based on the current DB server time). Here are versions of the enhanced SQL query showing events older than a minute (60 seconds).


MariaDB

SELECT e.*, TRUNCATE((UNIX_TIMESTAMP() - e.oldestArrivalUnixTime) / 86400, 0) || ' days ' || DATE_FORMAT('1970-01-01 00:00:00' + INTERVAL UNIX_TIMESTAMP() - e.oldestArrivalUnixTime SECOND, '%T') AS arrivedSince 
FROM (
SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
left(fulltarget,char_length(fulltarget)-char_length(substring_index(fulltarget,'/',-1))) as targetFolder, 
Account.name as accountName, 
FROM_UNIXTIME(MIN(arrivalTime)/1000) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder, 
MIN(arrivalTime)/1000 AS oldestArrivalUnixTime 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, left(fulltarget,char_length(fulltarget)-char_length(substring_index(fulltarget,'/',-1))), Account.name, Subscription.folder
) e
WHERE (UNIX_TIMESTAMP() - e.oldestArrivalUnixTime) > 60
ORDER BY 8 ASC, 2 DESC;


PostgreSQL

SELECT e.*, TO_CHAR(justify_hours(interval '1 sec' * (CAST(EXTRACT(epoch FROM NOW()) AS INT) - e.oldestArrivalUnixTime)), 'DD" days "HH24:MI:SS') AS arrivedSince 
FROM (
SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
LEFT(fulltarget, LENGTH(fulltarget) - POSITION('/' in REVERSE(fulltarget))) as targetFolder, 
Account.name as accountName, 
to_timestamp(TRUNC(MIN(arrivaltime)/1000)) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder, 
MIN(arrivalTime)/1000 AS oldestArrivalUnixTime 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, LEFT(fulltarget, LENGTH(fulltarget) - POSITION('/' in REVERSE(fulltarget))), Account.name, Subscription.folder
) e
WHERE (CAST(EXTRACT(epoch FROM NOW()) AS INT) - e.oldestArrivalUnixTime) > 60
ORDER BY 8 ASC, 2 DESC;


Oracle

SELECT e.*, trunc(trunc(sysdate) + numtodsinterval(CAST((CAST(SYS_EXTRACT_UTC(SYSTIMESTAMP) AS DATE) - DATE'1970-01-01') * 86400 AS INTEGER) - e.oldestArrivalUnixTime, 'second')) - trunc(sysdate) || ' days ' || to_char(trunc(sysdate) + numtodsinterval(CAST((CAST(SYS_EXTRACT_UTC(SYSTIMESTAMP) AS DATE) - DATE'1970-01-01') * 86400 AS INTEGER) - e.oldestArrivalUnixTime, 'second'), 'hh24:mi:ss') AS arrivedSince 
FROM (
SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
substr(fulltarget,1, instr(fulltarget,'/',-1)) as targetFolder, 
Account.name as accountName, 
TO_CHAR(DATE '1970-01-01' + ( 1 / 24 / 60 / 60 / 1000) * MIN(arrivaltime), 'YYYY-MM-DD HH24:MM:SS') as oldestArrivalTime, 
Subscription.folder as subscriptionFolder, 
MIN(arrivaltime / 1000) as oldestArrivalUnixTime 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, substr(fulltarget,1, instr(fulltarget,'/',-1)), Account.name, Subscription.folder
) e
WHERE (CAST((CAST(SYS_EXTRACT_UTC(SYSTIMESTAMP) AS DATE) - DATE'1970-01-01') * 86400 AS INTEGER) - e.oldestArrivalUnixTime) > 60
ORDER BY 8 ASC, 2 DESC;


MSSQL

SELECT e.*, CAST(FLOOR((DATEDIFF(s, '1970-01-01', GETUTCDATE()) - e.oldestArrivalUnixTime) / 86400) AS VARCHAR(10)) + ' days ' + CONVERT(VARCHAR(8), DATEADD(SECOND, (DATEDIFF(s, '1970-01-01', GETUTCDATE()) - e.oldestArrivalUnixTime), '19000101'), 8) AS arrivedSince 
FROM (
SELECT clusternode, count(status) as numberOfEvents, status, eventgroup, retryCount, 
LEFT(fulltarget, LEN(fulltarget) - CHARINDEX('/', REVERSE(fulltarget), 0)) as targetFolder, 
Account.name as accountName, 
DATEADD(second, cast(MIN(arrivalTime)/1000 as int), convert(datetime, '01/01/1970', 101)) as oldestArrivalTime, 
Subscription.folder as subscriptionFolder, 
MIN(arrivalTime)/1000 AS oldestArrivalUnixTime 
FROM Event
LEFT JOIN Account ON accountid=Account.id
LEFT JOIN Subscription ON subscriptionid=Subscription.id
GROUP BY clusternode, status, eventgroup, retryCount, LEFT(fulltarget, LEN(fulltarget) - CHARINDEX('/', REVERSE(fulltarget), 0)), Account.name, Subscription.folder
) e
WHERE (DATEDIFF(s, '1970-01-01', GETUTCDATE()) - e.oldestArrivalUnixTime) > 60
ORDER BY 8 ASC, 2 DESC;


Comments:


Increasing the value 60 in the WHERE clause to something like 3600 (1 hour) helps to filter stuck subscriptions and slow transfers of huge files.


The first line has a complex expression to convert the time difference between the current time and the event's arrival time to a human readable format. You can use the expression from the WHERE clause in its place to get the difference in seconds.


Sample output of the enhanced SQL query




I hope you enjoy using SecureTransport 5.5!

Evgeni Evangelov