KB Article #182819
Possible stuck events messages and automatic event recovery
Problem
This article explains what messages like the following in the Server Log mean and where they come from.
Possible stuck events with expired heartbeat timeout detected.
and
Stuck heartbeat events recovery process has finished. Total execution time: 0 seconds Status: SUCCESS. Number of recovered events: 1
This behavior can be observed in SecureTransport 5.5-20221124 and later versions.
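If you want to track how often recovery runs, the second message can be parsed from the Server Log. The following is a minimal Python sketch. The pattern is built only from the sample message quoted above, so the exact wording in other builds is an assumption, and parse_recovery_message is a hypothetical helper name, not part of SecureTransport.

```python
import re

# Pattern derived from the sample message quoted in this article.
# The wording in other SecureTransport builds may differ (assumption).
RECOVERY_RE = re.compile(
    r"Stuck heartbeat events recovery process has finished\.\s*"
    r"Total execution time: (?P<seconds>\d+) seconds\s*"
    r"Status: (?P<status>\w+)\.\s*"
    r"Number of recovered events: (?P<recovered>\d+)"
)


def parse_recovery_message(line):
    """Return (seconds, status, recovered), or None if the line does not match."""
    match = RECOVERY_RE.search(line)
    if match is None:
        return None
    return int(match["seconds"]), match["status"], int(match["recovered"])


sample = (
    "Stuck heartbeat events recovery process has finished. "
    "Total execution time: 0 seconds Status: SUCCESS. "
    "Number of recovered events: 1"
)
print(parse_recovery_message(sample))  # (0, 'SUCCESS', 1)
```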
Resolution
The messages are logged when the Server Configuration option EventQueue.Heartbeat.Recovery.Enabled is set to true.
This option enables a mechanism introduced in SecureTransport 5.5-20221124. Together with two other Server Configuration parameters, EventQueue.Heartbeat.Interval and EventQueue.Heartbeat.Timeout, it automatically recovers stuck events based on how long an event stays in the Event table in the database.
How the mechanism works
EventQueue.Heartbeat.Recovery.Enabled turns the functionality on or off. The default value is false.
EventQueue.Heartbeat.Interval specifies how often, in seconds, an event updates its heartbeat. With a value of 5 seconds, each event attempts to update its heartbeat timestamp in the Event table every 5 seconds. This heartbeat timestamp serves as a flag ("I am still being processed at this time"), which SecureTransport reads when evaluating whether the event has been in the queue for too long.
EventQueue.Heartbeat.Timeout specifies the number of seconds after which SecureTransport considers an event to have stayed in the queue abnormally long. With a value of 60 seconds, an event in the Event table that has not updated its heartbeat timestamp ("I am still being processed") for more than 1 minute is considered stuck, and the event recovery process is triggered. The messages quoted at the top of this article are then logged in the Server Log.
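The timeout check described above can be illustrated with a short Python sketch. It is a simplified model and not SecureTransport's actual implementation: the Event class, its field names, and find_stuck_events are hypothetical, and the strict "greater than" comparison is an assumption based on the "more than 1 minute" wording.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a row in the Event table."""
    event_id: int
    last_heartbeat: float  # time of the last heartbeat update, in seconds


def find_stuck_events(events, now, timeout_seconds):
    """Return events whose last heartbeat is older than timeout_seconds."""
    return [e for e in events if now - e.last_heartbeat > timeout_seconds]


# Example values taken from this article.
HEARTBEAT_INTERVAL = 5   # EventQueue.Heartbeat.Interval
HEARTBEAT_TIMEOUT = 60   # EventQueue.Heartbeat.Timeout

now = 1000.0
events = [
    Event(event_id=1, last_heartbeat=now - HEARTBEAT_INTERVAL),  # updated one interval ago
    Event(event_id=2, last_heartbeat=now - 61),                  # silent for more than 60 s
    Event(event_id=3, last_heartbeat=now - 60),                  # exactly at the timeout
]

stuck = find_stuck_events(events, now, HEARTBEAT_TIMEOUT)
print([e.event_id for e in stuck])  # [2]
```

In this model, only event 2 is a candidate for recovery. Event 1 refreshed its heartbeat one interval ago, and event 3 has not yet exceeded the timeout.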