KB Article #181499

CFT Cluster State Monitoring on MVS

Problem


IBM System Automation


Multi-node CFT

Resolution

To monitor a multi-node instance, we suggest the following on each LPAR:

  • Monitor the Copilot job. If the job has stopped, restart Copilot.
  • Monitor the following messages:
    • CFTS10E: FILE COMMUNICATION TASK ERROR
    • CFDM04E: MANDATORY TASK REQUIRED
    • CFDM05E: PGM WILL BE TERMINATED WITH ABEND 2881
    • CFDM06E: PGM NOW TERMINATED WITH ABEND 2881

If one of these messages is issued, cancel the node (STC) that issued it.

The Copilot node manager will automatically restart the canceled node.
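
As an illustration of the message check above, here is a minimal Python sketch that reports whether a line of job output contains one of the four message IDs listed in this article. It assumes only that the message ID appears in the line as a separate token. The names FATAL_MESSAGE_IDS and find_fatal_message are hypothetical, and identifying the STC to cancel is left to your automation product.

```python
import re

# Message IDs listed in this article. Each one means the issuing node
# should be cancelled so that the Copilot node manager restarts it.
FATAL_MESSAGE_IDS = ("CFTS10E", "CFDM04E", "CFDM05E", "CFDM06E")

# Match a listed ID only as a whole token, so that a longer identifier
# containing one of these IDs is not reported by accident.
_FATAL_PATTERN = re.compile(r"\b(" + "|".join(FATAL_MESSAGE_IDS) + r")\b")


def find_fatal_message(line):
    """Return the first listed message ID found in the line, or None."""
    match = _FATAL_PATTERN.search(line)
    return match.group(1) if match else None
```

A caller would pass each output line to find_fatal_message and trigger the cancel action only when the result is not None.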

  • CFT instance liveness:
    • Run cftping once per interval defined by uconf:copilot.node_manager.watchperiod. If cftping returns a code other than 1 on three consecutive attempts, raise an alert.
      The possible cftping return codes are:
      • 0: all nodes stopped
      • 1: all nodes started
      • 2: nodes partially started
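
The three-consecutive-attempts rule above can be sketched as a small counter. This is an illustrative Python sketch, not part of the product: the class name LivenessMonitor and its record method are hypothetical, and how cftping is run and how its return code is collected on the LPAR are left to the caller.

```python
ALL_STARTED = 1  # cftping return code meaning all nodes are started


class LivenessMonitor:
    """Track consecutive cftping results that differ from ALL_STARTED."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive bad results before alerting
        self.failures = 0           # current run of bad results

    def record(self, return_code):
        """Record one cftping return code; return True when an alert is due."""
        if return_code == ALL_STARTED:
            # A healthy result resets the run of failures.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.threshold


# Usage: feed one return code per watch period.
monitor = LivenessMonitor()
alerts = [monitor.record(code) for code in [1, 2, 2, 1, 0, 2, 0]]
```

In this usage example only the last result is True, because the healthy result in the fourth position resets the counter. The sketch keeps returning True for each further bad result after the threshold is reached; alert once or repeatedly according to your site's practice.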