KB Article #180741

CFT HA - copsmng relocking error messages

Problem

Error message:

copsmng[1400] MNODE [0] * relocking !


The NFS mount options are set as described in the installation guide.


The message appears with both NFS V4.0 and V4.1.



Resolution


In multi-node, to show that a copilot is alive on a host, the file copsmng.<hostname>.pid is locked by the copsmng process.


If the copilot is stopped or killed locally, the system automatically releases the lock on the file.


The other hosts then know that the copilot is in ERROR (cftutil listnode) because the lock is no longer present on the pid file.
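The pid-file liveness check described above can be illustrated with a minimal Python sketch. It assumes a POSIX host and uses advisory locks through flock; the article does not say which locking call copsmng actually uses, and the function names hold_lock and is_locked are hypothetical.

```python
import fcntl
import os


def hold_lock(path):
    """Open path and take an exclusive advisory lock; return the open fd.

    The lock lasts until the fd is closed or the owning process exits,
    which is why a stopped or killed process releases it automatically.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def is_locked(path):
    """Return True if another open file currently holds a lock on path."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Probe by trying to take the lock without blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True   # the owner is alive and still holds its lock
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False  # no lock: the owner would be reported in ERROR
    finally:
        os.close(fd)
```

Note that this probe briefly acquires the lock when the file is free; a real implementation might query the lock state without taking it.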


Because of the NFS lease time, locks may be lost when the network connection to the NFS server is poor.


To detect a possible lock loss, the copilot watchdog (copmnwd process) checks that the node manager (copnman process) is alive. The node manager is supposed to send a keepalive every uconf:copilot.node_manager.watchperiod, after checking the statuses of the different nodes. That status check requires NFS access, which has been configured as blocking (hard option on mount), so a stalled NFS access delays the keepalive.


When the watchdog no longer receives the keepalive from the node manager, it assumes that the NFS locks may soon be lost and kills all local nodes (which use shared files). Because copsmng may also lose its lock on copsmng.<hostname>.pid, the watchdog tells copsmng to lock the file again, hence the "relocking !" message from copsmng.
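The watchdog decision described above can be sketched as follows. This is an illustration only, not the copmnwd implementation: the Watchdog class, the grace multiplier, and the action names are hypothetical, and the article does not state how long the watchdog waits before timing out.

```python
import time


class Watchdog:
    """Tracks node-manager keepalives and reports when they stop arriving."""

    def __init__(self, watchperiod, grace=2.0, clock=time.monotonic):
        self.watchperiod = watchperiod  # expected keepalive interval, in seconds
        self.grace = grace              # hypothetical tolerance multiplier
        self.clock = clock              # injectable clock, useful for testing
        self.last_keepalive = clock()

    def keepalive(self):
        """Record a keepalive received from the node manager."""
        self.last_keepalive = self.clock()

    def check(self):
        """Return the actions to take; an empty list means all is well."""
        silent_for = self.clock() - self.last_keepalive
        if silent_for <= self.watchperiod * self.grace:
            return []
        # Keepalive overdue: NFS access is probably stalled, so locks may be lost.
        return ["kill_local_nodes", "request_relock"]
```

The second action corresponds to the request that leads copsmng to log "relocking !".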


In practice, every "relocking !" message in copsmng<hostname>.out should correspond to a message "Multinode Watchdog has timeout, all local nodes have been killed" in copmnwd.<hostname>.out.
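One way to check this correspondence is to count both messages in the two log files. The sketch below is a hypothetical helper, not a CFT tool; it relies only on the two message strings quoted in this article and makes no assumption about timestamps or the rest of the line format.

```python
RELOCK_MARKER = "relocking !"
TIMEOUT_MARKER = "Multinode Watchdog has timeout, all local nodes have been killed"


def count_marker(lines, marker):
    """Count the lines that contain the given message text."""
    return sum(1 for line in lines if marker in line)


def compare_logs(copsmng_lines, copmnwd_lines):
    """Return (relock count, timeout count, whether the two counts agree)."""
    relocks = count_marker(copsmng_lines, RELOCK_MARKER)
    timeouts = count_marker(copmnwd_lines, TIMEOUT_MARKER)
    return relocks, timeouts, relocks == timeouts
```

To use it, read each .out file into a list of lines (for example with open(path, errors="replace")) and pass both lists to compare_logs. A mismatch suggests that some "relocking !" messages have no matching watchdog timeout in the log that was examined.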