KB Article #181769

API Manager returns 503 until its cache is loaded: how to detect this and add the check to /healthchecklb

Problem

  • Calling an API hosted by API Manager shortly after startup or deployment returns 503 Service Unavailable, and the trace log has the ERROR line "Service temporarily unavailable".
  • The jvm.xml setting com.axway.apimanager.api.data.cache=true may also be in use.
  • Even when using ZDD settings and a load balancer to route traffic away from nodes under maintenance, this 503 status for API Manager APIs is sometimes seen after /healthchecklb starts returning 200 OK and other (non-API Manager) transactions on the instance are working.

Resolution

[Update for 7.7.Nov2022 and above. The way the cache loads has been redone.

The log line "Ready to accept requests" is no longer written, and the isCacheAvailable() method is no longer available. Cache loading has been optimized. If you still see troublesome temporary 503 Service Unavailable responses from APIs around startup or deployment, please open a support case and reference RDAPI-29030.]



[Original text, applicable for prior versions.]

After an instance finishes starting or a deployment completes, the instance's traffic ports open and the Health Check LB policy starts returning OK. However, API Manager may not have finished loading its local cache. During that window the instance ports are open, but API Manager is still loading and returns 503 for API Manager APIs.

The INFO trace log line "API-Client-Cache - Ready to accept requests" indicates that loading is complete.
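
As a rough illustration of checking for that line, the sketch below scans trace text for the marker string quoted above. The function name findReadyLine is hypothetical, and nothing is assumed about the rest of the trace line layout (timestamps, thread IDs and so on); only the marker text comes from this article.

```javascript
// Marker text quoted in this article; the surrounding line layout is not assumed.
var READY_MARKER = "API-Client-Cache - Ready to accept requests";

// Hypothetical helper: returns the last line in the supplied trace text that
// contains the marker, or null if the marker has not been logged.
function findReadyLine(traceText) {
    var lines = traceText.split(/\r?\n/);
    var found = null;
    for (var i = 0; i < lines.length; i++) {
        if (lines[i].indexOf(READY_MARKER) !== -1) {
            found = lines[i];
        }
    }
    return found;
}
```

The last match is returned on the assumption that a trace file could cover more than one cache load; pass in only the text written since the startup or deployment you care about.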


If you use ZDD and /healthchecklb as part of your traffic routing, you can modify the Health Check LB policy to check not only the instance status (apigw.maintenance.ongoing) but also whether the API Manager cache has finished loading.


Scripting filter example (JavaScript):


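function invoke(msg) {
    // Undocumented internal class; see Cautions below.
    var ACC = com.vordel.apiportal.config.PortalConfiguration.getInstance().getApiClientCache();
    // Query the cache state once so the attribute and the return value agree.
    var loaded = ACC.isCacheAvailable();
    // Expose the state as a message attribute for later filters or logging.
    msg.put("apiMgr.cache.loaded", (loaded ? "true" : "false"));
    return loaded;
}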


Cautions:

  1. For brevity, the example does not include best practices such as null checks or a try/catch block.
  2. The PortalConfiguration class is not officially documented, and thus not "supported". Using this class to get the cache status is therefore subject to change in a future product update.
  3. The Health Check LB policy is one of the built-in policies. A future product update that enhances the policy could update or overwrite it.
  4. The likelihood of #2 happening is low, and having to make this small change again because of #3 is low impact.
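
A minimal sketch of how the null check and try/catch mentioned in caution 1 might look, for the prior versions this section covers. The helper name cacheLoaded and the idea of passing the cache lookup in as a function are illustrative assumptions, not product API. Treating any error as "not loaded" is also an assumption: on versions where isCacheAvailable() has been removed, this sketch would always report false, so it should not be used there.

```javascript
// Hypothetical helper: the cache lookup is passed in as a function so the
// logic can be exercised outside the gateway with a stub.
function cacheLoaded(msg, getCache) {
    var loaded = false;
    try {
        var cache = getCache();
        // A missing cache object is treated as "not loaded yet".
        if (cache !== null && cache !== undefined) {
            loaded = cache.isCacheAvailable() ? true : false;
        }
    } catch (e) {
        // Assumption: any failure (for example, the class or method is
        // missing) is reported as not loaded rather than raised.
        loaded = false;
    }
    msg.put("apiMgr.cache.loaded", loaded ? "true" : "false");
    return loaded;
}

// Entry point called by the Scripting filter.
function invoke(msg) {
    return cacheLoaded(msg, function () {
        return com.vordel.apiportal.config.PortalConfiguration.getInstance().getApiClientCache();
    });
}
```

If you would rather have the health check keep passing when the lookup itself breaks, set loaded to true in the catch block instead.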