KB Article #189822

Transaction differences across usage reporting applications - scenarios and best practices

Problem

Transaction counts sometimes differ between the applications used to report them. This KB article describes the situations in which such differences can occur:

  • Between Platform Usage and Business Insights -> API Health (Transaction Metrics)
  • Between Platform Usage / Business Insights -> API Health and APIM Analytics (or similar apps)

Before looking at the scenarios in detail, take the following general considerations into account:

  1. Platform Usage: Data displayed under Organization -> Usage can be entered manually or sent automatically by the Traceability Agent. When the Traceability Agent is used, the data is sent to the Platform on a regular schedule (by default, daily).
  2. Transaction Metrics: Data displayed under Business Insights -> API Health is sent automatically by the Traceability Agent, using a different mechanism from the one used for reporting Usage.
    See "Use Traceability Agent to report gateway usage" for more details on Platform Usage and Transaction Metrics.
  3. APIM Analytics (or similar): monitors, records, and reports on the history of message traffic between API Gateway instances and various services, remote hosts, and clients running in an API Gateway domain.


Resolution

Scenario 1: Differences between Business Insights & Platform Usage

Such differences can be caused by:

  1. Different reporting schedules
    • CENTRAL_METRICREPORTING_SCHEDULE controls how often transaction metrics are reported (default: hourly)
    • CENTRAL_USAGEREPORTING_SCHEDULE controls how often platform usage is reported (default: daily)
    • Because of these different reporting intervals, platform usage and transaction metrics may not align immediately
  2. Version of the Traceability Agent
    • Traceability Agent versions older than 1.1.81 can count transactions inaccurately in certain scenarios. Please refer to "Minimum supported agent version" in the Release Notes.
  3. The moment at which the two reports are compared
    • By default, platform usage is reported once a day, so usage for the current day appears with a one-day delay, whereas transaction metrics lag the current time by only about one hour.
    • Platform usage can be sent at any time, but it is always calculated and displayed in the Platform relative to UTC midnight. For example, if a customer in the CET timezone sends platform usage daily at 12 AM CET, the Platform records it as received at 11 PM UTC on the previous day.
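
The CET example above can be checked with a short Python sketch. It is only an illustration of the timezone arithmetic, not part of the agent or the Platform: the date is arbitrary, the helper name `to_utc` is hypothetical, and CET is modelled as a fixed UTC+1 offset (no daylight saving).

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is modelled as a fixed UTC+1 offset (daylight saving ignored).
CET = timezone(timedelta(hours=1), name="CET")


def to_utc(local_send_time: datetime) -> datetime:
    """Convert a timezone-aware local send time to UTC."""
    return local_send_time.astimezone(timezone.utc)


# Hypothetical send time: midnight CET on an arbitrary winter date.
send_local = datetime(2024, 1, 15, 0, 0, tzinfo=CET)
send_utc = to_utc(send_local)

print(send_local.isoformat(), "->", send_utc.isoformat())
# prints: 2024-01-15T00:00:00+01:00 -> 2024-01-14T23:00:00+00:00
```

The report sent at midnight local time falls on the previous UTC calendar day, which is why a same-day comparison between Platform Usage and Business Insights can appear to be off by one day.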

Scenario 2: Differences between APIM Analytics & Platform Usage/Business Insights

Such differences can be caused by:

  1. Downtime of the Traceability Agent
    • When restarted, the Traceability Agent only processes event files created in the last hour (default value of INPUTS_IGNORE_OLDER).
    • If the Traceability Agent was stopped for more than one hour, files older than one hour are not processed, so the Agent records lower values than Analytics.
  2. Version of the Traceability Agent
    • Please refer to the Release Notes: the Traceability Agent version should not be lower than the "Minimum supported agent version", and the recommended version is always the latest one available.
  3. Exceptions configured in the Traceability Agent (healthchecks)
    • The Traceability Agent can be configured to exclude certain transactions when calculating usage, via TRACEABILITY_EXCEPTION_LIST=["/healthcheck"]
    • Transactions excluded by the Agent must be taken into account when comparing its counts with the transactions counted by APIM Analytics.
  4. Number of event/open traffic files to process
    • In some scenarios, having tens of thousands of event/open traffic files in EVENT_LOG_PATHS or OPENTRAFFIC_LOG_PATHS has been observed to cause the Traceability Agent to stop processing transactions. If backups must be kept for longer periods, the best practice is to move older files out of the folders monitored by the Traceability Agent.
  5. Agent name for multiple Traceability Agents in an environment
    • If multiple Traceability Agents are configured with the same value for the AGENT_NAME variable, the reports they send to the Platform are affected. Each agent sends a report with the transactions it counted, and the report is keyed by the agent name. If agents share the same name, their reports overwrite one another until only one report is saved in the Platform.
    • It is best practice to give each Traceability Agent in your environment a different agent name in order to track reports and usage correctly.
  6. Frequency of writing transactions in the event/open traffic files
    • If the event/open traffic files are not modified for a certain duration, the Traceability Agent removes the harvester/state of those files, as configured via the following two parameters:
      • INPUTS_CLOSE_INACTIVE: The agent closes the harvester when the file has not been modified for the specified duration (default value: 10 minutes).
      • INPUTS_CLEAN_INACTIVE: The agent removes the state of the file when it has not been modified for the specified duration (default value: 2 hours).
    • If (older) event/open traffic files are updated after the period of inactivity has passed, the Agent may reprocess those files, resulting in a higher number of recorded transactions (duplicates).
    • This scenario is not expected in high-traffic environments, so the default values of the parameters above are the recommended ones. In other environments, either increase the values of the two parameters above or adjust the rollover frequency of the open traffic/event logs (based on both size and time) by making appropriate changes to `/apigateway/system/conf/loggers/eventLog.yaml`.
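
Two of the causes above (items 1 and 5) can be checked before comparing reports. The following Python sketch is a hypothetical diagnostic helper, not part of the Traceability Agent. It assumes file age is measured by modification time, which only approximates how the Agent applies INPUTS_IGNORE_OLDER, and the function names, file names, and agent names are all made up for illustration.

```python
import os
import tempfile
import time
from collections import Counter
from pathlib import Path

# Default INPUTS_IGNORE_OLDER as stated in this article: one hour.
IGNORE_OLDER_SECONDS = 60 * 60


def files_older_than(log_dir, max_age_seconds=IGNORE_OLDER_SECONDS, now=None):
    """Return names of files in log_dir whose modification time is older
    than max_age_seconds (assumption: mtime approximates the Agent's check)."""
    now = time.time() if now is None else now
    return sorted(
        p.name
        for p in Path(log_dir).iterdir()
        if p.is_file() and now - p.stat().st_mtime > max_age_seconds
    )


def duplicate_agent_names(agent_names):
    """Return the AGENT_NAME values that are used by more than one agent."""
    counts = Counter(agent_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Demo with a temporary directory standing in for an event log folder.
with tempfile.TemporaryDirectory() as log_dir:
    old_file = Path(log_dir, "old.log")
    new_file = Path(log_dir, "new.log")
    old_file.write_text("older events")
    new_file.write_text("recent events")
    two_hours_ago = time.time() - 2 * 60 * 60
    os.utime(old_file, (two_hours_ago, two_hours_ago))  # backdate the old file
    print(files_older_than(log_dir))  # ['old.log']

print(duplicate_agent_names(["ta-1", "ta-2", "ta-1"]))  # ['ta-1']
```

Files listed by the first helper would likely be skipped after a restart with the default setting, and names listed by the second helper indicate agents whose reports may overwrite one another in the Platform.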