KB Article #182925
Cassandra error "Scanned over 100001 tombstones" during query 'SELECT * FROM <keyspace>.api_portal_portaltimestamp'
Problem
API Gateway causes Cassandra to create more tombstones than it can handle
Example Error Message:
ERROR [ReadStage-2] 2023-07-12 13:34:06,654 StorageProxy.java:2011 - Scanned over 100001 tombstones during query 'SELECT * FROM <keyspace>.api_portal_portaltimestamp WHERE LIMIT 5000' (last scanned row token was 8837940002149925218 and partion key was (003a86ed-8829-484c-9f56-ff4d4a854cb4)); query aborted
Resolution
Cassandra generates tombstones when data is deleted. Deleted data is not immediately purged from disk. Instead, Cassandra writes a special marker, known as a tombstone, to indicate that the data has been deleted. Tombstones prevent deleted data from being returned during reads. A read that scans more tombstones than the tombstone_failure_threshold setting in cassandra.yaml (100000 by default) is aborted with the error shown above. A tombstone is fully removed during compaction only once it is older than the table's gc_grace_seconds property, which defaults to 864000 (10 days). This property is configured on each table.
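The following sketch illustrates the two usual ways tombstones arise: an explicit delete and a write whose TTL expires. The tombstone_demo keyspace and events table are hypothetical and are not part of the API Gateway schema; the replication settings are only suitable for a single-node test cluster.

```cql
-- Hypothetical demo keyspace and table, for illustration only.
CREATE KEYSPACE IF NOT EXISTS tombstone_demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS tombstone_demo.events (
  id uuid PRIMARY KEY,
  payload text
) WITH gc_grace_seconds = 43200;

-- This row expires 600 seconds (10 minutes) after it is written.
-- Once expired, it is treated as a tombstone until compaction purges it.
INSERT INTO tombstone_demo.events (id, payload)
VALUES (uuid(), 'example') USING TTL 600;

-- An explicit delete writes a tombstone immediately.
DELETE FROM tombstone_demo.events
WHERE id = 003a86ed-8829-484c-9f56-ff4d4a854cb4;
```

A table that is written with a short TTL and read frequently, as described for api_portal_portaltimestamp, can therefore accumulate tombstones quickly if they are not purged by compaction.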
Fix:
- Reduce gc_grace_seconds on the api_portal_portaltimestamp table.
Set gc_grace_seconds on the api_portal_portaltimestamp table to 43200 (12 hours). This allows tombstones to be removed sooner during compaction.
In Cassandra, the gc_grace_seconds value of a table can be updated using the cqlsh tool. For example:
> alter table <keyspace>.api_portal_portaltimestamp with GC_GRACE_SECONDS = 43200;
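To confirm that the change has been applied, the table's current setting can be read back from the schema tables. This is a minimal sketch that assumes Cassandra 3.0 or later, where table options are stored in system_schema.tables; after the change, the query should return 43200.

```cql
-- Replace <keyspace> with the keyspace used by API Gateway.
SELECT gc_grace_seconds
FROM system_schema.tables
WHERE keyspace_name = '<keyspace>'
  AND table_name = 'api_portal_portaltimestamp';
```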
Note: Decreasing gc_grace_seconds on a Cassandra table usually requires running nodetool repair on that table more frequently. However, because the api_portal_portaltimestamp table uses a very short TTL of 10 minutes, increasing the nodetool repair frequency is not required in this instance.
- Increase the API Manager event poller period.
Update the API Manager event poller period to 2000 ms (the default is 200 ms). Although not specific to Cassandra tombstones, this significantly decreases the frequency of queries to the api_portal_portaltimestamp table, which reduces CPU consumption.
The event poller value can be set by updating the "vapiPollerPeriodMs" value in the Entity Store and redeploying:
- YAML based Entity Store: Server Settings/Portal Config.yaml
- XML based Entity Store: PrimaryStore.xml
Note: Increasing the API Manager event poller period delays the processing of events related to entity changes, so synchronization takes longer. If you experience synchronization issues, adjust the "vapiPollerPeriodMs" value to suit your environment.