KB Article #187720
Copy Cassandra Keyspace
Keyspace Copy script
In some situations it may be necessary to copy a keyspace from one Cassandra cluster to another cluster, to a different keyspace, or a combination of the two.
The API Gateway backup tool was not designed for this task, and KPS Admin would require a running instance on the destination environment.
Under some circumstances, such as the one described in KB Article 183080 (updating API Manager to 7.7.20230530 or later where the Cassandra tables were originally created in version 7.5.2 or earlier), some attributes, such as COMPACT STORAGE, need to be removed or changed.
To facilitate and automate the process, a script was created to perform this task.
While the script was designed to copy keyspaces between Cassandra clusters, or within the same cluster under a different name, it also removes COMPACT STORAGE, sets read repair chance to 0 (where applicable), and sets speculative retry to the 99th percentile.
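The schema adjustments described above can be sketched with a simple sed filter. The function name and the exact substitution patterns below are illustrative assumptions, not the attached script's actual code:

```shell
#!/usr/bin/env bash
# Sketch only: fix_schema and its sed patterns are assumptions that
# illustrate the kind of rewriting applied to an exported schema.
fix_schema() {
  sed -e 's/ WITH COMPACT STORAGE AND / WITH /' \
      -e 's/ WITH COMPACT STORAGE//' \
      -e 's/read_repair_chance = [0-9.eE+-]*/read_repair_chance = 0.0/g' \
      -e "s/speculative_retry = '[^']*'/speculative_retry = '99percentile'/"
}

# Example: a table definition exported from a keyspace created in an
# old Cassandra version; COMPACT STORAGE is dropped and the table
# options are normalized.
fix_schema <<'EOF'
CREATE TABLE demo.t (k text PRIMARY KEY, v text) WITH COMPACT STORAGE AND read_repair_chance = 0.1 AND speculative_retry = 'NONE';
EOF
```

In the real script this transformation is applied to the whole schema dump before it is replayed against the destination cluster.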
It was tested with Cassandra versions 3.11, 4.0 and 4.1 and should also work with 2.2. It might work with newer Cassandra versions too, but at the time of writing 5.0 was still in beta.
Prerequisites
The script leverages DSBulk and cqlsh so these are required for it to work.
- dsbulk, or the DataStax Bulk Loader, can be retrieved from its GitHub page: https://github.com/datastax/dsbulk
- cqlsh is included in the standard Cassandra installation, but may also be installed separately with pip: https://pypi.org/project/cqlsh/. It is also available as a standalone package on the DataStax website: https://downloads.datastax.com/#cqlsh
As an alternative, if Docker is installed, the binaries can be run from containers using the datastax/cassandra-data-migrator image and/or the official Cassandra image, as in the commented examples.
The script itself can be found attached to this article.
Usage
To use the script, some variables need to be defined:
- The source and destination keyspace information, including IPs or hostnames, credentials, etc.
- The path to the cqlsh executable and extra arguments for execution. Arguments can be common, or specific to the source or destination cluster; use these to pass SSL certificates if needed.
- The path to the dsbulk executable and extra arguments for execution. As with cqlsh, both common and cluster-specific arguments can be passed to DSBulk.
- Work directory, used to store the helper scripts generated by this script, which perform the actual operations.
- Backup directory where DSBulk will store and read the data.
- Optionally, set an action to perform if the destination keyspace already exists. By default the script will prompt the user to decide.
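For illustration, the configuration block at the top of the script might look like the following. All variable names and defaults here are assumptions; check the attached script for the actual names:

```shell
# Illustrative configuration sketch; every variable name below is an
# assumption, not necessarily the script's real name.

# Source and destination cluster / keyspace details
SRC_HOST="10.0.0.10";  SRC_KEYSPACE="xapi"
SRC_USER="cassandra";  SRC_PASS="cassandra"
DST_HOST="10.0.0.20";  DST_KEYSPACE="xapi_copy"
DST_USER="cassandra";  DST_PASS="cassandra"

# Paths to the executables plus common and cluster-specific arguments
# (for example, SSL certificate options for TLS-enabled clusters)
CQLSH="/opt/cassandra/bin/cqlsh"
CQLSH_COMMON_ARGS=""
CQLSH_SRC_ARGS=""
CQLSH_DST_ARGS=""
DSBULK="/opt/dsbulk/bin/dsbulk"
DSBULK_COMMON_ARGS=""

# Directories for generated helper scripts and for the unloaded data
WORK_DIR="/tmp/ks_copy/work"
BACKUP_DIR="/tmp/ks_copy/backup"

# Action when the destination keyspace already exists;
# by default the script prompts the user
ON_EXISTING="prompt"
```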
After defining the variables described above, simply run the script and it will perform the necessary actions.
How it works
- The schema is extracted from the source cluster into a helper file called src_ks.cql.
- Then, based on the extracted schema, the destination schema is created as dst_ks.cql.
- Also based on the extracted schema, two additional bash helper files are created for backing up (unload) and restoring (load) the tables, called dsbulk_unload.bash and dsbulk_load.bash respectively.
- After the helper scripts are created, the execution phase begins by unloading the data from the source keyspace to disk using the dsbulk_unload.bash script.
- Once the data has been unloaded, the new keyspace is created in the destination cluster and the data is loaded using the dsbulk_load.bash script.
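The helper-script generation step can be sketched as follows. The table list, variable names, and paths are illustrative assumptions; the dsbulk flags used (-h, -u, -p, -k, -t, -url) are standard DSBulk options:

```shell
#!/usr/bin/env bash
# Sketch: generate per-table unload/load helper scripts. TABLES would
# normally be parsed out of src_ks.cql; it is hard-coded here for
# illustration, as are the keyspace names and the backup directory.
TABLES="kps_data kps_index"
SRC_KEYSPACE="xapi"
DST_KEYSPACE="xapi_copy"
BACKUP_DIR="/tmp/ks_copy/backup"

# dsbulk_unload.bash: one "dsbulk unload" per table, writing CSV data
# under the backup directory
{
  echo '#!/usr/bin/env bash'
  for t in $TABLES; do
    echo "dsbulk unload -h \"\$SRC_HOST\" -u \"\$SRC_USER\" -p \"\$SRC_PASS\" -k $SRC_KEYSPACE -t $t -url $BACKUP_DIR/$t"
  done
} > dsbulk_unload.bash

# dsbulk_load.bash: the matching "dsbulk load" commands, reading the
# same directories back into the destination keyspace
{
  echo '#!/usr/bin/env bash'
  for t in $TABLES; do
    echo "dsbulk load -h \"\$DST_HOST\" -u \"\$DST_USER\" -p \"\$DST_PASS\" -k $DST_KEYSPACE -t $t -url $BACKUP_DIR/$t"
  done
} > dsbulk_load.bash
chmod +x dsbulk_unload.bash dsbulk_load.bash
```

Because the commands are written to files first, the generated helpers can be inspected in the work directory before any data is moved.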