KB Article #187720
Copy Cassandra Keyspace
Keyspace Copy script
In some situations it may be necessary to copy a keyspace from one Cassandra cluster to another cluster, to a different keyspace, or a combination of the two.
The API Gateway backup tool was not designed for this task, and KPS Admin would require a running instance on the destination environment.
Under some circumstances, such as the one described in KB Article 183080 (updating API Manager to 7.7.20230530 or later where the Cassandra tables were originally created in version 7.5.2 or earlier), some attributes, such as COMPACT STORAGE, need to be removed or changed.
To facilitate and automate the process, a script was created to perform this task.
While the script was designed to copy keyspaces between Cassandra clusters, or within the same cluster under a different name, it also removes COMPACT STORAGE, sets read repair chance to 0 (where applicable), and sets speculative retry to the 99th percentile.
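The schema adjustments described above can be sketched with a simple sed filter. The function name and the exact substitution patterns below are illustrative assumptions, not the attached script's actual code:

```shell
#!/usr/bin/env bash
# Sketch only: fix_schema and its sed patterns are assumptions that
# illustrate the kind of rewriting applied to an exported schema.
fix_schema() {
  sed -e 's/ WITH COMPACT STORAGE AND / WITH /' \
      -e 's/ WITH COMPACT STORAGE//' \
      -e 's/read_repair_chance = [0-9.eE+-]*/read_repair_chance = 0.0/g' \
      -e "s/speculative_retry = '[^']*'/speculative_retry = '99percentile'/"
}

# Example: a table definition exported from a keyspace created in an
# old Cassandra version; COMPACT STORAGE is dropped and the table
# options are normalized.
fix_schema <<'EOF'
CREATE TABLE demo.t (k text PRIMARY KEY, v text) WITH COMPACT STORAGE AND read_repair_chance = 0.1 AND speculative_retry = 'NONE';
EOF
```

In the real script this transformation is applied to the whole schema dump before it is replayed against the destination cluster.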
It was tested with Cassandra versions 3.11, 4.0 and 4.1 and should also work with 2.2. It might work with newer Cassandra versions too, but at the time of writing 5.0 was still in beta.
Prerequisites
The script leverages DSBulk and cqlsh so these are required for it to work.
- dsbulk, or the DataStax Bulk Loader, can be retrieved from its GitHub page: https://github.com/datastax/dsbulk
- cqlsh is included in the standard Cassandra installation, but may also be installed separately with pip: https://pypi.org/project/cqlsh/. It is also available as a standalone package on the DataStax website: https://downloads.datastax.com/#cqlsh
As an alternative, if Docker is installed, the binaries can be run from containers using the datastax/cassandra-data-migrator image and/or the official Cassandra image, as in the commented examples.
The script itself can be found attached to this article.
Usage
To use the script, some variables need to be defined:
- The source and destination keyspace information, including IPs or hostnames, credentials, etc.
- The path to the cqlsh executable and extra arguments for execution. Arguments can be common, or specific to the source or destination cluster; use these to pass SSL certificates if needed.
- The path to the dsbulk executable and extra arguments for execution. As with cqlsh, both common and cluster-specific arguments can be passed to DSBulk.
- Work directory, used to store the helper scripts generated by this script, which perform the actual operations.
- Backup directory where DSBulk will store and read the data.
- Optionally, set an action to perform if the destination keyspace already exists. By default the script will prompt the user to decide.
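For illustration, the configuration block at the top of the script might look like the following. All variable names and defaults here are assumptions; check the attached script for the actual names:

```shell
# Illustrative configuration sketch; every variable name below is an
# assumption, not necessarily the script's real name.

# Source and destination cluster / keyspace details
SRC_HOST="10.0.0.10";  SRC_KEYSPACE="xapi"
SRC_USER="cassandra";  SRC_PASS="cassandra"
DST_HOST="10.0.0.20";  DST_KEYSPACE="xapi_copy"
DST_USER="cassandra";  DST_PASS="cassandra"

# Paths to the executables plus common and cluster-specific arguments
# (for example, SSL certificate options for TLS-enabled clusters)
CQLSH="/opt/cassandra/bin/cqlsh"
CQLSH_COMMON_ARGS=""
CQLSH_SRC_ARGS=""
CQLSH_DST_ARGS=""
DSBULK="/opt/dsbulk/bin/dsbulk"
DSBULK_COMMON_ARGS=""

# Directories for generated helper scripts and for the unloaded data
WORK_DIR="/tmp/ks_copy/work"
BACKUP_DIR="/tmp/ks_copy/backup"

# Action when the destination keyspace already exists;
# by default the script prompts the user
ON_EXISTING="prompt"
```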
After defining the variables described above, simply run the script and it will perform the necessary actions.
How it works
- The schema is extracted from the source cluster into a helper file called src_ks.cql.
- Then, based on the extracted schema, the destination schema is created as dst_ks.cql.
- Also based on the extracted schema, two additional bash helper files are created for backing up (unload) and restoring (load) the tables, called dsbulk_unload.bash and dsbulk_load.bash respectively.
- After the helper scripts are created, the execution phase begins by unloading the data from the source keyspace to disk using the dsbulk_unload.bash script.
- Once the data has been unloaded, the new keyspace is created in the destination cluster and the data is loaded using the dsbulk_load.bash script.
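The helper-script generation step can be sketched as follows. The table list, variable names, and paths are illustrative assumptions; the dsbulk flags used (-h, -u, -p, -k, -t, -url) are standard DSBulk options:

```shell
#!/usr/bin/env bash
# Sketch: generate per-table unload/load helper scripts. TABLES would
# normally be parsed out of src_ks.cql; it is hard-coded here for
# illustration, as are the keyspace names and the backup directory.
TABLES="kps_data kps_index"
SRC_KEYSPACE="xapi"
DST_KEYSPACE="xapi_copy"
BACKUP_DIR="/tmp/ks_copy/backup"

# dsbulk_unload.bash: one "dsbulk unload" per table, writing CSV data
# under the backup directory
{
  echo '#!/usr/bin/env bash'
  for t in $TABLES; do
    echo "dsbulk unload -h \"\$SRC_HOST\" -u \"\$SRC_USER\" -p \"\$SRC_PASS\" -k $SRC_KEYSPACE -t $t -url $BACKUP_DIR/$t"
  done
} > dsbulk_unload.bash

# dsbulk_load.bash: the matching "dsbulk load" commands, reading the
# same directories back into the destination keyspace
{
  echo '#!/usr/bin/env bash'
  for t in $TABLES; do
    echo "dsbulk load -h \"\$DST_HOST\" -u \"\$DST_USER\" -p \"\$DST_PASS\" -k $DST_KEYSPACE -t $t -url $BACKUP_DIR/$t"
  done
} > dsbulk_load.bash
chmod +x dsbulk_unload.bash dsbulk_load.bash
```

Because the commands are written to files first, the generated helpers can be inspected in the work directory before any data is moved.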