Simplified ScyllaDB Management: Common Commands for Seamless Execution

MD Jamil Kashem Porosh
12 min read · Dec 15, 2023


ScyllaDB is a powerful distributed NoSQL database that offers high performance and scalability. However, managing ScyllaDB effectively can sometimes be a daunting task. In this blog post, we will explore some common commands that can simplify ScyllaDB management.

How to connect to ScyllaDB using the CQL shell?

To begin interacting with ScyllaDB, you need to establish a connection with it. To connect to ScyllaDB using the CQL shell, type the following command in your terminal:

cqlsh [host] [port] -u [username] -p [password]

For example, to connect to a ScyllaDB node at host 192.168.1.100 on port 9042, using the default username ‘cassandra’ and password ‘cassandra,’ simply enter the following command:

cqlsh 192.168.1.100 9042 -u cassandra -p cassandra

Once you are connected to the CQL shell, you will see a prompt like this:

Connected to Test Cluster at 192.168.1.100:9042.
[cqlsh 6.0.0 | ScyllaDB 4.5.0 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
scylla@cqlsh>
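Once inside the shell, a few cqlsh meta-commands are handy for orienting yourself (these are shell commands, not CQL statements):

```sql
-- cqlsh, ScyllaDB, CQL spec, and native protocol versions:
SHOW VERSION
-- The node and port the session is connected to:
SHOW HOST
-- List the available help topics:
HELP
-- Leave the shell:
EXIT
```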

Keyspace Management Commands:

To see a list of all keyspaces in ScyllaDB, type the following:

DESCRIBE KEYSPACES;
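To inspect a single keyspace in more detail, DESCRIBE KEYSPACE prints its full schema, including its replication settings and every table in it. For example, using the built-in system keyspace:

```sql
-- Print the complete schema of one keyspace:
DESCRIBE KEYSPACE system;
```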

Creating a Keyspace:

To create a keyspace, we use the CREATE KEYSPACE command, followed by the name of the keyspace and the options we want to specify. For example, the following command creates a keyspace called blog with a replication factor of 3 and a simple strategy:

CREATE KEYSPACE blog WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

The replication option defines how many copies of the data are stored across the cluster.

The class option specifies the replication strategy, which can be either SimpleStrategy or NetworkTopologyStrategy.

  • SimpleStrategy: ScyllaDB places replicas in a simple, predictable way: it starts from the node that owns the data's token and stores the copies on the next nodes going clockwise around the ring, ignoring the physical layout of the cluster. This is suitable for clusters with a single data center, i.e., a group of nodes that are physically close to each other and connected by a fast network.
  • NetworkTopologyStrategy: ScyllaDB places replicas with the network topology in mind, i.e., the layout of the data centers and the racks (subgroups of nodes) within them. It spreads the copies across different data centers and racks so that the failure of a single rack or data center does not lose all replicas. This is the right choice for clusters that span more than one, possibly geographically distributed, data center.
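For example, a keyspace spanning two data centers could use NetworkTopologyStrategy with a replication factor per data center. This is a sketch; the data-center names ('dc1', 'dc2') are placeholders and must match the names your cluster's snitch reports:

```sql
CREATE KEYSPACE blog_multi_dc WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,   -- three replicas in the first data center
  'dc2': 2    -- two replicas in the second data center
};
```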

The only other option that we can specify at the keyspace level is:

  • durable_writes: A boolean value that determines whether the commit log is used for the keyspace. The default is true, which means that the data is written to the commit log before being acknowledged to the client. Setting this to false can improve write performance, but also increases the risk of data loss in case of a failure.

(Compression and compaction are configured per table, not per keyspace; they are covered under the table options below.)
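As a sketch, a keyspace holding data that can be rebuilt from another source might trade commit-log durability for write speed (the keyspace name here is hypothetical):

```sql
CREATE KEYSPACE metrics_cache WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 3}
  AND durable_writes = false;  -- skip the commit log: faster, but riskier
```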

Altering a Keyspace:

To alter a keyspace, we use the ALTER KEYSPACE command, followed by the name of the keyspace and the options we want to change. For example, the following command changes the replication factor of the blog keyspace to 5:

ALTER KEYSPACE blog WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5};

Note that altering a keyspace affects only future writes; existing data is not redistributed automatically. To apply the new replication settings to the existing data, we need to run a nodetool repair command on each node of the cluster.

Dropping a Keyspace:

To drop a keyspace, we use the DROP KEYSPACE command, followed by the name of the keyspace. For example, the following command drops the blog keyspace and all its tables:

DROP KEYSPACE blog;

Note that dropping a keyspace is an irreversible operation, so we should be careful before executing it. Also, dropping a keyspace does not free the disk space immediately: by default, an automatic snapshot of the dropped tables is kept. To reclaim the disk space, we need to run a nodetool clearsnapshot command on each node of the cluster.

Copying Data Between Keyspaces:

There are several scenarios where we might want to copy data from one keyspace to another, such as:

  • Migrating data to a new keyspace with different options or schema
  • Backing up data to a secondary keyspace for disaster recovery
  • Creating a test or development keyspace with a subset of data from a production keyspace

To copy data between keyspaces, we can use cqlsh's COPY command, which allows us to export and import table data through a CSV (Comma-Separated Values) file. The syntax of the COPY command is as follows:

COPY keyspace_name.table_name [(column_list)] TO 'file_name' [WITH option = value ...];
COPY keyspace_name.table_name [(column_list)] FROM 'file_name' [WITH option = value ...];

The COPY command can be used to copy data from a table to a file, or from a file to a table.

The optional column list specifies which columns to include in the copy operation.

The file name specifies the path of the CSV file to use.

The optional WITH clause specifies some additional options for the copy operation, such as:

  • HEADER: A boolean value that indicates whether the CSV file has a header row with the column names. The default is false.
  • DELIMITER: A character that separates the values in the CSV file. The default is ‘,’ (comma).
  • QUOTE: A character that encloses the values in the CSV file. The default is ‘"’ (double quote).
  • ESCAPE: A character that escapes the delimiter and quote characters in the CSV file. The default is ‘\’ (backslash).
  • NULL: A string that represents a null value in the CSV file. The default is ‘’ (empty string).
  • PAGETIMEOUT: An integer that specifies the timeout in seconds for fetching a page of data from the table. The default is 10.
  • PAGESIZE: An integer that specifies the number of rows to fetch in a page of data from the table. The default is 1000.
  • MAXATTEMPTS: An integer that specifies the maximum number of attempts to retry a failed copy operation. The default is 5.
  • MAXBATCHSIZE: An integer that specifies the maximum number of rows to insert in a batch of data to the table. The default is 20.
  • MAXPARSEERRORS: An integer that specifies the maximum number of parse errors to tolerate before aborting the copy operation. The default is -1 (unlimited).
  • MAXREQUESTS: An integer that specifies the maximum number of concurrent requests to send to the cluster. The default is 6.
  • MINBATCHSIZE: An integer that specifies the minimum number of rows to insert in a batch of data to the table. The default is 2.
  • NUMPROCESSES: An integer that specifies the number of processes to use for parallelizing the copy operation. The default is the number of CPU cores available.
  • RATEFILE: A string that specifies the path of a file to write the copy progress and rate information. The default is None (no file).
  • SKIPROWS: An integer that specifies the number of rows to skip from the beginning of the CSV file. The default is 0.
  • SKIPCOLS: A comma-separated string of column names to skip from the CSV file. The default is the empty string (no columns).

For example, the following command copies all the data from the blog.posts table in the blog keyspace to a file called posts.csv:

COPY blog.posts TO 'posts.csv';

The following command copies the data from the posts.csv file to the blog_archive.posts table in the blog_archive keyspace, skipping the first row and the id column:

COPY blog_archive.posts FROM 'posts.csv' WITH HEADER = true AND SKIPROWS = 1 AND SKIPCOLS = 'id';
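Combining several options, the following sketch exports only three columns of the posts table with a custom delimiter and larger pages (the file name is arbitrary):

```sql
COPY blog.posts (id, title, author) TO 'posts_subset.csv'
  WITH HEADER = true AND DELIMITER = '|' AND PAGESIZE = 5000;
```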

Switching Between Keyspaces:

To switch between different keyspaces, we can use the USE command, which sets the current keyspace for the session. The syntax of the USE command is as follows:

USE keyspace_name;

The USE command takes the name of the keyspace as the argument and sets it as the default keyspace for the subsequent commands. For example, the following command sets the current keyspace to blog:

USE blog;

CQL does not remember the previous keyspace, so to switch again we simply issue another USE command with the name of the keyspace we want. For example, the following command switches to the blog_archive keyspace:

USE blog_archive;
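Using USE is optional: any statement can name the keyspace explicitly in the keyspace_name.table_name form, which avoids depending on session state (useful in scripts):

```sql
-- No USE needed; the keyspace is part of the table name:
SELECT title, author FROM blog.posts LIMIT 10;
```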

Table Management Commands:

To view a list of all the tables in the current keyspace, you can type:

DESCRIBE TABLES;

To see the tables of a different keyspace, switch to it first with USE, or print its full schema with:

DESCRIBE KEYSPACE [keyspace_name];

Creating a Table:

To create a table, we use the CREATE TABLE command, followed by the name of the table and the definition of the columns and the primary key. The syntax of the CREATE TABLE command is as follows:

CREATE TABLE keyspace_name.table_name (
column_name data_type [static] [PRIMARY KEY],
...
);

The CREATE TABLE command can be used to create a table in a specific keyspace, or in the current keyspace if the keyspace name is omitted. The column name and the data type specify the name and the type of the column, respectively. The data type can be one of the following:

  • ascii: A string of ASCII characters
  • bigint: A 64-bit signed integer
  • blob: A binary large object
  • boolean: A true or false value
  • counter: A 64-bit counter that can only be incremented or decremented
  • date: A date without a time component
  • decimal: A variable-precision decimal number
  • double: A 64-bit floating point number
  • float: A 32-bit floating point number
  • inet: An IP address (v4 or v6)
  • int: A 32-bit signed integer
  • smallint: A 16-bit signed integer
  • text: A string of UTF-8 characters
  • time: A time without a date component
  • timestamp: A date and time
  • timeuuid: A type 1 UUID (Universally Unique Identifier) that contains a timestamp
  • tinyint: An 8-bit signed integer
  • uuid: A type 4 UUID that is randomly generated
  • varchar: A string of UTF-8 characters (same as text)
  • varint: A variable-length integer
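On top of these scalar types, CQL also offers collection types: set<T>, list<T>, and map<K, V> (plus tuples and user-defined types). A sketch of a table using all three collections (the table and column names are made up for illustration):

```sql
CREATE TABLE blog.post_metadata (
  id uuid PRIMARY KEY,
  categories set<text>,          -- unordered, unique values
  revisions list<timestamp>,     -- ordered, duplicates allowed
  view_counts map<text, int>     -- key/value pairs
);
```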

The optional static keyword indicates that the column is static, which means that it has the same value for all the rows that share the same partition key.

The partition key is the first part of the primary key, which determines how the data is distributed across the nodes. The primary key can be either simple or composite, depending on the number of columns it consists of. The primary key can be specified in two ways:

  • Inline: By adding the PRIMARY KEY keyword after the column name. This is suitable for simple primary keys that have only one column.
  • Compound: By adding a separate PRIMARY KEY clause at the end of the table definition. This is suitable for composite primary keys that have more than one column. The primary key clause can have one or two parts: the partition key and the clustering key. The partition key is enclosed in parentheses and can have one or more columns. The clustering key is the rest of the columns that are not part of the partition key. The clustering key determines the order of the rows within a partition.
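The two styles look like this in practice (the table names are illustrative):

```sql
-- Inline: a simple primary key with a single column.
CREATE TABLE blog.authors (
  name text PRIMARY KEY,
  email text
);

-- Compound: author is the partition key, date is the clustering key,
-- so all posts by one author live in one partition, ordered by date.
CREATE TABLE blog.posts_by_author (
  author text,
  date timestamp,
  id uuid,
  title text,
  PRIMARY KEY ((author), date)
);
```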

Other options that we can specify for a table are:

  • WITH comment: A string that describes the purpose or usage of the table
  • WITH compaction: A map of compaction parameters that control how the data is compacted on disk. The default is {'class': 'SizeTieredCompactionStrategy'}, which compacts the data based on the size of the SSTables (Sorted String Tables). Other available compaction strategies are LeveledCompactionStrategy, TimeWindowCompactionStrategy, and DateTieredCompactionStrategy.
  • WITH compression: A map of compression parameters that control how the data is compressed on disk. The default is {'sstable_compression': 'LZ4Compressor'}, which uses the LZ4 algorithm to compress the data. Other available compressors are SnappyCompressor, DeflateCompressor, and ZstdCompressor.
  • WITH crc_check_chance: A double value between 0 and 1 that specifies the probability of checking the CRC (Cyclic Redundancy Check) of the data on read. The default is 1, which means that the CRC is always checked. A lower value can improve performance, but also increase the risk of data corruption.
  • WITH dclocal_read_repair_chance: A double value between 0 and 1 that specifies the probability of performing a read repair on the local data center. The default is 0.1, which means that 10% of the reads will trigger a read repair on the local data center. A read repair is a mechanism that ensures the consistency of the data by comparing the replicas and repairing any discrepancies.
  • WITH gc_grace_seconds: An integer that specifies the time in seconds that a deleted row is kept before being removed by the garbage collector. The default is 864000, which is 10 days. A higher value can prevent data resurrection, which is the reappearance of deleted data due to network delays or node failures. A lower value can reclaim disk space faster, but also increase the risk of data resurrection.
  • WITH read_repair_chance: A double value between 0 and 1 that specifies the probability of performing a read repair on all data centers. The default is 0, which means that no read repair is performed on all data centers. A read repair is a mechanism that ensures the consistency of the data by comparing the replicas and repairing any discrepancies.
  • WITH speculative_retry: A string that specifies the policy for performing a speculative retry on read. A speculative retry is a mechanism that improves the latency of the read by sending another request to a different replica if the first one does not respond within a certain time. The default is ‘99PERCENTILE’, which means that the speculative retry is performed if the first request takes longer than the 99th percentile of the read latency. Other available policies are ‘ALWAYS’, ‘NONE’, or a custom value in milliseconds.

For example, the following command creates a table called posts in the blog keyspace, with the following columns and primary key:

  • id: A UUID that uniquely identifies the post
  • title: A text that contains the title of the post
  • content: A text that contains the content of the post
  • author: A text that contains the name of the author of the post
  • date: A timestamp that contains the date and time of the post
  • tags: A set of text that contains the tags of the post

The primary key is composed of the id column as the partition key and the date column as the clustering key. The table also has a comment, a compaction strategy, and a compression option:

CREATE TABLE blog.posts (
id uuid,
title text,
content text,
author text,
date timestamp,
tags set<text>,
PRIMARY KEY (id, date)
) WITH comment = 'A table that stores blog posts'
AND compaction = {'class': 'LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 'SnappyCompressor'};
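With the table in place, a quick round trip checks that it behaves as expected; uuid() and toTimestamp(now()) are built-in CQL functions:

```sql
INSERT INTO blog.posts (id, title, content, author, date, tags)
VALUES (uuid(), 'Hello ScyllaDB', 'First post!', 'Jamil',
        toTimestamp(now()), {'scylla', 'nosql'});

SELECT title, author, tags FROM blog.posts;
```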

Deleting a Table:

To delete a table, we use the DROP TABLE command, followed by the name of the table. The syntax of the DROP TABLE command is as follows:

DROP TABLE keyspace_name.table_name;

The DROP TABLE command can be used to delete a table from a specific keyspace, or from the current keyspace if the keyspace name is omitted. For example, the following command deletes the posts table from the blog keyspace:

DROP TABLE blog.posts;

Note that deleting a table is an irreversible operation, so we should be careful before executing it. Also, deleting a table does not free the disk space immediately: by default, an automatic snapshot of the dropped data is kept. To reclaim the disk space, we need to run a nodetool clearsnapshot command on each node of the cluster.

Editing a Table:

To edit a table, we use the ALTER TABLE command, followed by the name of the table and the changes we want to make. The syntax of the ALTER TABLE command is as follows:

ALTER TABLE keyspace_name.table_name
ADD column_name data_type [static];
ALTER TABLE keyspace_name.table_name
DROP column_name;
ALTER TABLE keyspace_name.table_name
RENAME old_column_name TO new_column_name;
ALTER TABLE keyspace_name.table_name
WITH option = value;

The ALTER TABLE command can be used to edit a table in a specific keyspace, or in the current keyspace if the keyspace name is omitted. The changes we can make to a table are:

  • ADD: To add a new column to the table. The column name and the data type specify the name and the type of the column, respectively. The optional static keyword indicates that the column is static, which means that it has the same value for all the rows that share the same partition key. Note that we cannot add a column that is part of the primary key, or a column that has the same name as an existing column.
  • DROP: To drop an existing column from the table. The column name specifies the name of the column to remove. Note that we cannot drop a column that is part of the primary key, and the data stored in the dropped column becomes unavailable.
  • RENAME: To rename an existing column. Note that only columns that belong to the primary key can be renamed.
  • WITH: To change a table option, such as comment, compaction, compression, or gc_grace_seconds.
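Concretely, the forms of ALTER TABLE look like this against the posts table from earlier (the new column names are illustrative):

```sql
ALTER TABLE blog.posts ADD views int;                   -- add a column
ALTER TABLE blog.posts DROP views;                      -- remove it again
ALTER TABLE blog.posts RENAME date TO published_at;     -- primary key columns only
ALTER TABLE blog.posts WITH gc_grace_seconds = 432000;  -- change a table option
```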

— — — — — — — — — — — —

I welcome your comments and suggestions on this blog post. If you spot any errors, kindly inform me and I will correct them promptly. Thank you for your contribution! 😊

Note::

👋 Hey there! If you have any burning questions or just want to say hi, don’t be shy — I’m only a message away. 💬 You can reach me at jamilkashem@zoho.com.

🤝 By the way, I think we share the same interest in software engineering — let’s connect on LinkedIn! 💻


