CubeFS: A crash course

A new distributed filesystem caught my interest recently - CubeFS. If you know anything about me, you'll know I'm a massive distributed storage nerd, with over 2 petabytes of storage in my lab (RiffLabs Perth) bound together at various times by distributed filesystems such as MooseFS (Pro), SeaweedFS and Ceph.

I've been using and enjoying MooseFS Pro for quite a while now, but my licence is capacity-limited and I have over 10x more capacity than it covers, which restricts how much of my storage I can actually use.

What I want out of a distributed filesystem boils down to a few major factors:

  • Storage tiering
  • Reliable high availability
  • Data reliability and resilience
  • Hands off self healing
  • Performance
  • (Bonus) Erasure coding
  • (Bonus) Georeplication

Wait, what's a distributed filesystem?

Good question! A distributed filesystem allows you to combine the storage of one or many machines (usually many, given the word "distributed!") into one filesystem that can be mounted from one or many machines. Usually they expose a POSIX filesystem (traditional storage mount, a la NFS, in other words), but they can also come in the form of a block storage device (such as Ceph RBD) or object storage (S3), or HDFS, or... yeah, there's a lot of variety.

Think of it like taking a bunch of NASes and combining them into one big NAS and having them work together, visible as one unified storage drive.

Enter CubeFS

CubeFS is a fairly new distributed filesystem, currently incubating in the CNCF, and supports exabyte-scale installations (OPPO has one!).

Incredibly, CubeFS ticks all of these boxes, with the exception of the first one, in the fully free and open source version.* (CubeFS is licensed under the Apache-2.0 software licence.)

(Currently erasure coding relies on having a Kafka installation - thus, it won't be covered in today's guide, as I don't have one yet - but it works, it works at scale and it even allows for fine customisation of erasure coding schemes. Very cool!)

As a bonus, it supports:

  • Native S3 gateway for object storage
  • Kubernetes persistent storage
  • Volume creation and management
  • Scalable metadata storage that allows for horizontal scaling of metadata without creating very large "master" nodes

What is it missing?

At the moment the main thing I want that CubeFS doesn't have is storage tiering - I want to be able to have SSDs, HDDs and NVMe storage in one cluster, and have data intelligently (or manually) placed onto the various tiers.

I don't necessarily need it to happen automatically - something simpler like MooseFS' storage classes would be fine.

Luckily, the CubeFS team has confirmed that they are working on storage tiering, and it might even be a full implementation with proper multi level caching. This is huge. 😄

The catch

Great, so CubeFS looks really promising, right?

There's just a small catch - there's basically zero security out of the box. It assumes you're running on a trusted network. It's not that they haven't thought of this issue - they have! It's just that the authentication stuff hasn't made it all the way into a release yet, despite being stable and ready for usage. This means if you deploy a CubeFS filesystem normally, following all the directions on the CubeFS homepage, anyone with network access to the master/resource manager nodes can do anything they want to your filesystem, including deleting volumes, reading user keys and various other mischief.

Fixing this is luckily fairly easy. Unfortunately, the documentation for authentication is lacking, with only hints from the old ChubaoFS (the former name for CubeFS) documentation to help you. It took me weeks of pain to figure this one out - luckily, once you know how to do it, it's easy; thus my motivation for writing this guide.

Eventually I plan to turn this entire guide into a fully automated Ansible playbook that will turn the entire process into a 5-10 minute affair. Stay tuned for that.

Integrating authentication and encryption

Simply put, git clone CubeFS, switch to the release-3.3.1 (latest release) branch, and then run git cherry-pick 7f43eaa513a579fad99aaade86bc419e75d89981 and fix the conflicts.
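
If you want to do that yourself, it looks roughly like this (a sketch, assuming you're cherry-picking onto the upstream repo at github.com/cubefs/cubefs):

git clone https://github.com/cubefs/cubefs.git && cd cubefs
git checkout release-3.3.1
git cherry-pick 7f43eaa513a579fad99aaade86bc419e75d89981
# resolve any conflicts, git add the fixed files, then:
git cherry-pick --continue

The DIY route, roughly.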

Then build CubeFS as normal, and deploy it to your nodes.

...Too complicated? 😄 Alright, well, fair enough. I did it for you!

GitHub - Zorlin/cubefs (branch: Zorlin/authnode): https://github.com/Zorlin/cubefs

Here's my fixed version of CubeFS.

This is a very light set of changes, simply integrating the relevant commit. I have made zero other changes, but have also included the steps I took (above) so you can verify this for yourself, which you should do if you plan to use this software anywhere important. I plan to keep my branch up-to-date. I will not be publishing binaries, as I don't believe the supply chain risks are worth it; if I published infected binaries unwittingly, it would be Pretty Bad.

Don't worry, we'll go over everything you need to use the modified CubeFS software and any caveats you'll want to know about. This is a fairly long guide, but it is worth it - you will end up with a fully resilient, self healing, distributed and clustered filesystem with high availability and all kinds of other fun goodies.

Getting started

You'll want to get started with either a single machine per role, or 3 machines per role. For high availability, I recommend using 3x master nodes and 3x authnodes, and then as many metadata and datanode nodes as you want to use.

In my cluster, I will be deploying to the following machines:

# master nodes
cubefs-master01.per.riff.cc
cubefs-master02.per.riff.cc
cubefs-master03.per.riff.cc

# auth nodes
cubefs-authnode01.per.riff.cc
cubefs-authnode02.per.riff.cc
cubefs-authnode03.per.riff.cc

# datanode nodes
monstar.riff.cc
ambellina.riff.cc
al.riff.cc
sizer.riff.cc

# metadata nodes
monstar.riff.cc
ambellina.riff.cc
al.riff.cc
inferno.riff.cc
sizer.riff.cc

A list of nodes to deploy to.

You can, optionally, spin up a node specifically for building the CubeFS software, to avoid having to install an entire build toolchain on one (or more) of your nodes. I'll be skipping this, opting instead to use cubefs-authnode01.per.riff.cc as my buildbox, but in a proper production environment (especially an airgapped one!) this may be more appropriate.

Create your machines

In my case, I'll be using Tinkerbell and tink-flow to deploy my CubeFS nodes, except for the data and metadata nodes which already exist (and are running Proxmox).

I'll be using Debian 12, my standard distribution of choice for a lightweight and stable base, but really any standard-ish Linux distribution will work, from Ubuntu to Rocky Linux and maybe even Alpine Linux.

We won't cover provisioning in this guide, but essentially the process looked like:

  • Create VMs for each of the nodes
  • Collect MAC addresses from each
  • Create a tink-flow manifest with the machines in it
  • Let Tinkerbell deploy the machines with Debian 12

Installing Debian by hand is also a valid option.

You'll want to enable passwordless sudo on your CubeFS machines, at least during setup.

You should also create DNS entries for each of your machines. I'm using the hostnames, then .per.riff.cc (to indicate the Perth RiffLab).

Building and installing CubeFS

We'll be building CubeFS on our first authnode, cubefs-authnode01.

Log onto the node, then install the build dependencies.

First, required packages:

riff@cubefs-authnode01:~$ sudo apt install -y git cmake maven build-essential

Installing the build toolchains and compilers, such as GCC.

Then install Go 1.22 (or whatever the latest stable version of Go is at the time of reading).

root@cubefs-authnode01:~# wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
--2024-02-14 04:39:54--  https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
Resolving go.dev (go.dev)... 216.239.32.21, 216.239.38.21, 216.239.36.21, ...
[...]
Saving to: ‘go1.22.0.linux-amd64.tar.gz’
[...]
2024-02-14 04:39:56 (43.6 MB/s) - ‘go1.22.0.linux-amd64.tar.gz’ saved [68988925/68988925]
root@cubefs-authnode01:~# rm -rf /usr/local/go && tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
root@cubefs-authnode01:~# echo 'export PATH=$PATH:/usr/local/go/bin' > /etc/profile.d/golang.sh

Installing Go 1.22

Log out and in to ensure the Go toolchain is on your path (or source /etc/profile.d/golang.sh).

Clone my repository, check out the "Zorlin/authnode" branch, then run make to build CubeFS.

git clone https://github.com/Zorlin/cubefs/ && cd cubefs
git checkout Zorlin/authnode
make

Instructions for compiling the custom CubeFS code.

build cfs-server   success
build cfs-authtool success
build cfs-client   success
build cfs-cli      success
build libsdk: libcfs.so       success
build java libcubefs        [INFO] Scanning for projects...
[... lots of maven output ...]
build java libcubefs success
build cfs-fsck      success
build fdstore success
build cfs-preload   success
build cfs-blockcache      success
build blobstore    success

Building CubeFS

Now, throw together a quick tarball of your binaries to make it easier to transfer them to your nodes.

riff@cubefs-authnode01:~/cubefs$ cd ~
riff@cubefs-authnode01:~$ tar czvf ./cubefs-3.1.1-withauth.tar.gz cubefs/build
[... lots of tar output later ...]

Creating a tarball

While you're there, grab the SHA256 hash of the tarball so you can verify its integrity later if needed.

riff@cubefs-authnode01:~$ sha256sum cubefs-3.1.1-withauth.tar.gz 
a0e6c3cd86be4daa9522756f75c910b468dcc6fc5b15d7df24430f6915fe1db9  cubefs-3.1.1-withauth.tar.gz

Getting the hash of our tarball. Your hash will be different as CubeFS builds are not deterministic.

Exit back to your local machine, then log back into your authnode using the -A flag when running ssh (for example ssh riff@cubefs-authnode01.per.riff.cc -A). This will allow you to use your local machine's SSH keys on the remote (authnode01) and thus let you copy your tarball to all of your machines.

riff@cubefs-authnode01:~$ for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do scp cubefs-3.1.1-withauth.tar.gz riff@$i.per.riff.cc:/home/riff/ ; done
cubefs-3.1.1-withauth.tar.gz                                                                                                                100%  280MB 418.3MB/s   00:00    
cubefs-3.1.1-withauth.tar.gz                                                                                                                100%  280MB 366.2MB/s   00:00    
cubefs-3.1.1-withauth.tar.gz                                                                                                                100%  280MB 393.9MB/s   00:00    
cubefs-3.1.1-withauth.tar.gz                                                                                                                100%  280MB 388.6MB/s   00:00    
cubefs-3.1.1-withauth.tar.gz                                                                                                                100%  280MB 425.1MB/s   00:00    
cubefs-3.1.1-withauth.tar.gz   

Using a for loop and scp to copy the tarball to all relevant machines.

Now we'll do a similar trick to allow us to install the CubeFS binaries across all the hosts (Note: If you prefer to use Ansible or something more elegant for these steps, please do! This is just a basic example).

for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do ssh riff@$i.per.riff.cc "tar xvf cubefs-3.1.1-withauth.tar.gz ; sudo cp cubefs/build/bin/cfs-* /usr/local/bin/ && rm -r cubefs"; done

Installing CubeFS, the hacky way.

Lastly, we'll use the same approach to create a CubeFS user and group, along with the various directories CubeFS will use during operation.

for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do ssh riff@$i.per.riff.cc "sudo adduser --system --home /var/lib/cubefs --group --comment \"CubeFS\" cubefs ; sudo mkdir -p /var/log/cubefs /etc/cubefs ; sudo chown cubefs:cubefs /var/log/cubefs /etc/cubefs" ; done

Creating the CubeFS user and group and all required directories.

Finally, we're ready to start properly setting up the CubeFS cluster.

Setting up the cluster keys

Setting up our cluster is now relatively easy, but will take a few steps. We will set up the authnode cluster first, then the master cluster, then the datanodes and metadata nodes (which may be colocated on the same boxes).

Creating the initial keys (PKI) for CubeFS

Before we can get started, we need to generate the root keys for our cluster. The following instructions are based on the ChubaoFS documentation and LOTS of trial and error, as well as the current CubeFS documentation.

Become root with sudo -i and navigate to /etc/cubefs.

riff@cubefs-authnode01:~$ sudo -i
root@cubefs-authnode01:~# cd /etc/cubefs
root@cubefs-authnode01:/etc/cubefs# 

Okay, simple enough. Navigate to the CubeFS directory.

Run cfs-authtool authkey and two files will be generated: authroot.json and authservice.json.

root@cubefs-authnode01:/etc/cubefs# cfs-authtool authkey
root@cubefs-authnode01:/etc/cubefs# ls
authroot.json  authservice.json

Generating the root authkeys.

From here, you'll want to create a new file called authnode.json and fill it out. Replace "ip" with your node's IP address, "id" with a unique ID (we suggest the last octet of your node's IP address, as it is entirely arbitrary but must be unique), and "peers" with a list of your authnode's IP addresses. The format for "peers" is "$id:$ipv4:8443" for each host, with commas separating each host.

Change "clusterName" to whatever you like, then set "authServiceKey" and "authRootKey" to the values of the "auth_key" property found in authroot.json and authservice.json These keys are VERY sensitive and are fairly painful to change, so treat them appropriately.

Here is a slightly redacted version of my authnode.json file.

{
  "role": "authnode",
  "ip": "10.0.20.91",
  "port": "8443",
  "prof": "10088",
  "id": "91",
  "peers": "91:10.0.20.91:8443,92:10.0.20.92:8443,93:10.0.20.93:8443",
  "logDir": "/var/log/cubefs/authnode",
  "logLevel": "info",
  "retainLogs": "100",
  "walDir": "/var/lib/cubefs/authnode/raft",
  "storeDir": "/var/lib/cubefs/authnode/rocksdbstore",
  "exporterPort": 9510,
  "clusterName": "rifflabs-per",
  "authServiceKey": "REPLACE-ME-WITH-YOUR-KEY",
  "authRootKey": "REPLACE-ME-WITH-YOUR-KEY",
  "enableHTTPS": true
}

You will want to customise this for your installation.

Now we'll generate a self-signed TLS certificate, valid for each of the authnodes' IP addresses. Adjust the subject and subjectAltName according to your environment.

openssl req \
  -x509 \
  -nodes \
  -newkey ed25519 \
  -keyout server.key \
  -out server.crt \
  -days 3650 \
  -subj "/C=AU/ST=Western Australia/L=Perth/O=riff.cc/OU=Infra/OU=Infra/CN=*" \
  -addext "subjectAltName=DNS:auth.cubefs.per.riff.cc,IP:127.0.0.1,IP:10.0.20.91,IP:10.0.20.92,IP:10.0.20.93"

Generating a self-signed TLS certificate. A future version of this tutorial will involve using real TLS certificates.

Check that the certificate material was generated correctly, then create the folder /app and symlink the material into /app, and finally change /app to be owned by the CubeFS user.

# Check for server.crt and server.key
root@cubefs-authnode01:/etc/cubefs# ls server.*
server.crt  server.key
# Create symlinks in /app to the certificates.
root@cubefs-authnode01:/etc/cubefs# mkdir /app ; ln -s /etc/cubefs/server.crt /app/server.crt ; ln -s /etc/cubefs/server.key /app/server.key ; chown -R cubefs:cubefs /app

We'll now install the systemd unit for cubefs-authnode, allowing us to start and run it as a service.

With your favourite editor, open up a new file at /etc/systemd/system/cubefs-authnode.service and fill it with the following contents:

[Unit]
Description=CubeFS AuthNode Server
After=network.target

[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/authnode.json
LimitNOFILE=102400000

[Install]
WantedBy=multi-user.target

cubefs-authnode.service

Then enable and start the service.

root@cubefs-authnode01:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.

Starting cubefs-authnode for the first time.

Congratulations, you've set up your authnode. If you only have one, you can skip to the next heading; otherwise, read on.

If you're running a cluster, you'll need to set up your remaining authnodes. To do so, simply copy /etc/cubefs/authnode.json, /etc/systemd/system/cubefs-authnode.service, /etc/cubefs/server.crt and /etc/cubefs/server.key to your remaining nodes, and recreate the symlinks in /app:

sudo mkdir /app ; sudo ln -s /etc/cubefs/server.crt /app/server.crt ; sudo ln -s /etc/cubefs/server.key /app/server.key ; sudo chown -R cubefs:cubefs /app

The authnode looks for the certificates in /app/ whereas every other component looks in /etc/cubefs/, thus this song and dance.
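
If it helps, here's one rough way to copy those files across from the first authnode (a sketch - run as the riff user on cubefs-authnode01, relying on the passwordless sudo and SSH agent forwarding from earlier; adjust users and hostnames to taste):

for i in cubefs-authnode{02..03} ; do
  sudo tar -C / -cf - etc/cubefs/authnode.json etc/cubefs/server.crt etc/cubefs/server.key etc/systemd/system/cubefs-authnode.service \
    | ssh riff@$i.per.riff.cc "sudo tar -C / -xf -"
done

One way to copy the authnode configuration to the remaining nodes - double-check ownership and permissions afterwards (see below).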

You'll need to adjust the "ip" and "id" fields in authnode.json to match the node that configuration lives on; other than that, the configuration should be identical. You should also make sure the permissions on your TLS key and authnode configuration are correct - cd /etc/cubefs && sudo chmod 600 server.key authnode.json will take care of this. While you're at it, make sure /etc/cubefs/ is owned by cubefs on all of your nodes and that any files containing keys are also chmod 600 (i.e. read/write for cubefs, no access for group or world).

In case it's helpful, this is what proper permissions should look like:

riff@cubefs-authnode01:/etc/cubefs$ ls -lah
total 28K
drwxr-xr-x  2 cubefs cubefs 4.0K Feb 14 06:03 .
drwxr-xr-x 76 root   root   4.0K Feb 14 05:30 ..
-rw-------  1 cubefs cubefs  567 Feb 14 05:56 authnode.json
-rw-------  1 cubefs cubefs  221 Feb 14 05:37 authroot.json
-rw-------  1 cubefs cubefs  221 Feb 14 05:37 authservice.json
-rw-r--r--  1 cubefs cubefs  843 Feb 14 05:48 server.crt
-rw-------  1 cubefs cubefs  119 Feb 14 05:48 server.key

Proper permissions.

Once your remaining nodes are configured, start them just like you started the first node:

root@cubefs-authnode02:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.

root@cubefs-authnode03:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.

Starting the remaining nodes.

Checking on the state of your authnode cluster

You should be able to examine the logs on one of your authnodes (the first one is probably best for this) and see it electing a leader.

riff@cubefs-authnode01:/etc/cubefs$ sudo cat /var/log/cubefs/authnode/authNode/authNode_warn.log
2024/02/14 06:11:44.783799 [WARN ] authnode_manager.go:36: action[handleLeaderChange] change leader to [10.0.20.91:8443] 

The leader will be elected as soon as two of the three (or whatever constitutes quorum in your Authnode cluster) nodes are online.

Creating the admin keys and an admin ticket

Create the following files in /etc/cubefs on your first node. These will be used to generate keys with the appropriate permissions:

{
    "id": "admin",
    "role": "service",
    "caps": "{\"API\":[\"*:*:*\"]}"
}

data_admin.json

{
    "id": "cubeclient",
    "role": "client",
    "caps": "{\"API\":[\"*:*:*\"], \"Vol\":[\"*:*:*\"]}"
}

data_client.json - you can customise this ID if you desire.

{
    "id": "DatanodeService",
    "role": "service",
    "caps": "{\"API\":[\"*:*:*\"]}"
}

data_datanode.json

{
    "id": "MasterService",
    "role": "service",
    "caps": "{\"API\":[\"*:*:*\"]}"
}

data_master.json

{
    "id": "MetanodeService",
    "role": "service",
    "caps": "{\"API\":[\"*:*:*\"]}"
}

data_metanode.json

Whew, all done with those.

Use the authservice.json file you generated earlier to create an Authnode ticket.

root@cubefs-authnode01:/etc/cubefs# cfs-authtool ticket -https -host=10.0.20.91:8443 -keyfile=authservice.json -output=ticket_auth.json getticket AuthService

Replace 10.0.20.91:8443 with your host's IP and port.

Now, create an admin key:

root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_auth.json -data=data_admin.json -output=key_admin.json AuthService createkey

Creating an admin key.

Finally, create an admin ticket:

root@cubefs-authnode01:/etc/cubefs# cfs-authtool ticket -https -host=10.0.20.91:8443 -keyfile=key_admin.json -output=ticket_admin.json getticket AuthService

You can now use the admin ticket to create the rest of the necessary keys.

Creating the master/resource manager keys

root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_master.json -output=key_master.json AuthService createkey

As usual, replace the host as needed.

Creating the datanode keys

root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_datanode.json -output=key_datanode.json AuthService createkey

Creating the metadata node keys

root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_metanode.json -output=key_metanode.json AuthService createkey

Creating the client keys

root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_client.json -output=key_client.json AuthService createkey

Creating the master/resource manager nodes

On your master nodes, install the following systemd service file as /etc/systemd/system/cubefs-master.service:

[Unit]
Description=CubeFS Master Server
After=network.target

[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/master.json
LimitNOFILE=102400000

[Install]
WantedBy=multi-user.target

cubefs-master.service

Copy server.crt (but NOT server.key) from your authnode to /etc/cubefs/ on each master node.

Now, fill out the following and save it as /etc/cubefs/master.json on each of your master nodes.

{
  "role": "master",
  "ip": "10.0.20.101",
  "listen": "17010",
  "prof":"17020",
  "id":"101",
  "peers": "101:10.0.20.101:17010,102:10.0.20.102:17010,103:10.0.20.103:17010",
  "retainLogs":"20000",
  "logDir": "/var/log/cubefs/master/",
  "logLevel":"info",
  "walDir":"/var/lib/cubefs/master/data/wal",
  "storeDir":"/var/lib/cubefs/master/data/store",
  "clusterName":"rifflabs-per",
  "metaNodeReservedMem": "1073741824",
  "masterServiceKey":"a-not-so-long-string-from-auth-key",
  "authenticate": true,
  "authNodeHost": "10.0.20.91:8443,10.0.20.92:8443,10.0.20.93:8443",
  "authNodeEnableHTTPS": true,
  "authNodeCertFile": "/etc/cubefs/server.crt"
}

master.json

You'll want to, as before, replace some of the values.

Specifically, set ip to the IP of each master node and id to a unique value per node (again, the last octet of the node's IP address is a simple choice). Set clusterName to the same value you used when setting up the authnodes, and set peers in the same fashion as the authnodes, this time referencing the masters (on port 17010) instead.

For masterServiceKey, copy the value from /etc/cubefs/key_master.json on your first Authnode. Here, you must use the value of auth_key and not auth_id_key - confusingly, some components need the former and some the latter.
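
If you have jq handy, something like this should print it for you (assuming the key file is flat JSON with auth_key at the top level):

jq -r '.auth_key' /etc/cubefs/key_master.json

Pulling auth_key out of the key file.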

Once you've configured all your master nodes, go ahead and start them all.
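
On each master node, enable and start the service just like you did for the authnodes:

sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-master

Starting the master nodes.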

You should be able to see in the logs that a leader is elected when you have enough nodes for quorum, as before:

==> /var/log/cubefs/master/master/master_critical.log <==
2024/02/14 06:56:39.391449 [FATAL] alarm.go:48: rifflabs_per_master_warning clusterID[rifflabs-per] leader is changed to 10.0.20.101:17010

A leader has been elected.

Configuring cfs-cli

Now for the exciting part - we finally get to test out the basics of our cluster authentication. (Okay, the really exciting part comes later, but this is still cool).

Open /etc/cubefs/key_client.json and take the value auth_id_key and copy it.

Then, fill out the following on your management machine (whatever machine you want to manage CubeFS from - make sure you install the binaries from the tarball onto it first) and save it as cfs-cli's configuration file (by default this should be ~/.cfs-cli.json). Replace the IP addresses with your master cluster's addresses, and set clientIDKey to the auth_id_key value (NOT auth_id!) you just copied from /etc/cubefs/key_client.json on your first Authnode.

{
  "masterAddr": [
    "10.0.20.101:17010",
    "10.0.20.102:17010",
    "10.0.20.103:17010"
  ],
  "timeout": 60,
  "clientIDKey":"a-very-long-string-from-auth-id-key"
}

Make sure you use auth_id_key here, not auth_id, or you'll tear your hair out trying to figure it out.

Let's first try an unprivileged operation:

wings@blackberry:~$ cfs-cli cluster info
[Cluster]
  Cluster name       : rifflabs-per
  Master leader      : 10.0.20.101:17010
  Master-101           : 10.0.20.101:17010
  Master-102           : 10.0.20.102:17010
  Master-103           : 10.0.20.103:17010
  Auto allocate      : Enabled
  MetaNode count     : 0
  MetaNode used      : 0 GB
  MetaNode total     : 0 GB
  DataNode count     : 0
  DataNode used      : 0 GB
  DataNode total     : 0 GB
  Volume count       : 0
  Allow Mp Decomm    : Enabled
  EbsAddr            : 
  LoadFactor         : 0
  BatchCount         : 0
  MarkDeleteRate     : 0
  DeleteWorkerSleepMs: 0
  AutoRepairRate     : 0
  MaxDpCntLimit      : 3000

This tells us that our master cluster is working, and that things seem somewhat healthy.

Now, let's try something that requires a key:

wings@blackberry:~$ cfs-cli user create test
Create a new CubeFS cluster user
  User ID   : test
  Password  : [default]
  Access Key: [auto generate]
  Secret Key: [auto generate]
  Type      : normal

Confirm (yes/no)[yes]: yes
Create user success:
[Summary]
  User ID    : test
  Access Key : REDACTED
  Secret Key : REDACTED
  Type       : normal
  Create Time: 2024-02-14 07:01:43
[Volumes]
VOLUME                  PERMISSION

If you instead see an error like this:

Error: Create user failed: invalid clientIDKey

Womp womp.

If so, check all of your key configuration and try again. In the absolute worst case, blow away your authnodes' /var/lib/cubefs/* data and set them up again - I hit a bug at some point where nothing I did would make certain keys be recognised, and creating a fresh keystore was the only way to fix it. YMMV, and obviously only do that as a drastic measure.


🎉 Great job if you've gotten this far! The rest is significantly easier than the previous steps, so you haven't got far to go. 😄

Creating the full cluster

Now that we have our base infrastructure, we can add the actual data and metadata nodes that will store everything for us. Then we'll mount our new filesystem and store some data!

Creating the data nodes

On each data node:

Install CubeFS from the tarball, as before, and create the CubeFS user, group and directories (/var/lib/cubefs, /var/log/cubefs, /etc/cubefs) with the appropriate permissions.

sudo adduser --system --home /var/lib/cubefs --group --comment "CubeFS" cubefs ; sudo mkdir -p /var/log/cubefs /etc/cubefs ; sudo chown cubefs:cubefs /var/log/cubefs /etc/cubefs

Creating the user, group and directories.
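
Getting the binaries on is the same trick as before - copy the tarball over and extract it. For example, for one of my datanodes:

scp cubefs-3.1.1-withauth.tar.gz riff@monstar.riff.cc:/home/riff/
ssh riff@monstar.riff.cc "tar xvf cubefs-3.1.1-withauth.tar.gz ; sudo cp cubefs/build/bin/cfs-* /usr/local/bin/ && rm -r cubefs"

Installing the CubeFS binaries on a datanode.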

Then, create the following systemd unit file, as /etc/systemd/system/cubefs-datanode.service:

[Unit]
Description=CubeFS Data Node
After=network.target

[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/datanode.json
LimitNOFILE=102400000

[Install]
WantedBy=multi-user.target

cubefs-datanode.service

Now it's time to attach disks. Each disk should be mounted at its own mountpoint, with a single disk per mountpoint. You don't have to use a partition table (though you can if you prefer) - you can simply run, for example, mkfs.xfs /dev/sdx (IF /dev/sdx is the disk you are sure you want to format). Make sure your disks are mounted by UUID - you can find the UUID of each disk with, for example, blkid /dev/sda (or blkid /dev/sda1 if using partitions).
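
Preparing a single disk might look roughly like this (a sketch, as root - double-check the device name before formatting, and adjust the mountpoint naming to suit):

mkfs.xfs /dev/sdx                   # only if /dev/sdx really is the disk you want to wipe
blkid /dev/sdx                      # note the UUID this prints
mkdir -p /mnt/brick.a
echo "UUID=paste-the-uuid-here /mnt/brick.a xfs defaults,noatime 0 0" >> /etc/fstab
mount /mnt/brick.a

Formatting and mounting a disk by UUID.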

Once you have all of your disks attached to your nodes, mounted by UUID to avoid issues, create a cubefs folder within each mountpoint and change it to be owned by the cubefs user. We'll assume all your disks are mounted as /mnt/brick.something from here on out, but feel free to adjust the commands as needed.

Here's an example of creating the relevant cubefs folders in a system with a bunch of bricks mounted as /mnt/brick.$name:

for i in /mnt/brick.* ; do mkdir -p $i/cubefs ; chown cubefs:cubefs $i/cubefs ; done

Making CubeFS brick folders.

It's a similar process to MooseFS bricks if you've set those up before.

Once you've got your disks attached, it's time to create the datanode.json configuration file on each of your datanodes. Unlike the previous templates, you only need to change two things here: the disks array, which should list each of your disks, and the serviceIDKey. The value after the colon (:) is the reserved space for that disk in bytes (space that CubeFS will leave free and assume cannot be used), useful for ensuring disks don't fill up completely. Set "serviceIDKey" to the value of auth_id_key from /etc/cubefs/key_datanode.json on your first Authnode.

As a hint, here's a script to give you the disks array preformatted as a ready-to-paste block if you have all your disks as /mnt/brick.$name:

root@monstar:~# for i in /mnt/brick.*/cubefs ; do echo "     \"$i:10737418240\"," ; done

Quick little bash snippet. Make sure to remove the final comma (,) from the output before use.

{
  "role": "datanode",
  "listen": "17310",
  "prof": "17320",
  "logDir": "/var/log/cubefs/datanode/log",
  "logLevel": "info",
  "raftHeartbeat": "17330",
  "raftReplica": "17340",
  "raftDir":"/var/lib/cubefs/datanode/raft",
  "exporterPort": 9502,
  "masterAddr": [
     "10.0.20.101:17010",
     "10.0.20.102:17010",
     "10.0.20.103:17010"
  ],
  "disks": [
     "/mnt/brick.a/cubefs:10737418240",
     "/mnt/brick.b/cubefs:10737418240",
     "/mnt/brick.c/cubefs:10737418240",
     "/mnt/brick.d/cubefs:10737418240",
     "/mnt/brick.e/cubefs:10737418240",
     "/mnt/brick.f/cubefs:10737418240",
     "/mnt/brick.g/cubefs:10737418240",
     "/mnt/brick.h/cubefs:10737418240",
     "/mnt/brick.i/cubefs:10737418240"
  ],
  "serviceIDKey": "replace-me-with-the-value-from-auth_id_key"
}

datanode.json

Now that you have each of your datanodes configured, bring them online.

root@monstar:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@ambellina:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@al:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@sizer:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.

Success!

wings@blackberry:~$ cfs-cli datanode ls
[Data nodes]
ID        ADDRESS                         WRITABLE    STATUS  
2         10.0.20.15:17310                Yes         Active  
3         10.0.20.14:17310                Yes         Active  
4         10.0.20.18:17310                Yes         Active  
5         10.0.20.16:17310                Yes         Active  

Four datanodes

wings@blackberry:~$ cfs-cli datanode info 10.0.20.15:17310
[Data node info]
  ID                  : 2
  Address             : 10.0.20.15:17310
  Carry               : 0.6569280350771028
  Allocated ratio     : 0.9921794251966212
  Allocated           : 229.66 TB
  Available           : 1.15 TB
  Total               : 231.47 TB
  Zone                : default
  IsActive            : Active
  Report time         : 2024-02-14 08:02:01
  Partition count     : 0
  Bad disks           : []
  Persist partitions  : []

Examining a datanode

wings@blackberry:~$ cfs-cli cluster info
[Cluster]
  Cluster name       : rifflabs-per
  Master leader      : 10.0.20.101:17010
  Master-101           : 10.0.20.101:17010
  Master-102           : 10.0.20.102:17010
  Master-103           : 10.0.20.103:17010
  Auto allocate      : Enabled
  MetaNode count     : 0
  MetaNode used      : 0 GB
  MetaNode total     : 0 GB
  DataNode count     : 4
  DataNode used      : 1517098 GB
  DataNode total     : 1613692 GB
  Volume count       : 0
  Allow Mp Decomm    : Enabled
  EbsAddr            : 
  LoadFactor         : 0
  BatchCount         : 0
  MarkDeleteRate     : 0
  DeleteWorkerSleepMs: 0
  AutoRepairRate     : 0
  MaxDpCntLimit      : 3000

Viewing the overall state of the cluster

(PS: My cluster shows used space already because the disks CubeFS is using also hold other data.)

Creating the metadata nodes

If your metadata nodes will reside on the same hosts as your datanodes (which I highly recommend*), setting up the metadata nodes is simple and easy. If not, follow the rough directions above for installing CubeFS and creating users and directories and such.

* Note that in a very large system you may want to have dedicated hosts for metadata.

Open /etc/systemd/system/cubefs-metanode.service in an editor, and fill it in as follows:

[Unit]
Description=CubeFS Meta Node
After=network.target

[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/metanode.json
LimitNOFILE=102400000

[Install]
WantedBy=multi-user.target

cubefs-metanode.service

Now, edit /etc/cubefs/metanode.json and fill it out as follows. You'll want to set "serviceIDKey" to the value of auth_id_key from /etc/cubefs/key_metanode.json from your first Authnode.

{
    "role": "metanode",
    "listen": "17210",
    "prof": "17220",
    "logLevel": "info",
    "metadataDir": "/var/lib/cubefs/metanode/data/meta",
    "logDir": "/var/log/cubefs/metanode/log",
    "raftDir": "/var/lib/cubefs/metanode/data/raft",
    "raftHeartbeatPort": "17230",
    "raftReplicaPort": "17240",
    "totalMem":  "8589934592",
    "exporterPort": 9501,
    "masterAddr": [
        "10.0.20.101:17010",
        "10.0.20.102:17010",
        "10.0.20.103:17010"
    ],
    "serviceIDKey": "replace-me-with-the-value-from-auth_id_key"
}

metanode.json

As an aside, you should read the following important notes from the CubeFS docs:

Notes

The configuration options listen, raftHeartbeatPort, and raftReplicaPort cannot be modified after the program is first configured and started.

The relevant configuration information is recorded in the constcfg file under the metadataDir directory. If you need to force modification, you need to manually delete the file.

The above three configuration options are related to the registration information of the MetaNode in the Master. If modified, the Master will not be able to locate the MetaNode information before the modification.

In other words, modifying the listen port or any of the Raft ports used by the node after deployment will cause issues. Basically, don't, and if you have to, be aware of the above.

If you have a large or dedicated disk (such as an NVMe drive) you wish to use for metadata, mount it at /var/lib/cubefs/metanode/data/meta before starting the metanode for the first time. If you need to set that up later: stop the metanode, move /var/lib/cubefs/metanode/data/meta aside, recreate the directory, mount your new drive on it, and then move the old contents back in.
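
Roughly, that later migration might look like this (a sketch, as root - the device name and paths are just examples):

systemctl stop cubefs-metanode
mv /var/lib/cubefs/metanode/data/meta /var/lib/cubefs/metanode/data/meta.old
mkdir /var/lib/cubefs/metanode/data/meta
mount /dev/nvme0n1 /var/lib/cubefs/metanode/data/meta   # or better, mount by UUID via /etc/fstab
mv /var/lib/cubefs/metanode/data/meta.old/* /var/lib/cubefs/metanode/data/meta/
chown -R cubefs:cubefs /var/lib/cubefs/metanode
systemctl start cubefs-metanode

Moving existing metadata onto a dedicated drive.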

Finally, start all your metanodes.

systemctl enable --now cubefs-metanode

The final step before a working cluster.

If you see something like the following, you've done it - congratulations.

wings@blackberry:~$ cfs-cli metanode ls
[Meta nodes]
ID        ADDRESS              WRITABLE    STATUS  
6         10.0.20.15:17210     Yes         Active  
7         10.0.20.18:17210     Yes         Active  
8         10.0.20.16:17210     Yes         Active  
9         10.0.20.14:17210     Yes         Active

You should have a fully functioning CubeFS installation.

Creating your first volume

It's time to try it out for real.

Create your first volume, specifying that you want:

  • A volume called media
  • Owned by media
  • With follower-read=true (highly recommended for better performance)
  • 2 replicas (in other words, 2x replication, two copies)
  • A total volume capacity of 5000GB (5TB)

Feel free to tweak any of these as you wish, and run cfs-cli volume create --help for more information.

wings@blackberry:~$ cfs-cli volume create media media --follower-read=true --replica-num 2 --capacity 5000
Create a new volume:
  Name                     : media
  Owner                    : media
  capacity                 : 5000 G
  deleteLockTime           : 0 h
  crossZone                : false
  DefaultPriority          : false
  description              : 
  mpCount                  : 3
  replicaNum               : 2
  size                     : 120 G
  volType                  : 0
  followerRead             : true
  readOnlyWhenFull         : false
  zoneName                 : 
  cacheRuleKey             : 
  ebsBlkSize               : 8388608 byte
  cacheCapacity            : 0 G
  cacheAction              : 0
  cacheThreshold           : 10485760 byte
  cacheTTL                 : 30 day
  cacheHighWater           : 80
  cacheLowWater            : 60
  cacheLRUInterval         : 5 min
  TransactionMask          : 
  TransactionTimeout       : 1 min
  TxConflictRetryNum       : 0
  TxConflictRetryInterval  : 0 ms

Confirm (yes/no)[yes]: yes
Create volume success.

Creating our first volume.

That's it! No, really, that's it. You're ready to mount it!

Mounting the filesystem

On the machine you wish to use CubeFS with, install CubeFS (you will not need a CubeFS user this time) and then install the following systemd unit file. This one is special - it is a templated service unit, which will allow you to create arbitrary mounts using CubeFS as you wish.

You will create it as /etc/systemd/system/cubefs-mount@.service - note the @, which marks it as a template unit.

[Unit]
Description=CubeFS Mount
After=local-fs.target
Wants=local-fs.target
After=network.target                       
Wants=network-online.target
After=network-online.target
After=nss-lookup.target
After=time-sync.target

[Service]
Type=forking
ExecStart=/usr/local/bin/cfs-client -c /etc/cubefs/client-%i.conf
IOAccounting=true
IOWeight=250
StartupIOWeight=100

Restart=on-failure
RestartSec=4
StartLimitInterval=20
StartLimitBurst=4

[Install]
WantedBy=multi-user.target

cubefs-mount@.service

Now that you've got your mount unit file created, you just need a configuration file.

Let's open up /etc/cubefs/client-media.conf and create our first mount config:

{
  "mountPoint": "/mnt/cubefs/media",
  "volName": "media",
  "owner": "media",
  "masterAddr": "10.0.20.101:17010,10.0.20.102:17010,10.0.20.103:17010",
  "logDir": "/var/log/cubefs/client/log",
  "logLevel": "info",
  "clientIDKey": "replace-me-with-the-value-from-auth_id_key"
}

client-media.conf

You'll need the value of auth_id_key from /etc/cubefs/key_client.json on your first Authnode.

💡
As an exercise for the reader, you can create as many client keys as you want, making it easier to revoke a key (with deletekey instead of createkey) if a client's key is leaked or compromised. Just create a new version of data_client.json with a new filename and a new ID inside it, then generate a key as before. The auth_id_key value embeds the ID you authenticate as, so different clients can authenticate with different keys. You can do the same trick for most of the components in the cluster, but beware of overcomplicating things.
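
As a sketch, creating an extra client key for a specific host might look like this (run from /etc/cubefs on your first Authnode; the ID and filenames here are just examples):

cat > data_client_robot01.json <<'EOF'
{
    "id": "robot01client",
    "role": "client",
    "caps": "{\"API\":[\"*:*:*\"], \"Vol\":[\"*:*:*\"]}"
}
EOF
cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_client_robot01.json -output=key_client_robot01.json AuthService createkey

Creating an additional client key.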

Without further ado...

riff@robot01:~$ sudo mkdir -p /mnt/cubefs/media ; sudo systemctl enable --now cubefs-mount@media
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-mount@media.service → /etc/systemd/system/cubefs-mount@.service.
riff@robot01:/etc/cubefs$ df -h /mnt/cubefs/media
Filesystem      Size  Used Avail Use% Mounted on
cubefs-media    4.9T     0  4.9T   0% /mnt/cubefs/media

CubeFS, finally mounted.

We have a working mount.
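
If you want a quick smoke test, write a file and check it lands (something like this, then tidy up after yourself):

dd if=/dev/zero of=/mnt/cubefs/media/testfile bs=1M count=1024 status=progress
df -h /mnt/cubefs/media
rm /mnt/cubefs/media/testfile

A quick (and entirely optional) smoke test.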

Done!

This intro post covers setting up a complete but basic CubeFS installation. Future posts will cover enabling the erasure coding and object storage subsystems, as well as integrating CubeFS into Kubernetes using the CSI driver.

If you enjoyed this post, check out the rest of my blog and maybe drop me a message via any of my contact details. Cheers!