CubeFS: A crash course
A new distributed filesystem caught my interest recently - CubeFS. If you know anything about me, you'll know I'm a massive distributed storage nerd, with over two petabytes of storage in my lab (RiffLabs Perth) bound together at various times by distributed filesystems such as MooseFS (Pro), SeaweedFS and Ceph.
I've been using and enjoying MooseFS Pro for quite a while now, but I have a capacity-limited licence and over 10x more capacity than I have a licence for, which restricts how much I can actually use my storage to an extent.
What I want out of a distributed filesystem boils down to a few major factors:
- Storage tiering
- Reliable high availability
- Data reliability and resilience
- Hands off self healing
- Performance
- (Bonus) Erasure coding
- (Bonus) Georeplication
Wait, what's a distributed filesystem?
Good question! A distributed filesystem allows you to combine the storage of one or many machines (usually many, given the word "distributed!") into one filesystem that can be mounted from one or many machines. Usually they expose a POSIX filesystem (traditional storage mount, a la NFS, in other words), but they can also come in the form of a block storage device (such as Ceph RBD) or object storage (S3), or HDFS, or... yeah, there's a lot of variety.
Think of it like taking a bunch of NASes and combining them into one big NAS and having them work together, visible as one unified storage drive.
Enter CubeFS
CubeFS is a fairly new distributed filesystem that is currently a CNCF incubating project, and it supports exabyte-scale installations (OPPO has one!).
Incredibly, CubeFS ticks all of these boxes with the exception of the first one (storage tiering) - in the fully free and open source version.* (CubeFS is licensed under the Apache 2.0 licence).
(Currently erasure coding relies on having a Kafka installation - thus, it won't be covered in today's guide, as I don't have one yet - but it works, it works at scale and it even allows for fine customisation of erasure coding schemes. Very cool!)
As a bonus, it supports:
- Native S3 gateway for object storage
- Kubernetes persistent storage
- Volume creation and management
- Scalable metadata storage that allows for horizontal scaling of metadata without creating very large "master" nodes
What is it missing?
At the moment the main thing I want that CubeFS doesn't have is storage tiering - I want to be able to have SSDs, HDDs and NVMe storage in one cluster, and have data intelligently (or manually) placed onto the various tiers.
I don't necessarily need it to happen automatically - something simpler like MooseFS' storage classes would be fine.
Luckily, the CubeFS team has confirmed that they are working on storage tiering, and it might even be a full implementation with proper multi level caching. This is huge. 😄
The catch
Great, so CubeFS looks really promising, right?
There's just a small catch - there's basically zero security out of the box. It assumes you're running on a trusted network. It's not that they haven't thought of this issue - they have! It's just that the authentication stuff hasn't made it all the way into a release yet, despite being stable and ready for usage. This means if you deploy a CubeFS filesystem normally, following all the directions on the CubeFS homepage, anyone with network access to the master/resource manager nodes can do anything they want to your filesystem, including deleting volumes, reading user keys and various other mischief.
Fixing this is luckily fairly easy. Unfortunately, the documentation for authentication is lacking, with only hints from the old ChubaoFS (the former name for CubeFS) documentation to help you. It took me weeks of pain to figure this one out - luckily, once you know how to do it, it's easy; thus my motivation for writing this guide.
Eventually I plan to turn this entire guide into a fully automated Ansible playbook that will turn the entire process into a 5-10 minute affair. Stay tuned for that.
Integrating authentication and encryption
Simply put: git clone CubeFS, switch to the release-3.3.1 branch (the latest release at the time of writing), then run git cherry-pick 7f43eaa513a579fad99aaade86bc419e75d89981 and fix the conflicts.
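If you'd like that spelled out, here's a rough sketch of those steps (this assumes the upstream repository lives at https://github.com/cubefs/cubefs):
git clone https://github.com/cubefs/cubefs.git && cd cubefs
git checkout release-3.3.1
git cherry-pick 7f43eaa513a579fad99aaade86bc419e75d89981
# resolve any conflicts, then finish with: git cherry-pick --continue
The cherry-pick, spelled out.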
Then build CubeFS as normal, and deploy it to your nodes.
...Too complicated? 😄 Alright, well, fair enough. I did it for you!
Here's my fixed version of CubeFS.
This is a very light set of changes, simply integrating the relevant commit. I have made zero other changes, but have also included the steps I took (above) so you can verify this for yourself, which you should do if you plan to use this software anywhere important. I plan to keep my branch up-to-date. I will not be publishing binaries, as I don't believe the supply chain risks are worth it; if I published infected binaries unwittingly, it would be Pretty Bad.
Don't worry, we'll go over everything you need to use the modified CubeFS software and any caveats you'll want to know about. This is a fairly long guide, but it is worth it - you will end up with a fully resilient, self healing, distributed and clustered filesystem with high availability and all kinds of other fun goodies.
Getting started
You'll want to get started with either a single machine per role, or 3 machines per role. For high availability, I recommend using 3x master nodes and 3x authnodes, and then as many metadata and datanode nodes as you want to use.
In my cluster, I will be deploying to the following machines:
# master nodes
cubefs-master01.per.riff.cc
cubefs-master02.per.riff.cc
cubefs-master03.per.riff.cc
# auth nodes
cubefs-authnode01.per.riff.cc
cubefs-authnode02.per.riff.cc
cubefs-authnode03.per.riff.cc
# datanode nodes
monstar.riff.cc
ambellina.riff.cc
al.riff.cc
sizer.riff.cc
# metadata nodes
monstar.riff.cc
ambellina.riff.cc
al.riff.cc
inferno.riff.cc
sizer.riff.cc
A list of nodes to deploy to.
You can, optionally, spin up a node specifically for building the CubeFS software, to avoid having to install an entire build toolchain on one (or more) of your nodes. I'll be skipping this, opting instead to use cubefs-authnode01.per.riff.cc as my buildbox, but in a proper production environment (especially an airgapped one!) this may be more appropriate.
Create your machines
In my case, I'll be using Tinkerbell and tink-flow to deploy my CubeFS nodes, except for the data and metadata nodes which already exist (and are running Proxmox).
I'll be using Debian 12, my standard distribution of choice for a lightweight and stable base, but really any standard-ish Linux distribution will work, from Ubuntu to Rocky Linux and maybe even Alpine Linux.
We won't cover provisioning in this guide, but essentially the process looked like:
- Create VMs for each of the nodes
- Collect MAC addresses from each
- Create a tink-flow manifest with the machines in it
- Let Tinkerbell deploy the machines with Debian 12
Installing Debian by hand is also a valid option.
You'll want to enable passwordless sudo on your CubeFS machines, at least during setup.
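If you want a quick way to do that, a drop-in sudoers file along these lines works (this assumes your admin user is called riff - adjust to taste, and consider removing it once setup is done):
echo 'riff ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/riff
sudo chmod 440 /etc/sudoers.d/riff
Enabling passwordless sudo for the riff user.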
You should also create DNS entries for each of your machines. I'm using the hostnames, then .per.riff.cc (to indicate the Perth RiffLab).
Building and installing CubeFS
We'll be building CubeFS on our first authnode, cubefs-authnode01.
Log onto the node, then install the build dependencies.
First, required packages:
riff@cubefs-authnode01:~$ sudo apt install -y git cmake maven build-essential
Installing the build toolchains and compilers, such as GCC.
Then install Go 1.22 (or whatever the latest stable version of Go is at the time of reading).
root@cubefs-authnode01:~# wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
--2024-02-14 04:39:54-- https://go.dev/dl/go1.22.0.linux-amd64.tar.gz
Resolving go.dev (go.dev)... 216.239.32.21, 216.239.38.21, 216.239.36.21, ...
[...]
Saving to: ‘go1.22.0.linux-amd64.tar.gz’
[...]
2024-02-14 04:39:56 (43.6 MB/s) - ‘go1.22.0.linux-amd64.tar.gz’ saved [68988925/68988925]
root@cubefs-authnode01:~# rm -rf /usr/local/go && tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
root@cubefs-authnode01:~# echo 'export PATH=$PATH:/usr/local/go/bin' > /etc/profile.d/golang.sh
Installing Go 1.22
Log out and in to ensure the Go toolchain is on your path (or source /etc/profile.d/golang.sh).
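A quick sanity check that the toolchain is visible:
go version
Confirming Go is on the PATH - it should print the version you just installed.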
Check out my repository, switch to the "Zorlin/authnode" branch, then run make to build CubeFS.
git clone https://github.com/Zorlin/cubefs/ && cd cubefs
git checkout Zorlin/authnode
make
Instructions for compiling the custom CubeFS code.
build cfs-server success
build cfs-authtool success
build cfs-client success
build cfs-cli success
build libsdk: libcfs.so success
build java libcubefs [INFO] Scanning for projects...
[... lots of maven output ...]
build java libcubefs success
build cfs-fsck success
build fdstore success
build cfs-preload success
build cfs-blockcache success
build blobstore success
Building CubeFS
Now, throw together a quick tarball with your binaries to make it easier to transfer it to your nodes.
riff@cubefs-authnode01:~/cubefs$ cd ~
riff@cubefs-authnode01:~$ tar czvf ./cubefs-3.1.1-withauth.tar.gz cubefs/build
[... lots of tar output later ...]
Creating a tarball
While you're there, grab the SHA256 hash of the tarball so you can verify its integrity later if needed.
riff@cubefs-authnode01:~$ sha256sum cubefs-3.1.1-withauth.tar.gz
a0e6c3cd86be4daa9522756f75c910b468dcc6fc5b15d7df24430f6915fe1db9 cubefs-3.1.1-withauth.tar.gz
Getting the hash of our tarball. Your hash will be different as CubeFS builds are not deterministic.
Exit back to your local machine, then log back into your authnode using the -A flag when running ssh (for example ssh riff@cubefs-authnode01.per.riff.cc -A). This will allow you to use your local machine's SSH keys on the remote (authnode01) and thus let you copy your tarball to all of your machines.
riff@cubefs-authnode01:~$ for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do scp cubefs-3.1.1-withauth.tar.gz riff@$i.per.riff.cc:/home/riff/ ; done
cubefs-3.1.1-withauth.tar.gz 100% 280MB 418.3MB/s 00:00
cubefs-3.1.1-withauth.tar.gz 100% 280MB 366.2MB/s 00:00
cubefs-3.1.1-withauth.tar.gz 100% 280MB 393.9MB/s 00:00
cubefs-3.1.1-withauth.tar.gz 100% 280MB 388.6MB/s 00:00
cubefs-3.1.1-withauth.tar.gz 100% 280MB 425.1MB/s 00:00
cubefs-3.1.1-withauth.tar.gz
Using a for loop and scp to copy the tarball to all relevant machines.
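If you want to be thorough, you can reuse the same loop to confirm the hash on every node matches the one you recorded earlier:
for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do ssh riff@$i.per.riff.cc "sha256sum cubefs-3.1.1-withauth.tar.gz" ; done
Verifying the tarball on each node.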
Now we'll do a similar trick to allow us to install the CubeFS binaries across all the hosts (Note: If you prefer to use Ansible or something more elegant for these steps, please do! This is just a basic example).
for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do ssh riff@$i.per.riff.cc "tar xvf cubefs-3.1.1-withauth.tar.gz ; sudo cp cubefs/build/bin/cfs-* /usr/local/bin/ && rm -r cubefs"; done
Installing CubeFS, the hacky way.
Finally, we'll use the same sort of thing to create a CubeFS user and group, and create the various directories we'll want CubeFS to use during its operation.
for i in cubefs-authnode{01..03} cubefs-master{01..03} ; do ssh riff@$i.per.riff.cc "sudo adduser --system --home /var/lib/cubefs --group --comment "CubeFS" cubefs ; sudo mkdir -p /var/log/cubefs /etc/cubefs ; sudo chown cubefs:cubefs /var/log/cubefs /etc/cubefs" ; done
Creating the CubeFS user and group and all required directories.
Finally, we're ready to start properly setting up the CubeFS cluster.
Setting up the cluster keys
Setting up our cluster is now relatively easy, but will take a few steps. We will set up the authnode cluster first, then the master cluster, then the datanodes and metadata nodes, (which may be colocated on the same boxes).
Creating the initial keys (PKI) for CubeFS
Before we can get started, we need to generate the root keys for our cluster. The following instructions are based on the ChubaoFS documentation and LOTS of trial and error, as well as the current CubeFS documentation.
Become root with sudo -i and navigate to /etc/cubefs.
riff@cubefs-authnode01:~$ sudo -i
root@cubefs-authnode01:~# cd /etc/cubefs
root@cubefs-authnode01:/etc/cubefs#
Okay, simple enough. Navigate to the CubeFS directory.
Run cfs-authtool authkey and you should see two files generated: authroot.json and authservice.json.
root@cubefs-authnode01:/etc/cubefs# cfs-authtool authkey
root@cubefs-authnode01:/etc/cubefs# ls
authroot.json authservice.json
Generating the root authkeys.
From here, you'll want to create a new file called authnode.json and fill it out. Replace "ip" with your node's IP address, "id" with a unique ID (we suggest the last octet of your node's IP address, as it is entirely arbitrary but must be unique), and "peers" with a list of your authnodes' IP addresses. The format for "peers" is "$id:$ipv4:8443" for each host, with commas separating each host.
Change "clusterName" to whatever you like, then set "authServiceKey" to the "auth_key" value found in authservice.json and "authRootKey" to the "auth_key" value found in authroot.json. These keys are VERY sensitive and are fairly painful to change, so treat them appropriately.
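If you have jq installed, you can pull those values out directly rather than copy-pasting by eye (this assumes auth_key sits at the top level of each file - check with cat first if in doubt):
jq -r .auth_key authservice.json   # value for authServiceKey
jq -r .auth_key authroot.json      # value for authRootKey
Extracting the auth_key values with jq.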
Here is a slightly redacted version of my authnode.json file.
{
"role": "authnode",
"ip": "10.0.20.91",
"port": "8443",
"prof": "10088",
"id": "91",
"peers": "91:10.0.20.91:8443,92:10.0.20.92:8443,93:10.0.20.93:8443",
"logDir": "/var/log/cubefs/authnode",
"logLevel": "info",
"retainLogs": "100",
"walDir": "/var/lib/cubefs/authnode/raft",
"storeDir": "/var/lib/cubefs/authnode/rocksdbstore",
"exporterPort": 9510,
"clusterName": "rifflabs-per",
"authServiceKey": "REPLACE-ME-WITH-YOUR-KEY",
"authRootKey": "REPLACE-ME-WITH-YOUR-KEY",
"enableHTTPS": true
}
You will want to customise this for your installation.
Now we'll generate a self-signed TLS certificate, valid for each of the authnodes' IP addresses. Adjust the subject and subjectAltName according to your environment.
openssl req \
-x509 \
-nodes \
-newkey ed25519 \
-keyout server.key \
-out server.crt \
-days 3650 \
-subj "/C=AU/ST=Western Australia/L=Perth/O=riff.cc/OU=Infra/CN=*" \
-addext "subjectAltName=DNS:auth.cubefs.per.riff.cc,IP:127.0.0.1,IP:10.0.20.91,IP:10.0.20.92,IP:10.0.20.93"
Generating a self-signed TLS certificate. A future version of this tutorial will involve using real TLS certificates.
Check that the certificate material was generated correctly, then create the folder /app, symlink the material into it, and finally change /app to be owned by the CubeFS user.
# Check for server.crt and server.key
root@cubefs-authnode01:/etc/cubefs# ls server.*
server.crt server.key
# Create symlinks in /app to the certificates.
root@cubefs-authnode01:/etc/cubefs# mkdir /app ; ln -s /etc/cubefs/server.crt /app/server.crt ; ln -s /etc/cubefs/server.key /app/server.key ; chown -R cubefs:cubefs /app
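If you want to double-check the certificate before copying it around, standard openssl tooling will show the subject and SAN entries:
openssl x509 -in /etc/cubefs/server.crt -noout -subject
openssl x509 -in /etc/cubefs/server.crt -noout -text | grep -A1 "Subject Alternative Name"
Inspecting the self-signed certificate.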
We'll now install the systemd unit for cubefs-authnode, allowing us to start and run it as a service. With your favourite editor, open up a new file at /etc/systemd/system/cubefs-authnode.service and fill it with the following contents:
[Unit]
Description=CubeFS AuthNode Server
After=network.target
[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/authnode.json
LimitNOFILE=102400000
[Install]
WantedBy=multi-user.target
cubefs-authnode.service
Then start the service.
root@cubefs-authnode01:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.
Starting cubefs-authnode for the first time.
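It's worth confirming the service actually came up before moving on - standard systemd and ss tooling is enough here:
sudo systemctl status cubefs-authnode --no-pager
sudo ss -tlnp | grep 8443
Checking that the authnode is running and listening on its configured port.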
Congratulations, you've set up your authnode. If you only have one, you can skip to the next heading; otherwise, read on.
If you're using a cluster, you'll need to set up your remaining authnodes. To do so, simply copy /etc/cubefs/authnode.json, /etc/systemd/system/cubefs-authnode.service, /etc/cubefs/server.crt and /etc/cubefs/server.key to your remaining nodes, and also recreate the symlinks in /app:
sudo mkdir /app ; sudo ln -s /etc/cubefs/server.crt /app/server.crt ; sudo ln -s /etc/cubefs/server.key /app/server.key ; sudo chown -R cubefs:cubefs /app
The authnode looks for the certificates in /app/ whereas every other component looks in /etc/cubefs/, hence this song and dance.
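As a rough sketch of the copy itself (assuming the riff user, passwordless sudo everywhere and your SSH agent forwarded with -A, as earlier), a cat-over-ssh loop from the first authnode does the trick:
for i in cubefs-authnode{02..03} ; do
  for f in /etc/cubefs/authnode.json /etc/cubefs/server.crt /etc/cubefs/server.key /etc/systemd/system/cubefs-authnode.service ; do
    sudo cat $f | ssh riff@$i.per.riff.cc "sudo tee $f > /dev/null"
  done
  ssh riff@$i.per.riff.cc "sudo chown cubefs:cubefs /etc/cubefs/* ; sudo mkdir -p /app ; sudo ln -s /etc/cubefs/server.crt /app/server.crt ; sudo ln -s /etc/cubefs/server.key /app/server.key ; sudo chown -R cubefs:cubefs /app"
done
One way to push the configuration out to the remaining authnodes.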
You'll need to adjust the "ip" and "id" fields in authnode.json to match the node that configuration lives on, but other than that all configuration should be identical. You should also make sure the permissions on your TLS key and authnode configuration are correct - cd /etc/cubefs && sudo chmod 600 server.key authnode.json will take care of this. While you're at it, make sure /etc/cubefs/ is owned by cubefs on all of your nodes and that any files containing keys are also set to chmod 600 (i.e. read/write for "cubefs", no permissions for group or world).
In case it's helpful, this is what proper permissions should look like:
riff@cubefs-authnode01:/etc/cubefs$ ls -lah
total 28K
drwxr-xr-x 2 cubefs cubefs 4.0K Feb 14 06:03 .
drwxr-xr-x 76 root root 4.0K Feb 14 05:30 ..
-rw------- 1 cubefs cubefs 567 Feb 14 05:56 authnode.json
-rw------- 1 cubefs cubefs 221 Feb 14 05:37 authroot.json
-rw------- 1 cubefs cubefs 221 Feb 14 05:37 authservice.json
-rw-r--r-- 1 cubefs cubefs 843 Feb 14 05:48 server.crt
-rw------- 1 cubefs cubefs 119 Feb 14 05:48 server.key
Proper permissions.
Once your remaining nodes are configured, start them just like you started the first node:
root@cubefs-authnode02:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.
root@cubefs-authnode03:/etc/cubefs# sudo systemctl daemon-reload ; sudo systemctl enable --now cubefs-authnode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-authnode.service → /etc/systemd/system/cubefs-authnode.service.
Starting the remaining nodes.
Checking on the state of your authnode cluster
You should be able to examine the logs on one of your authnodes (the first one is probably best for this) and see it electing a leader.
riff@cubefs-authnode01:/etc/cubefs$ sudo cat /var/log/cubefs/authnode/authNode/authNode_warn.log
2024/02/14 06:11:44.783799 [WARN ] authnode_manager.go:36: action[handleLeaderChange] change leader to [10.0.20.91:8443]
The leader will be elected as soon as two of the three (or whatever constitutes quorum in your Authnode cluster) nodes are online.
Creating the admin keys and an admin ticket
Create the following files in /etc/cubefs
on your first node. These will be used to generate keys with the appropriate permissions:
{
"id": "admin",
"role": "service",
"caps": "{\"API\":[\"*:*:*\"]}"
}
data_admin.json
{
"id": "cubeclient",
"role": "client",
"caps": "{\"API\":[\"*:*:*\"], \"Vol\":[\"*:*:*\"]}"
}
data_client.json - you can customise this ID if you desire.
{
"id": "DatanodeService",
"role": "service",
"caps": "{\"API\":[\"*:*:*\"]}"
}
data_datanode.json
{
"id": "MasterService",
"role": "service",
"caps": "{\"API\":[\"*:*:*\"]}"
}
data_master.json
{
"id": "MetanodeService",
"role": "service",
"caps": "{\"API\":[\"*:*:*\"]}"
}
data_metanode.json
Whew, all done with those.
Use the authservice.json file you generated earlier to create an Authnode ticket.
root@cubefs-authnode01:/etc/cubefs# cfs-authtool ticket -https -host=10.0.20.91:8443 -keyfile=authservice.json -output=ticket_auth.json getticket AuthService
Replace 10.0.20.91:8443 with your host's IP and port.
Now, create an admin key:
root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_auth.json -data=data_admin.json -output=key_admin.json AuthService createkey
Creating an admin key.
Finally, create an admin ticket:
root@cubefs-authnode01:/etc/cubefs# cfs-authtool ticket -https -host=10.0.20.91:8443 -keyfile=key_admin.json -output=ticket_admin.json getticket AuthService
You can now use the admin ticket to create the rest of the necessary keys.
Creating the master/resource manager keys
root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_master.json -output=key_master.json AuthService createkey
As usual, replace the host as needed.
Creating the datanode keys
root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_datanode.json -output=key_datanode.json AuthService createkey
Creating the metadata node keys
root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_metanode.json -output=key_metanode.json AuthService createkey
Creating the client keys
root@cubefs-authnode01:/etc/cubefs# cfs-authtool api -https -host=10.0.20.91:8443 -ticketfile=ticket_admin.json -data=data_client.json -output=key_client.json AuthService createkey
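At this point you should have key_admin.json, key_master.json, key_datanode.json, key_metanode.json and key_client.json sitting in /etc/cubefs, and you'll be copying values out of them repeatedly in the next sections. If you kept jq around, something like this saves some squinting (again assuming the auth_key and auth_id_key fields sit at the top level of each file, as referenced throughout this guide):
for f in key_*.json ; do echo "== $f" ; jq '{auth_key, auth_id_key}' "$f" ; done
Listing the auth_key and auth_id_key values for each generated key.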
Creating the master/resource manager nodes
On your master nodes, install the following systemd service file as /etc/systemd/system/cubefs-master.service
:
[Unit]
Description=CubeFS Master Server
After=network.target
[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/master.json
LimitNOFILE=102400000
[Install]
WantedBy=multi-user.target
cubefs-master.service
Copy server.crt (but NOT server.key) from your authnode to /etc/cubefs/ on each master node.
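The same cat-over-ssh trick from the authnode section works here too (again a sketch, assuming the riff user and passwordless sudo):
for i in cubefs-master{01..03} ; do
  cat /etc/cubefs/server.crt | ssh riff@$i.per.riff.cc "sudo tee /etc/cubefs/server.crt > /dev/null ; sudo chown cubefs:cubefs /etc/cubefs/server.crt"
done
Copying the certificate (and only the certificate) to the master nodes.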
Now, fill out the following and save it as /etc/cubefs/master.json on each of your master nodes.
{
"role": "master",
"ip": "10.0.20.101",
"listen": "17010",
"prof":"17020",
"id":"101",
"peers": "101:10.0.20.101:17010,102:10.0.20.102:17010,103:10.0.20.103:17010",
"retainLogs":"20000",
"logDir": "/var/log/cubefs/master/",
"logLevel":"info",
"walDir":"/var/lib/cubefs/master/data/wal",
"storeDir":"/var/lib/cubefs/master/data/store",
"clusterName":"rifflabs-per",
"metaNodeReservedMem": "1073741824",
"masterServiceKey":"a-not-so-long-string-from-auth-key",
"authenticate": true,
"authNodeHost": "10.0.20.91:8443,10.0.20.92:8443,10.0.20.93:8443",
"authNodeEnableHTTPS": true,
"authNodeCertFile": "/etc/cubefs/server.crt"
}
master.json
You'll want to, as before, replace some of the values. Specifically, set ip to the IP of each master node, and set id to an appropriate unique value for each node (once again, it's probably easiest to use the last octet of the node's IP address for simplicity). Set clusterName to the same clusterName you used when setting up the authnodes. Set "peers" in a similar fashion to the authnodes, only this time referencing the masters instead.
For masterServiceKey, copy the value from /etc/cubefs/key_master.json on your first Authnode. Here, you must use the value of auth_key and not the value of auth_id_key - confusingly, some components need the former and some need the latter.
Once you've configured all your master nodes, go ahead and start them all.
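Starting them looks just like the authnodes did - on each master node, the same pattern as before:
sudo systemctl daemon-reload
sudo systemctl enable --now cubefs-master
Starting the master service on each master node.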
You should be able to see in the logs that a leader is elected when you have enough nodes for quorum, as before:
==> /var/log/cubefs/master/master/master_critical.log <==
2024/02/14 06:56:39.391449 [FATAL] alarm.go:48: rifflabs_per_master_warning clusterID[rifflabs-per] leader is changed to 10.0.20.101:17010
A leader has been elected.
Configuring cfs-cli
Now for the exciting part - we finally get to test out the basics of our cluster authentication. (Okay, the really exciting part comes later, but this is still cool).
Open /etc/cubefs/key_client.json and copy the value of auth_id_key.
Then, fill out the following on your management machine (whatever machine you want to manage CubeFS with - make sure you install the tarball from earlier onto it) as your cfs-cli configuration (by default, cfs-cli looks for ~/.cfs-cli.json), replacing the IP addresses with your master cluster's IP addresses. Set clientIDKey to the auth_id_key value (NOT auth_id!) you copied from /etc/cubefs/key_client.json on your first Authnode.
{
"masterAddr": [
"10.0.20.101:17010",
"10.0.20.102:17010",
"10.0.20.103:17010"
],
"timeout": 60,
"clientIDKey":"a-very-long-string-from-auth-id-key"
}
Make sure you use auth_id_key here, not auth_id, or you'll tear your hair out trying to figure it out.
Let's first try an unprivileged operation:
wings@blackberry:~$ cfs-cli cluster info
[Cluster]
Cluster name : rifflabs-per
Master leader : 10.0.20.101:17010
Master-101 : 10.0.20.101:17010
Master-102 : 10.0.20.102:17010
Master-103 : 10.0.20.103:17010
Auto allocate : Enabled
MetaNode count : 0
MetaNode used : 0 GB
MetaNode total : 0 GB
DataNode count : 0
DataNode used : 0 GB
DataNode total : 0 GB
Volume count : 0
Allow Mp Decomm : Enabled
EbsAddr :
LoadFactor : 0
BatchCount : 0
MarkDeleteRate : 0
DeleteWorkerSleepMs: 0
AutoRepairRate : 0
MaxDpCntLimit : 3000
This tells us that our master cluster is working, and that things seem somewhat healthy.
Now, let's try something that requires a key:
wings@blackberry:~$ cfs-cli user create test
Create a new CubeFS cluster user
User ID : test
Password : [default]
Access Key: [auto generate]
Secret Key: [auto generate]
Type : normal
Confirm (yes/no)[yes]: yes
Create user success:
[Summary]
User ID : test
Access Key : REDACTED
Secret Key : REDACTED
Type : normal
Create Time: 2024-02-14 07:01:43
[Volumes]
VOLUME PERMISSION
If you instead see an error like this:
Error: Create user failed: invalid clientIDKey
Womp womp.
If that happens, check all of your key configuration (and in the absolute worst case, blow away your authnode's /var/lib/cubefs/* data and set it up again - I hit a bug at some point where nothing I did would make certain keys be recognised, and creating a fresh keystore was the only way to fix it. YMMV, and obviously only do that as a drastic measure), then try again.
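When digging into key problems, the master and authnode logs (on the relevant hosts) are the first place to look - something along these lines, noting that exact log file names may vary slightly between versions:
sudo journalctl -u cubefs-master --since "15 minutes ago"
sudo ls /var/log/cubefs/master/master/
sudo tail -n 50 /var/log/cubefs/authnode/authNode/authNode_warn.log
Poking at the logs while troubleshooting authentication.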
🎉 Great job if you've gotten this far! The rest is significantly easier than the previous steps, so you haven't got far to go. 😄
Creating the full cluster
Now that we have our base infrastructure, we can add the actual data and metadata nodes that will store everything for us. Then we'll mount our new filesystem and store some data!
Creating the data nodes
On each data node, install CubeFS from the tarball as before:
tar xvf cubefs-3.1.1-withauth.tar.gz ; sudo cp cubefs/build/bin/cfs-* /usr/local/bin/ && rm -r cubefs
Installing the binaries on a data node.
Then create the CubeFS user, group and directories (/var/lib/cubefs, /var/log/cubefs, /etc/cubefs) with the appropriate permissions:
sudo adduser --system --home /var/lib/cubefs --group --comment "CubeFS" cubefs ; sudo mkdir -p /var/log/cubefs /etc/cubefs ; sudo chown cubefs:cubefs /var/log/cubefs /etc/cubefs
Creating the user, group and directories.
Then, create the following systemd unit file as /etc/systemd/system/cubefs-datanode.service:
[Unit]
Description=CubeFS Data Node
After=network.target
[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/datanode.json
LimitNOFILE=102400000
[Install]
WantedBy=multi-user.target
cubefs-datanode.service
Now it's time to attach disks. You should have each of your disks mounted at a mountpoint of your choice, with a single disk per mountpoint. You do not have to use a partition table (but can if you prefer) - you can simply run, for example, mkfs.xfs /dev/sdx (IF /dev/sdx is the disk you are sure you want to format). Make sure your disks are mounted by UUID - you can find the UUID of each disk by running, for example, blkid /dev/sda (or, if using partitions, blkid /dev/sda1).
Once you have all of your disks attached to your nodes, with UUID-based mounting to avoid issues, create a cubefs folder within each mountpoint and change it to be owned by the cubefs user. We'll assume all your disks are mounted as /mnt/brick.something from here on out, but feel free to change the commands as needed.
Here's an example of creating the relevant cubefs folders on a system with a bunch of bricks mounted as /mnt/brick.$name:
for i in /mnt/brick.* ; do mkdir -p $i/cubefs ; chown cubefs:cubefs $i/cubefs ; done
Making CubeFS brick folders.
It's a similar process to MooseFS bricks if you've set those up before.
Once you've got your disks attached, it's time to create the datanode.json configuration file on each of your datanodes. Unlike the previous templates, the only things you need to change here are the disks array (replace it with a list of your own disks) and the datanode key. The value after the colon (:) is the reserved space for that disk in bytes (space that CubeFS will leave free and assume cannot be used, in other words), useful for ensuring disks don't fill up completely. You'll want to set "serviceIDKey" to the value of auth_id_key from /etc/cubefs/key_datanode.json on your first Authnode.
As a hint, here's a script that gives you the disks array preformatted as a ready-to-paste block, if all your disks are mounted as /mnt/brick.$name:
root@monstar:~# for i in /mnt/brick.*/cubefs ; do echo " \"$i:10737418240\"," ; done
Quick little bash snippet. Make sure to remove the final comma (,) from the output before use.
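If you'd rather not hand-edit the output, piping it through sed strips the trailing comma from the last line for you:
for i in /mnt/brick.*/cubefs ; do echo "  \"$i:10737418240\"," ; done | sed '$ s/,$//'
The same snippet, with the final comma removed automatically.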
{
"role": "datanode",
"listen": "17310",
"prof": "17320",
"logDir": "/var/log/cubefs/datanode/log",
"logLevel": "info",
"raftHeartbeat": "17330",
"raftReplica": "17340",
"raftDir":"/var/lib/cubefs/datanode/raft",
"exporterPort": 9502,
"masterAddr": [
"10.0.20.101:17010",
"10.0.20.102:17010",
"10.0.20.103:17010"
],
"disks": [
"/mnt/brick.a/cubefs:10737418240",
"/mnt/brick.b/cubefs:10737418240",
"/mnt/brick.c/cubefs:10737418240",
"/mnt/brick.d/cubefs:10737418240",
"/mnt/brick.e/cubefs:10737418240",
"/mnt/brick.f/cubefs:10737418240",
"/mnt/brick.g/cubefs:10737418240",
"/mnt/brick.h/cubefs:10737418240",
"/mnt/brick.i/cubefs:10737418240"
],
"serviceIDKey": "replace-me-with-the-value-from-auth_id_key"
}
datanode.json
Now that you have each of your datanodes configured, bring them online.
root@monstar:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@ambellina:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@al:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
root@sizer:~# systemctl enable --now cubefs-datanode
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-datanode.service → /etc/systemd/system/cubefs-datanode.service.
Success!
wings@blackberry:~$ cfs-cli datanode ls
[Data nodes]
ID ADDRESS WRITABLE STATUS
2 10.0.20.15:17310 Yes Active
3 10.0.20.14:17310 Yes Active
4 10.0.20.18:17310 Yes Active
5 10.0.20.16:17310 Yes Active
Four datanodes
wings@blackberry:~$ cfs-cli datanode info 10.0.20.15:17310
[Data node info]
ID : 2
Address : 10.0.20.15:17310
Carry : 0.6569280350771028
Allocated ratio : 0.9921794251966212
Allocated : 229.66 TB
Available : 1.15 TB
Total : 231.47 TB
Zone : default
IsActive : Active
Report time : 2024-02-14 08:02:01
Partition count : 0
Bad disks : []
Persist partitions : []
Examining a datanode
wings@blackberry:~$ cfs-cli cluster info
[Cluster]
Cluster name : rifflabs-per
Master leader : 10.0.20.101:17010
Master-101 : 10.0.20.101:17010
Master-102 : 10.0.20.102:17010
Master-103 : 10.0.20.103:17010
Auto allocate : Enabled
MetaNode count : 0
MetaNode used : 0 GB
MetaNode total : 0 GB
DataNode count : 4
DataNode used : 1517098 GB
DataNode total : 1613692 GB
Volume count : 0
Allow Mp Decomm : Enabled
EbsAddr :
LoadFactor : 0
BatchCount : 0
MarkDeleteRate : 0
DeleteWorkerSleepMs: 0
AutoRepairRate : 0
MaxDpCntLimit : 3000
Viewing the overall state of the cluster
(PS: My cluster shows used space already because there is other data on the disks CubeFS is using.)
Creating the metadata nodes
If your metadata nodes will reside on the same hosts as your datanodes (which I highly recommend*), setting up the metadata nodes is simple and easy. If not, follow the rough directions above for installing CubeFS and creating users and directories and such.
* Note that in a very large system you may want to have dedicated hosts for metadata.
Open /etc/systemd/system/cubefs-metanode.service in an editor, and fill it in as follows:
[Unit]
Description=CubeFS Meta Node
After=network.target
[Service]
Type=forking
User=cubefs
Group=cubefs
ExecStart=/usr/local/bin/cfs-server -c /etc/cubefs/metanode.json
LimitNOFILE=102400000
[Install]
WantedBy=multi-user.target
cubefs-metanode.service
Now, edit /etc/cubefs/metanode.json and fill it out as follows. You'll want to set "serviceIDKey" to the value of auth_id_key from /etc/cubefs/key_metanode.json on your first Authnode.
{
"role": "metanode",
"listen": "17210",
"prof": "17220",
"logLevel": "info",
"metadataDir": "/var/lib/cubefs/metanode/data/meta",
"logDir": "/var/log/cubefs/metanode/log",
"raftDir": "/var/lib/cubefs/metanode/data/raft",
"raftHeartbeatPort": "17230",
"raftReplicaPort": "17240",
"totalMem": "8589934592",
"exporterPort": 9501,
"masterAddr": [
"10.0.20.101:17010",
"10.0.20.102:17010",
"10.0.20.103:17010"
],
"serviceIDKey": "replace-me-with-the-value-from-auth_id_key"
}
metanode.json
As an aside, you should read the following important notes from the CubeFS docs:
Notes
The configuration options listen, raftHeartbeatPort, and raftReplicaPort cannot be modified after the program is first configured and started.
The relevant configuration information is recorded in the constcfg file under the metadataDir directory. If you need to force modification, you need to manually delete the file.
The above three configuration options are related to the registration information of the MetaNode in the Master. If modified, the Master will not be able to locate the MetaNode information before the modification.
In other words, modifying the listen port or any of the Raft ports used by the node after deployment will cause issues. Basically, don't, and if you have to, be aware of the above.
If you have a large or otherwise dedicated disk, such as an NVMe drive, that you wish to use for metadata, mount it as /var/lib/cubefs/metanode/data/meta prior to starting the metanode for the first time. (If you need to set that up later: stop the metanode, move /var/lib/cubefs/metanode/data/meta to a new location, mkdir /var/lib/cubefs/metanode/data/meta, mount your new drive, and then move the contents of the old folder back to the original location now that the new drive is mounted.)
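Spelled out as commands, that later migration looks roughly like this (the NVMe device name is a placeholder - and mounting it by UUID via fstab, as with the data disks, is the better long-term option):
sudo systemctl stop cubefs-metanode
sudo mv /var/lib/cubefs/metanode/data/meta /var/lib/cubefs/metanode/data/meta.old
sudo mkdir /var/lib/cubefs/metanode/data/meta
sudo mount /dev/nvme0n1 /var/lib/cubefs/metanode/data/meta
sudo mv /var/lib/cubefs/metanode/data/meta.old/* /var/lib/cubefs/metanode/data/meta/
sudo chown -R cubefs:cubefs /var/lib/cubefs/metanode/data/meta
sudo systemctl start cubefs-metanode
Moving existing metadata onto a dedicated drive.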
Finally, start all your metanodes.
systemctl enable --now cubefs-metanode
The final step before a working cluster.
If you see something like the following, you've done it - congratulations.
wings@blackberry:~$ cfs-cli metanode ls
[Meta nodes]
ID ADDRESS WRITABLE STATUS
6 10.0.20.15:17210 Yes Active
7 10.0.20.18:17210 Yes Active
8 10.0.20.16:17210 Yes Active
9 10.0.20.14:17210 Yes Active
You should have a fully functioning CubeFS installation.
Creating your first volume
It's time to try it out for real.
Create your first volume, specifying that you want:
- A volume called media
- Owned by media
- With follower-read=true (highly recommended for better performance)
- 2 replicas (in other words, 2x replication, two copies)
- A total volume capacity of 5000GB (5TB)
Feel free to tweak any of these as you wish, and run cfs-cli volume create --help for more information.
wings@blackberry:~$ cfs-cli volume create media media --follower-read=true --replica-num 2 --capacity 5000
Create a new volume:
Name : media
Owner : media
capacity : 5000 G
deleteLockTime : 0 h
crossZone : false
DefaultPriority : false
description :
mpCount : 3
replicaNum : 2
size : 120 G
volType : 0
followerRead : true
readOnlyWhenFull : false
zoneName :
cacheRuleKey :
ebsBlkSize : 8388608 byte
cacheCapacity : 0 G
cacheAction : 0
cacheThreshold : 10485760 byte
cacheTTL : 30 day
cacheHighWater : 80
cacheLowWater : 60
cacheLRUInterval : 5 min
TransactionMask :
TransactionTimeout : 1 min
TxConflictRetryNum : 0
TxConflictRetryInterval : 0 ms
Confirm (yes/no)[yes]: yes
Create volume success.
Creating our first volume.
That's it! No, really, that's it. You're ready to mount it!
Mounting the filesystem
On the machine you wish to use CubeFS with, install CubeFS (you will not need a CubeFS user this time) and then install the following systemd unit file. This one is special - it's a templated service unit, which will allow you to create arbitrary CubeFS mounts as you wish.
You will create it as /etc/systemd/system/cubefs-mount@.service - note the @ that marks template units.
[Unit]
Description=CubeFS Mount
After=local-fs.target
Wants=local-fs.target
After=network.target
Wants=network-online.target
After=network-online.target
After=nss-lookup.target
After=time-sync.target
[Service]
Type=forking
ExecStart=/usr/local/bin/cfs-client -c /etc/cubefs/client-%i.conf
IOAccounting=true
IOWeight=250
StartupIOWeight=100
Restart=on-failure
RestartSec=4
StartLimitInterval=20
StartLimitBurst=4
[Install]
WantedBy=multi-user.target
cubefs-mount@.service
Now that you've got your mount unit file created, you just need a configuration file. Let's open up /etc/cubefs/client-media.conf and create our first mount config:
{
"mountPoint": "/mnt/cubefs/media",
"volName": "media",
"owner": "media",
"masterAddr": "10.0.20.101:17010,10.0.20.102:17010,10.0.20.103:17010",
"logDir": "/var/log/cubefs/client/log",
"logLevel": "info",
"clientIDKey": "replace-me-with-the-value-from-auth_id_key"
}
client-media.conf
You'll need the value of auth_id_key from /etc/cubefs/key_client.json on your first Authnode.
You can also create additional client keys if you like (and remove keys by running deletekey instead of createkey). Basically, just create a new version of data_client.json with a new name and a new ID inside it, then generate a key as before. The auth_id_key value contains the ID you want to authenticate as, so you can authenticate with different keys. You can do the same trick for most of the components in the cluster, but beware of overcomplicating things.
Without further ado...
riff@robot01:~$ sudo mkdir -p /mnt/cubefs/media ; sudo systemctl enable --now cubefs-mount@media
Created symlink /etc/systemd/system/multi-user.target.wants/cubefs-mount@media.service → /etc/systemd/system/cubefs-mount@.service.
riff@robot01:/etc/cubefs$ df -h /mnt/cubefs/media
Filesystem Size Used Avail Use% Mounted on
cubefs-media 4.9T 0 4.9T 0% /mnt/cubefs/media
CubeFS, finally mounted.
We have a working mount.
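As a quick smoke test, write a bit of data into the mount and make sure it shows up (the file name here is arbitrary; using sudo sidesteps any permission quirks on the fresh volume):
sudo dd if=/dev/zero of=/mnt/cubefs/media/testfile bs=1M count=100
sudo ls -lh /mnt/cubefs/media/
sudo rm /mnt/cubefs/media/testfile
Writing and removing a test file on the new mount.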
Done!
This intro post covers setting up a complete but basic CubeFS installation. Future posts will cover enabling the erasure coding and object storage subsystems, as well as integrating CubeFS into Kubernetes using the CSI driver.
If you enjoyed this post, check out the rest of my blog and maybe drop me a message via any of my contact details. Cheers!