NOTE: This is a longer explanation to a question I responded to on GitHub about dynamically adding listeners / services to Pingora.
I wanted to show the technique I’m using to dynamically manage Pingora LoadBalancer instances inside a general proxy / load balancer service I’m building. Pingora is a Rust based proxy library from Cloudflare that can be used to build high performance http load balancers and proxy services.
Pingora’s design prompts you to set up your process using a static Service graph that you build during process startup. This is similar to patterns found in Guava’s Service interface, for managing several logical asynchronous services inside a single process. You configure your various services (load balancers, proxy services, background health checks, etc.), add them to a Server instance, and then start the Server, which takes over the lifecycle of your services.
From Pingora’s getting started guide:
fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    let upstreams = LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();

    let mut lb = http_proxy_service(&server.configuration, LB(Arc::new(upstreams)));
    lb.add_tcp("0.0.0.0:6188");

    server.add_service(lb);
    server.run_forever();
}
The Server struct normally does a lot of heavy lifting for you:
- Clean process startup / shutdown, including handling the correct Service dependency ordering.
- Managing each Service’s tokio Runtime. The Server instance creates an individual Runtime instance for each Service that it manages.
- Zero downtime listener socket handoff: The Server handles handing off listener file descriptors over a unix socket to a new process to support zero downtime upgrades. This is very similar to how something like Envoy proxy does online hot restarts.
One major problem: After setting up services, the API to start the Server returns a Rust “Never type”.
Once your program hands off control to Server#run_forever, you’re never getting it back.
impl Server {
    /// Start the server using Self::run and default RunArgs.
    /// This function will block forever until the server needs to quit. So this would be the last function to call for this object.
    /// Note: this function may fork the process for daemonization, so any additional threads created before this function will be lost to any service logic once this function is called.
    pub fn run_forever(self) -> !
}
It’s quite easy to forgo the Server type entirely, use all the Pingora services directly, and retain full control of your process. This gives you the ability to dynamically start / stop services (including LoadBalancer instances), or do whatever else you want. It’s on you to manage clean shutdown and provision tokio runtimes, and you lose out on some of the other built-ins like zero downtime hot restarts, but in my case it’s worth it.
Starting a LoadBalancer without a Server is pretty straightforward:
fn make_load_balancer() -> GenBackgroundService<LoadBalancer<RoundRobin>> {
    let backends = Backends::new(Box::new(ResourceDiscovery));
    let mut load_balancer = LoadBalancer::from_backends(backends);

    let health_check = TcpHealthCheck::new();
    load_balancer.set_health_check(health_check);
    load_balancer.health_check_frequency = Some(Duration::from_secs(5));
    load_balancer.update_frequency = Some(Duration::from_secs(30));

    let load_balancer_service = background_service("health_check", load_balancer);
    load_balancer_service
}
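The ResourceDiscovery value passed to Backends::new above is my own ServiceDiscovery implementation that resolves backends from my control plane. Here’s a rough sketch of what a custom discovery can look like; the trait shape and import paths are from the pingora-load-balancing facade as I recall them, and the backend addresses are placeholders, so check them against the pingora version you’re using:

use std::collections::{BTreeSet, HashMap};

use async_trait::async_trait;
use pingora::lb::discovery::ServiceDiscovery;
use pingora::lb::Backend;

struct ResourceDiscovery;

#[async_trait]
impl ServiceDiscovery for ResourceDiscovery {
    // The LoadBalancer's background update task calls this on each
    // update_frequency tick to refresh the backend set.
    async fn discover(&self) -> pingora::Result<(BTreeSet<Backend>, HashMap<u64, bool>)> {
        // In real code: fetch the backend list from your control plane / config store.
        let mut backends = BTreeSet::new();
        backends.insert(Backend::new("10.0.0.10:8080")?);
        backends.insert(Backend::new("10.0.0.11:8080")?);

        // The map can mark individual backends (by hash key) enabled / disabled;
        // an empty map means no overrides.
        Ok((backends, HashMap::new()))
    }
}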
fn main() -> Result<(), anyhow::Error> {
    // Manage our own tokio Runtime
    let runtime = tokio::runtime::Runtime::new()
        .expect("Could not start tokio runtime");

    // Create a Vec of tokio task handles, so we can wait for
    // them to finish during process shutdown
    let mut tasks = Vec::new();

    // Each service watches this common channel to trigger clean shutdown.
    let (shutdown_tx, shutdown_rx) = tokio::sync::watch::channel(false);

    let load_balancer = make_load_balancer();

    // Start a load balancer on the tokio runtime ourselves
    tasks.push(runtime.spawn(async move {
        load_balancer.task()
            .start(shutdown_rx.clone())
            .await
    }));
}
Starting a proxy service is also straightforward:
fn main() -> Result<(), anyhow::Error> {
    // With no Server, we have to manage our own ServerConf
    let server_config: Arc<ServerConf> = Arc::new(Default::default());

    let mut proxy_service = http_proxy_service(&server_config, Proxy);
    proxy_service.add_tcp("0.0.0.0:80");

    // Start the http proxy on the tokio Runtime
    tasks.push(runtime.spawn(async move {
        proxy_service.start_service(None, shutdown_rx.clone(), 1)
            .await
    }));
}
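The Proxy value handed to http_proxy_service is just a type implementing Pingora’s ProxyHttp trait. Here’s a minimal sketch, loosely adapted from the example in Pingora’s quickstart; the hardcoded upstream address is a placeholder, and my real implementation consults the LoadBalancer instances instead:

use async_trait::async_trait;
use pingora::prelude::*;

pub struct Proxy;

#[async_trait]
impl ProxyHttp for Proxy {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        // Hardcoded upstream for illustration. In a real service, select a backend
        // from a shared LoadBalancer handle here instead.
        let peer = HttpPeer::new(("127.0.0.1", 8080), false, String::new());
        Ok(Box::new(peer))
    }
}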
Here’s an example where we start a new LoadBalancer service whenever a tokio event fires (in this case, a timer tick). This is the control loop running on main that we use in place of Server#run_forever:
fn main() -> Result<(), anyhow::Error> {
    let mut tasks = Vec::new();
    let (shutdown_tx, shutdown_rx) = tokio::sync::watch::channel(false);

    // Setup services, like above (load balancers, proxies, etc.)
    // then proceed to main control loop.
    runtime.block_on(async {
        let mut interval = tokio::time::interval(Duration::from_secs(30));
        loop {
            tokio::select! {
                // Wait for shutdown. Normally the Server handles all external signal handling for you.
                _ = tokio::signal::ctrl_c() => {
                    tracing::info!("Got shutdown signal. Stopping");
                    // Trigger shutdown to all services
                    shutdown_tx.send(true)?;
                    // Join / wait for all tasks to stop
                    for task in tasks.drain(..) {
                        if let Err(err) = task.await {
                            tracing::error!("Join error during task shutdown: {:?}", err);
                        }
                    }
                    break;
                }
                // Contrived example: Making a LoadBalancer on a timer. In practice, you'd probably stash
                // your LoadBalancer instances in a shared data structure, and start / stop them on whatever
                // signal is meaningful for your service.
                //
                // This example uses a timer, but you can manage this any way you want.
                _ = interval.tick() => {
                    let shutdown_rx = shutdown_rx.clone();
                    tasks.push(runtime.spawn(async move {
                        make_load_balancer().task()
                            .start(shutdown_rx)
                            .await;
                    }));
                }
            }
        }
        Ok(())
    })
}
In real code, I run through a full reconciliation process inside the proxy process. The proxy calls
out to the control plane to fetch the list of configured load balancers, and starts / stops them when
they’re created or destroyed. This design allows me to host multiple LoadBalancer instances inside the same process, while still maintaining separate backend configurations for each distinct load balancer. The control tasks run on a separate tokio Runtime from the proxy and load balancer services that handle requests / responses.
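Here’s a minimal sketch of what that bookkeeping can look like, with made-up names (LbId, RunningLb, the desired list) standing in for my control plane types. Each load balancer gets its own shutdown channel here so it can be stopped individually:

use std::collections::HashMap;

use tokio::sync::watch;
use tokio::task::JoinHandle;

// Hypothetical identifier assigned by the control plane.
type LbId = String;

struct RunningLb {
    shutdown_tx: watch::Sender<bool>,
    handle: JoinHandle<()>,
}

async fn reconcile(running: &mut HashMap<LbId, RunningLb>, desired: Vec<LbId>) {
    // Start load balancers the control plane wants but we aren't running yet.
    for id in &desired {
        if !running.contains_key(id) {
            let (shutdown_tx, shutdown_rx) = watch::channel(false);
            let handle = tokio::spawn(async move {
                // make_load_balancer() as defined earlier in this post.
                make_load_balancer().task().start(shutdown_rx).await;
            });
            running.insert(id.clone(), RunningLb { shutdown_tx, handle });
        }
    }

    // Stop load balancers that are no longer desired.
    let stale: Vec<LbId> = running
        .keys()
        .filter(|id| !desired.contains(id))
        .cloned()
        .collect();
    for id in stale {
        if let Some(lb) = running.remove(&id) {
            let _ = lb.shutdown_tx.send(true);
            let _ = lb.handle.await;
        }
    }
}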
Check out Pingora if you’re interested in a Rust based load balancer library.
I recently stood up a 42U server rack in my basement. Some people are into cars, I’m into servers. 😁

It’s a bit empty right now, but with plenty of room to expand for future projects! All the CAT6A
cables in my house terminate at the patch panel at the top of the rack, with a very basic router / switch
setup that plumbs it all together.
The thing I’m more excited about is the 3 node MINISFORUM UM750L mini PC cluster that I’ve started building on top of.

I’m programming a custom VM orchestration and control plane on the cluster called “lightbyte”. Currently, the system supports:
- Dynamically provisioning and placing Firecracker virtual machines on the worker nodes.
- Provisioning block storage volumes from rootfs images. These images are built via Nix, and automatically get deployed to the cluster during software updates.
- Scale groups, for dynamically scaling up multiple identical VMs. Sort of like EC2 autoscaling groups. Way overkill for a homelab, but why not?
- Rudimentary HTTP load balancer / proxy support via pingora. These can target scale groups or individual sets of VMs.
- Worker draining, to vacate all resources from a host node.
Here I create a basic scale group with a configured rootfs image, then scale it up to 5 nodes.

Provisioning is all done via reconciliation controllers in the control plane server processes. Each resource
has its own separate controller that’s responsible for reconciling desired resource state (stored across all 3
etcd nodes), against the current state of running resources on each worker.
Once VMs boot, they bridge to the physical network via a Linux bridge.

HTTP load balancers run inside guest VMs themselves, owned and managed by their own scale group. Once I implement DNS
in the cluster, my plan is to have DNS records that will round-robin across multiple proxy VMs, which themselves will proxy
to other underlying scale groups or collections of VMs. I dynamically push backend topology changes to the proxy guests
over a guest vsock socket server. This vsock socket allows the host machine to orchestrate the guest and push configuration changes without the guest needing to talk to the control plane directly.
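For a rough idea of the guest side, here’s a sketch of a vsock listener that accepts configuration pushes from the host. It assumes the tokio-vsock crate (whose API has shifted a bit between versions) and a made-up port number:

use tokio::io::AsyncReadExt;
use tokio_vsock::{VsockAddr, VsockListener, VMADDR_CID_ANY};

// Hypothetical vsock port the guest agent listens on.
const CONFIG_PORT: u32 = 9999;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Listen on any CID so the host can connect to this guest's agent.
    let mut listener = VsockListener::bind(VsockAddr::new(VMADDR_CID_ANY, CONFIG_PORT))?;

    loop {
        let (mut stream, peer) = listener.accept().await?;
        println!("config push from {:?}", peer);

        // Read the pushed backend topology (the wire format is up to you: JSON, protobuf, etc.)
        let mut buf = Vec::new();
        stream.read_to_end(&mut buf).await?;
        // ... apply the new proxy configuration here ...
    }
}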

Here are some of the software details:
- Implemented in Rust. Control plane, worker agents, guest agents, command line clients, etc.
- Control plane database is etcd, running on all 3 nodes, which stores all resource object state. I use etcd transactions to perform transactionally correct read-modify-write on the objects (see the sketch after this list). Reconciliation controllers are leader-elected with etcd and executed on the control plane nodes. I use leader election to make sure there’s only 1 controller of each resource type running at a time.
- Workers all run NixOS. The entire system is deployed and updated with a single command via colmena.
- I’m using firecracker for the VMM, which sits on top of Linux KVM for the hypervisor.
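As a sketch of that read-modify-write pattern, here’s roughly what a guarded update looks like with the etcd-client crate. The key name and object encoding are placeholders, and the exact API may differ by crate version:

use etcd_client::{Client, Compare, CompareOp, Txn, TxnOp};

async fn update_resource(client: &mut Client) -> Result<(), etcd_client::Error> {
    let key = "/resources/scale-groups/web"; // placeholder key

    // Read the current object and remember the revision we saw.
    let resp = client.get(key, None).await?;
    let kv = match resp.kvs().first() {
        Some(kv) => kv,
        None => return Ok(()), // object doesn't exist yet
    };
    let seen_revision = kv.mod_revision();
    let mut object = kv.value().to_vec();

    // ... modify `object` here (decode, change desired state, re-encode) ...

    // Write back only if nobody else modified the key since we read it.
    let txn = Txn::new()
        .when(vec![Compare::mod_revision(key, CompareOp::Equal, seen_revision)])
        .and_then(vec![TxnOp::put(key, object, None)]);

    let txn_resp = client.txn(txn).await?;
    if !txn_resp.succeeded() {
        // Lost the race: re-read and retry in real code.
    }
    Ok(())
}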
This has been a fun one to build. I’m looking forward to stabilizing things enough to use the system for all my
homelab services. It’s a little baby homelab cloud!
This is a blog series covering how to connect a firecracker VM to network block storage.
Read Part 1 here.
In part 1, we taught the Firecracker VMM how to perform block-based disk operations using crucible volumes
as our backing store. This helped us validate the connective interface between the existing Firecracker virtio
block device implementation, and the existing crucible Volume interface. It worked quite well (save the impedance mismatch between crucible’s use
of async rust, and firecracker choosing to avoid async rust in favor of blocking operations).
But we left a few things out:
- Runtime configuration of crucible volumes when firecracker VMs are started. We need to start a firecracker VM and configure our virtio block device in the existing firecracker VM configuration.
- Connecting to crucible volumes over the network, to the “downstairs” TCP servers that manage the underlying physical disks and serve up block operations. Our previous post only used an in-memory block structure.
- Correct disk volume metadata, such as disk size. We faked it with a dummy ext4 volume, but we need firecracker to correctly detect the volume size based on how the crucible volume is configured.
This gets us 90% of the way towards our desired goal: Having firecracker support remote network attached block devices.
Let’s fix these issues now!
Volume Configuration
Previously, we took the shortest path to getting something working: hardcoded crucible Volume building. Let’s add a
crucible based configuration structure to the vmm_config module, which we’ll use to build our volumes dynamically:
From firecracker/src/vmm/src/vmm_config/crucible.rs:
use serde::{Deserialize, Serialize};

/// Configure remote crucible block storage drives
#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)]
#[serde(tag = "type")]
pub enum CrucibleConfig {
    /// Attach a crucible volume over the network to downstairs
    /// targets.
    Network {
        /// List of host:port socket addresses for the downstairs volumes
        downstairs_targets: Vec<String>,
        /// Volume generation id. Used each time a block device is moved / reattached
        /// to a virtual machine to prevent concurrent usage.
        volume_generation: u64,
    },
    /// Attach a crucible volume with in-memory state
    InMemory {
        /// Size for each block.
        block_size: u64,
        /// Overall volume / disk size.
        disk_size: usize,
    },
}
We support two volume variants: attached over the network, or in-memory. The crucible upstairs also supports a “pseudo-file” BlockIO implementation that has overlapping functionality with the existing firecracker file-backed disks. We might add this later, but let’s stick with these two cases for now.
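Since the enum is tagged with #[serde(tag = "type")], the variant is selected by a "type" field in the JSON. A quick round-trip test (assuming serde_json is available as a dev dependency) shows the shape, which is the same one we’ll use in the VM configuration later:

#[cfg(test)]
mod tests {
    use super::CrucibleConfig;

    #[test]
    fn deserialize_network_variant() {
        // The "type" field selects the enum variant because of #[serde(tag = "type")].
        let json = r#"{
            "type": "Network",
            "downstairs_targets": ["127.0.0.1:3810", "127.0.0.1:3820", "127.0.0.1:3830"],
            "volume_generation": 1
        }"#;
        let config: CrucibleConfig = serde_json::from_str(json).unwrap();
        assert!(matches!(config, CrucibleConfig::Network { .. }));
    }
}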
We add this config enum to the main BlockDeviceConfig structure, that directly interfaces with the user to configure the firecracker VM’s block storage. This is eventually
translated into a VirtioBlockConfig struct that gets used when we build our underlying disk.
From firecracker/src/vmm/src/vmm_config/drive.rs:
/// Use this structure to set up the Block Device before booting the kernel.
#[derive(Debug, Default, PartialEq, Eq, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
pub struct BlockDeviceConfig {
    /// Unique identifier of the drive.
    pub drive_id: String,
    /// Part-UUID. Represents the unique id of the boot partition of this device. It is
    /// optional and it will be used only if the `is_root_device` field is true.
    pub partuuid: Option<String>,
    /// If set to true, it makes the current device the root block device.
    /// Setting this flag to true will mount the block device in the
    /// guest under /dev/vda unless the partuuid is present.
    pub is_root_device: bool,
    /// If set to true, the drive will ignore flush requests coming from
    /// the guest driver.
    #[serde(default)]
    pub cache_type: CacheType,

    // VirtioBlock specific fields
    /// If set to true, the drive is opened in read-only mode. Otherwise, the
    /// drive is opened as read-write.
    pub is_read_only: Option<bool>,
    /// Path of the drive.
    pub path_on_host: Option<String>,
    /// Rate Limiter for I/O operations.
    pub rate_limiter: Option<RateLimiterConfig>,
    /// Crucible configuration.
    /// Only set when io_engine is 'Crucible'
    pub crucible: Option<CrucibleConfig>,
    /// The type of IO engine used by the device.
    // #[serde(default)]
    // #[serde(rename = "io_engine")]
    // pub file_engine_type: FileEngineType,
    #[serde(rename = "io_engine")]
    pub file_engine_type: Option<FileEngineType>,

    // VhostUserBlock specific fields
    /// Path to the vhost-user socket.
    pub socket: Option<String>,
}
CrucibleEngine Over the Network
Let’s expand our CrucibleEngine implementation from before, and add support for constructing crucible remote network attached block volumes.
From firecracker/src/vmm/src/devices/virtio/block/virtio/io/crucible.rs:
impl CrucibleEngine {
    /// Mount a network attached volume
    pub fn with_network_volume(
        rt: Arc<Runtime>,
        options: CrucibleOpts,
        extent_info: RegionExtentInfo,
        volume_generation: u64,
    ) -> Result<Self, anyhow::Error> {
        let block_size = extent_info.block_size;
        let volume = rt.block_on(async {
            Self::network_attached_downstairs_volume(options, extent_info, volume_generation).await
        })?;
        let mut buf = crucible::Buffer::new(1, block_size as usize);
        Ok(Self {
            volume,
            rt,
            block_size,
            buf,
        })
    }

    async fn network_attached_downstairs_volume(
        options: CrucibleOpts,
        extent_info: RegionExtentInfo,
        volume_generation: u64,
    ) -> Result<Volume, anyhow::Error> {
        let volume_logger = crucible_common::build_logger_with_level(slog::Level::Info);
        let mut builder = VolumeBuilder::new(extent_info.block_size, volume_logger);
        builder
            .add_subvolume_create_guest(options.clone(), extent_info, volume_generation, None)
            .await?;
        let volume = Volume::from(builder);
        info!(
            "Successfully added volume from downstairs targets: {:?}",
            options.target
        );

        // Before we use the volume, we must activate it, and ensure it's active
        info!("Activating crucible volume");
        volume.activate_with_gen(volume_generation).await?;

        info!("Waiting to query the work queue before sending I/O");
        volume.query_work_queue().await?;
        let _ = Self::wait_for_active_upstairs(&volume).await?;
        info!("Upstairs is active. Volume built and ready for I/O");

        Ok(volume)
    }
}
Rather than use the previous CrucibleEngine#with_in_memory_volume, we add a top-level constructor, CrucibleEngine#with_network_volume.
Breaking down the arguments:
- Arc<Runtime>: The tokio runtime to use with volume operations. Again, firecracker doesn’t utilize async I/O, so we provide it for the CrucibleEngine.
- CrucibleOpts: crucible upstairs / client configuration options. Most critically, this includes our downstairs targets to connect to.
- RegionExtentInfo: Metadata queried from the crucible downstairs repair port. Provides block_size, extent_count and blocks_per_extent, which can be used for overall volume size calculations (see the sketch after this list).
- volume_generation: Concurrency safety mechanism that prevents “split-brain” scenarios (multiple VMs mounting the same volume). The downstairs server will favor the highest generation counter, used in conjunction with a centralized control plane that increments the generation number each time a volume is moved or attached to a new VM.
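That size calculation is just the product of the region fields. A small sketch for illustration (this is my own helper, not firecracker or crucible code, and the exact integer types may differ):

// Overall disk size in bytes: each extent holds blocks_per_extent blocks of block_size bytes.
// For the 100MB volume we provision later: 512 * 2048 * 100 = 104,857,600 bytes.
fn crucible_disk_size(extent_info: &RegionExtentInfo) -> u64 {
    extent_info.block_size * extent_info.blocks_per_extent * extent_info.extent_count as u64
}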
Encapsulated FileEngine and Disk Properties
Firecracker uses the DiskProperties structure both to determine overall disk metadata, such as the disk size, and to build the FileEngine struct for block I/O.
We’ll kill two birds with one stone: clean up how our FileEngine gets built, and return the correct disk size metadata to the virtio layer during boot.
Here’s our revised DiskProperties code, which more cleanly supports the existing firecracker FileEngine and our new crucible one. We revise the main entry point to switch on the engine type from the config:
From firecracker/src/vmm/src/devices/virtio/block/virtio/device.rs:
impl DiskProperties {
    /// Create a new disk from the given VirtioBlockConfig.
    pub fn from_config(config: &VirtioBlockConfig) -> Result<Self, VirtioBlockError> {
        match config.file_engine_type {
            FileEngineType::Sync | FileEngineType::Async => Self::from_file(
                config.path_on_host.clone(),
                config.is_read_only,
                config.file_engine_type,
            ),
            FileEngineType::Crucible => Self::from_crucible(config.crucible.as_ref().expect(
                "Crucible block device configuration must always be present in the 'crucible' field when file_engine_type is 'Crucible'",
            )),
        }
    }
}
We renamed the previous DiskProperties::new function to DiskProperties#from_file, and added a new DiskProperties#from_crucible.
This now serves as the main entry-point into building crucible based volumes (both in-memory, as well as our new network attached). Let’s take a look here:
impl DiskProperties {
    pub fn from_crucible(crucible_config: &CrucibleConfig) -> Result<Self, VirtioBlockError> {
        // Firecracker doesn't use async rust or tokio, but crucible library operations
        // depend on an async runtime. We might want to push this up the stack at some
        // point.
        let rt =
            Arc::new(tokio::runtime::Runtime::new().expect("Could not construct a tokio runtime"));
        let (disk_size, crucible_engine) = match crucible_config {
            CrucibleConfig::Network {
                downstairs_targets,
                volume_generation,
            } => {
                let targets = downstairs_targets
                    .iter()
                    .map(|target| {
                        target.parse::<SocketAddr>().map_err(|err| {
                            error!(
                                "Error parsing crucible target: {}, error: {:?}",
                                target, err
                            );
                            VirtioBlockError::Config
                        })
                    })
                    .collect::<Result<Vec<SocketAddr>, VirtioBlockError>>()?;
                let (region_extent_info, disk_size) = Self::volume_size(&rt, &targets)?;
                let options = crucible_client_types::CrucibleOpts {
                    target: targets,
                    ..Default::default()
                };
                let crucible_engine = CrucibleEngine::with_network_volume(
                    rt,
                    options,
                    region_extent_info,
                    *volume_generation,
                )
                .map_err(|err| VirtioBlockError::FileEngine(BlockIoError::Crucible(err)))?;
                (disk_size, crucible_engine)
            }
            CrucibleConfig::InMemory {
                block_size,
                disk_size,
            } => {
                let crucible_engine =
                    CrucibleEngine::with_in_memory_volume(rt, *block_size, *disk_size)
                        .map_err(|err| VirtioBlockError::FileEngine(BlockIoError::Crucible(err)))?;
                (*disk_size as u64, crucible_engine)
            }
        };

        let image_id = [0; VIRTIO_BLK_ID_BYTES as usize];
        let engine = FileEngine::Crucible(crucible_engine);
        Ok(Self {
            file_path: "".to_string(), // TODO: Remove file path
            file_engine: engine,
            nsectors: disk_size >> SECTOR_SHIFT,
            image_id,
        })
    }
}
Breaking down the network case, the high level steps are:
- Lookup the volume metadata (region extent info) from the given downstairs servers. This is used to determine block_size as well as overall disk size.
- Construct the underlying CrucibleEngine from the configuration options. This includes the downstairs target TCP servers, and the volume generation we configured before.
- Stub out an image_id. We’ll eventually update this, especially if we want to attach multiple crucible volumes to the same VM.
There’s still some lingering coupling to the file based storage, with the file_path field that’s not relevant in the case of crucible volumes.
Putting it Together
Let’s put it all together, and fire a VM up. First, let’s configure our machine to talk over the network. We’ll modify our previous firecracker VM machine configuration.
From firecracker/scripts/crucible_network.json:
{
    "boot-source": {
        "kernel_image_path": ".kernel/vmlinux-6.1.141.1",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
    },
    "logger": {
        "log_path": "test_machine.log",
        "level": "debug"
    },
    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": ".kernel/ubuntu-24.04.ext4",
            "is_root_device": true,
            "is_read_only": false
        },
        {
            "drive_id": "storage",
            "path_on_host": "storage.ext4",
            "is_root_device": false,
            "is_read_only": false,
            "io_engine": "Crucible",
            "crucible": {
                "type": "Network",
                "downstairs_targets": [
                    "127.0.0.1:3810",
                    "127.0.0.1:3820",
                    "127.0.0.1:3830"
                ],
                "volume_generation": 1
            }
        }
    ],
    "machine-config": {
        "vcpu_count": 1,
        "mem_size_mib": 1024
    }
}
This configuration lays out 3 downstairs servers to connect to in order to access volume block data, along with the volume generation. Volume generations always start at 1 and increment for each
volume move / attachment event.
Note that crucible downstairs volumes are always replicated, and each replica runs in an isolated process with its own socket address. In a multi-host setup, the control plane
would be responsible for starting / stopping these downstairs processes during volume provisioning.
Let’s manually provision each downstairs volume, and start 3 downstairs processes. We’ll make a 100MB volume replicated across all 3 downstairs servers. In your local crucible checkout:
# First, provision 3 100MB volumes in the data directory, each with their own unique UUID
# Overall volume size is calculated with: $block_size * $extent_size * $extent_count.
$ cargo run --bin crucible-downstairs -- create -d data/3810 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
$ cargo run --bin crucible-downstairs -- create -d data/3820 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
$ cargo run --bin crucible-downstairs -- create -d data/3830 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
# Now, in 3 separate terminal windows, start a server process for each volume downstairs.
$ cargo run --bin crucible-downstairs -- run -d data/3810 -p 3810
$ cargo run --bin crucible-downstairs -- run -d data/3820 -p 3820
$ cargo run --bin crucible-downstairs -- run -d data/3830 -p 3830
Our block storage backend / servers are ready. Let’s fire up our firecracker VM.
Back in the firecracker git checkout:
# Start a new firecracker VM, with the crucible_network.json VM configuration.
$ cargo run --bin firecracker -- --api-sock /tmp/fc0.sock --config-file ./scripts/crucible_network.json
Starting up the firecracker VM, we should see logs confirming a correct connection to the crucible downstairs servers:
From test_machine.log:
2025-10-17T08:22:29.246964209 [anonymous-instance:main] Looking up region extent information from: http://127.0.0.1:7810
2025-10-17T08:22:29.248154545 [anonymous-instance:main] starting new connection: http://127.0.0.1:7810/
2025-10-17T08:22:29.250181813 [anonymous-instance:main] Remote region extent info from http://127.0.0.1:7810 is: RegionDefinition { block_size: 512, extent_size: Block { value: 100, shift: 9 }, extent_count: 2048, uuid: 282337c4-851e-4e3d-9b78-9cd9847b0f53, encrypted: false, database_read_version: 1, database_write_version: 1 }
2025-10-17T08:22:29.253353020 [anonymous-instance:main] Successfully added volume from downstairs targets: [127.0.0.1:3810, 127.0.0.1:3820, 127.0.0.1:3830]
2025-10-17T08:22:29.253382907 [anonymous-instance:main] Activating crucible volume
2025-10-17T08:22:29.412650382 [anonymous-instance:main] Waiting to query the work queue before sending I/O
2025-10-17T08:22:29.412912033 [anonymous-instance:main] Upstairs is active. Volume built and ready for I/O
2025-10-17T08:22:29.782137041 [anonymous-instance:main] Crucible read. offset: 0, addr: GuestAddress(49954816), count: 4096
2025-10-17T08:22:31.053296057 [anonymous-instance:main] Crucible read. offset: 0, addr: GuestAddress(72134656), count: 4096
2025-10-17T08:22:31.056033575 [anonymous-instance:main] Crucible read. offset: 16384, addr: GuestAddress(62717952), count: 4096
2025-10-17T08:22:31.057059658 [anonymous-instance:main] Crucible read. offset: 32768, addr: GuestAddress(72376320), count: 4096
Notice that we connect to the “repair port” for metadata lookup. This port is also running on each downstairs server, in addition to the main port for upstairs clients.
We’ve got good-looking log lines; does the volume work?
Ubuntu 24.04.2 LTS ubuntu-fc-uvm ttyS0
ubuntu-fc-uvm login: root (automatic login)
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.1.141 x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
root@ubuntu-fc-uvm:~# lsblk
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda  254:0    0    1G  0 disk /
vdb  254:16   0  100M  0 disk
When we list block devices, we are still getting the correct block device size, even without the dummy .ext4 file. Let’s see if we can do some block operations:
root@ubuntu-fc-uvm:~# mkfs.ext4 /dev/vdb
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 25600 4k blocks and 25600 inodes
Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
root@ubuntu-fc-uvm:~# mount -t ext4 /dev/vdb /mnt/storage/
root@ubuntu-fc-uvm:~# ls -lah /mnt/storage/
total 24K
drwxr-xr-x 3 root root 4.0K Oct 17 16:17 .
drwxr-xr-x 3 root root 4.0K Oct 11 22:15 ..
drwx------ 2 root root 16K Oct 17 16:17 lost+found
root@ubuntu-fc-uvm:~# echo "Hello network attached crucible!" > /mnt/storage/hello
root@ubuntu-fc-uvm:~# cat /mnt/storage/hello
Hello network attached crucible!
root@ubuntu-fc-uvm:~#
Woohoo! Not only are our block operations working, but we’re sending them over the network, using the very simple Crucible TCP protocol. Separation of compute and storage gives us flexibility and mobility
as we might move underlying hosts around in a larger VM infrastructure. In a production network, we’d want very high speed networking for block data operations.
Wrapping Up
In this 2-part series, we went from a stock firecracker source checkout, to plugging in crucible based network attached block devices. In crucible lingo, we connected our ‘upstairs’
firecracker VMM process to our ‘downstairs’ crucible TCP servers that manage the underlying durable storage on disks.
Where do we go from here? Here’s what’s on the top of my mind:
- Cleanup additional FileEngine coupling: There’s still some lingering coupling in the firecracker code. In our new setup, we can’t always assume we have a backing local VM file (the disk might be remote over the network). There’s some more work to do to cleanly abstract these pieces away.
- Convert FileEngine to an open trait, rather than a closed enumeration. It would be easier to support pluggable disk backends with a trait that encapsulates all the operations required for a block device backend. As it stands, there are quite a few places scattered through the code that make assumptions based on these closed enumerations.
- Extensive stress testing, especially for performance at high I/O rates.
- Wire this into a simple control plane, to support dynamically provisioning VMs and block volumes.
I’d like to send some of these patches upstream to firecracker, so it’s easier to support pluggable disk backends. In the meantime, I’ll maintain a branch
on GitHub that can track upstream.
UPDATE: There’s now a nix flake to build this custom version of firecracker. Try it out:
$ nix build 'github:blakesmith/firecracker/crucible-tracking'