This is a blog series covering how to connect a firecracker VM to network block storage. Read Part 1 here.
In part 1, we taught the Firecracker VMM how to perform block-based disk operations using crucible volumes as our backing store. This helped us validate the connection between the existing Firecracker virtio block device implementation and the existing crucible Volume interface. It worked quite well (save for the impedance mismatch between crucible’s use of async rust and firecracker’s choice to avoid async rust in favor of blocking operations).
But we left a few things out:
- Runtime configuration of crucible volumes when firecracker VMs are started. We need to be able to configure our virtio block device through the existing firecracker VM configuration when the VM boots.
- Connecting to crucible volumes over the network, via the “downstairs” TCP servers that manage the underlying physical disks and serve up block operations. Our previous post only used an in-memory block structure.
- Correct disk volume metadata, such as disk size. We faked it with a dummy ext4 volume, but we need firecracker to correctly detect the volume size based on how the crucible volume is configured.
That work got us 90% of the way towards our desired goal: having firecracker support remote, network attached block devices.
Let’s fix these remaining issues now!
Volume Configuration
Previously, we took the shortest path to getting something working: hardcoded crucible Volume building. Let’s add a crucible-based configuration structure to the vmm_config module, which we’ll use to build our volumes dynamically:
From firecracker/src/vmm/src/vmm_config/crucible.rs:
use serde::{Deserialize, Serialize};

/// Configure remote crucible block storage drives
#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)]
#[serde(tag = "type")]
pub enum CrucibleConfig {
    /// Attach a crucible volume over the network to downstairs
    /// targets.
    Network {
        /// List of host:port socket addresses for the downstairs volumes
        downstairs_targets: Vec<String>,
        /// Volume generation id. Used each time a block device is moved / reattached
        /// to a virtual machine to prevent concurrent usage.
        volume_generation: u64,
    },
    /// Attach a crucible volume with in-memory state
    InMemory {
        /// Size for each block.
        block_size: u64,
        /// Overall volume / disk size.
        disk_size: usize,
    },
}
We support two different volume variants: attached over the network, or in-memory. The crucible upstairs also supports a “pseudo-file” BlockIO implementation that has overlapping functionality with the existing firecracker file-backed disks. We might add this later, but let’s stick with these two cases for now.
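Because the enum is internally tagged with #[serde(tag = "type")], the "type" key in the JSON selects the variant and the remaining keys fill in that variant’s fields. Here’s a quick sanity check of how that deserialization plays out; this is just a minimal sketch, assuming serde_json is available to the crate’s tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn deserializes_network_variant() {
        // The "type" field selects the enum variant; the remaining keys map to
        // that variant's fields.
        let json = r#"{
            "type": "Network",
            "downstairs_targets": ["127.0.0.1:3810", "127.0.0.1:3820", "127.0.0.1:3830"],
            "volume_generation": 1
        }"#;
        let config: CrucibleConfig = serde_json::from_str(json).unwrap();
        assert_eq!(
            config,
            CrucibleConfig::Network {
                downstairs_targets: vec![
                    "127.0.0.1:3810".to_string(),
                    "127.0.0.1:3820".to_string(),
                    "127.0.0.1:3830".to_string(),
                ],
                volume_generation: 1,
            }
        );
    }
}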
We add this config enum to the main BlockDeviceConfig structure, which directly interfaces with the user to configure the firecracker VM’s block storage. This is eventually translated into a VirtioBlockConfig struct that gets used when we build our underlying disk.
From firecracker/src/vmm/src/vmm_config/drive.rs:
/// Use this structure to set up the Block Device before booting the kernel.
#[derive(Debug, Default, PartialEq, Eq, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
pub struct BlockDeviceConfig {
    /// Unique identifier of the drive.
    pub drive_id: String,
    /// Part-UUID. Represents the unique id of the boot partition of this device. It is
    /// optional and it will be used only if the `is_root_device` field is true.
    pub partuuid: Option<String>,
    /// If set to true, it makes the current device the root block device.
    /// Setting this flag to true will mount the block device in the
    /// guest under /dev/vda unless the partuuid is present.
    pub is_root_device: bool,
    /// Caching strategy for the block device, which determines how flush
    /// requests coming from the guest driver are handled.
    #[serde(default)]
    pub cache_type: CacheType,

    // VirtioBlock specific fields
    /// If set to true, the drive is opened in read-only mode. Otherwise, the
    /// drive is opened as read-write.
    pub is_read_only: Option<bool>,
    /// Path of the drive.
    pub path_on_host: Option<String>,
    /// Rate Limiter for I/O operations.
    pub rate_limiter: Option<RateLimiterConfig>,
    /// Crucible configuration.
    /// Only set when io_engine is 'Crucible'.
    pub crucible: Option<CrucibleConfig>,
    /// The type of IO engine used by the device.
    // #[serde(default)]
    // #[serde(rename = "io_engine")]
    // pub file_engine_type: FileEngineType,
    #[serde(rename = "io_engine")]
    pub file_engine_type: Option<FileEngineType>,

    // VhostUserBlock specific fields
    /// Path to the vhost-user socket.
    pub socket: Option<String>,
}
CrucibleEngine Over the Network
Let’s expand our CrucibleEngine implementation from before, and add support for constructing remote, network attached crucible block volumes.
From firecracker/src/vmm/src/devices/virtio/block/virtio/io/crucible.rs:
impl CrucibleEngine {
    /// Mount a network attached volume
    pub fn with_network_volume(
        rt: Arc<Runtime>,
        options: CrucibleOpts,
        extent_info: RegionExtentInfo,
        volume_generation: u64,
    ) -> Result<Self, anyhow::Error> {
        let block_size = extent_info.block_size;
        let volume = rt.block_on(async {
            Self::network_attached_downstairs_volume(options, extent_info, volume_generation).await
        })?;
        let buf = crucible::Buffer::new(1, block_size as usize);
        Ok(Self {
            volume,
            rt,
            block_size,
            buf,
        })
    }

    async fn network_attached_downstairs_volume(
        options: CrucibleOpts,
        extent_info: RegionExtentInfo,
        volume_generation: u64,
    ) -> Result<Volume, anyhow::Error> {
        let volume_logger = crucible_common::build_logger_with_level(slog::Level::Info);
        let mut builder = VolumeBuilder::new(extent_info.block_size, volume_logger);
        builder
            .add_subvolume_create_guest(options.clone(), extent_info, volume_generation, None)
            .await?;
        let volume = Volume::from(builder);
        info!(
            "Successfully added volume from downstairs targets: {:?}",
            options.target
        );

        // Before we use the volume, we must activate it, and ensure it's active
        info!("Activating crucible volume");
        volume.activate_with_gen(volume_generation).await?;

        info!("Waiting to query the work queue before sending I/O");
        volume.query_work_queue().await?;
        let _ = Self::wait_for_active_upstairs(&volume).await?;

        info!("Upstairs is active. Volume built and ready for I/O");
        Ok(volume)
    }
}
Rather than use the previous CrucibleEngine#with_in_memory_volume, we add a top-level constructor, CrucibleEngine#with_network_volume.
Breaking down the arguments:
- Arc<Runtime>: The tokio runtime to use for volume operations. Again, firecracker doesn’t utilize async I/O, so we provide a runtime for the CrucibleEngine.
- CrucibleOpts: crucible upstairs / client configuration options. Most critically, this includes the downstairs targets to connect to.
- RegionExtentInfo: Metadata queried from the crucible downstairs repair port. Provides block_size, extent_count and blocks_per_extent, which can be used for overall volume size calculations.
- volume_generation: Concurrency safety mechanism that prevents “split-brain” scenarios (multiple VMs mounting the same volume). The downstairs servers favor the highest generation counter; it’s used in conjunction with a centralized control plane that increments the generation number each time a volume is moved or attached to a new VM.
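One piece not shown above is wait_for_active_upstairs: it simply blocks until the upstairs reports itself active before we allow any guest I/O through. Here’s a minimal sketch of what such a helper could look like, assuming crucible’s BlockIO trait exposes query_is_active() on the Volume (illustrative only, not the exact implementation):

impl CrucibleEngine {
    /// Hypothetical sketch: poll the upstairs until it reports active, giving up
    /// after a bounded number of attempts. Assumes `Volume` implements crucible's
    /// `BlockIO` trait, including a `query_is_active()` method.
    async fn wait_for_active_upstairs(volume: &Volume) -> Result<(), anyhow::Error> {
        for _ in 0..60 {
            if volume.query_is_active().await? {
                return Ok(());
            }
            tokio::time::sleep(std::time::Duration::from_millis(500)).await;
        }
        Err(anyhow::anyhow!(
            "Timed out waiting for the crucible upstairs to become active"
        ))
    }
}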
Encapsulated FileEngine and Disk Properties
Firecracker uses the DiskProperties structure both to determine overall disk metadata, such as the disk size, and to build the FileEngine struct for block I/O.
We’ll kill two birds with one stone: clean up how our FileEngine gets built, and also return the correct disk size metadata to the virtio layer during boot.
Here’s our revised DiskProperties code, which more cleanly supports both the existing firecracker FileEngine and our new crucible one. We revise the main entry point to switch on the engine type from the config:
From firecracker/src/vmm/src/devices/virtio/block/virtio/device.rs:
impl DiskProperties {
    /// Create a new disk from the given VirtioBlockConfig.
    pub fn from_config(config: &VirtioBlockConfig) -> Result<Self, VirtioBlockError> {
        match config.file_engine_type {
            FileEngineType::Sync | FileEngineType::Async => Self::from_file(
                config.path_on_host.clone(),
                config.is_read_only,
                config.file_engine_type,
            ),
            FileEngineType::Crucible => Self::from_crucible(config.crucible.as_ref().expect(
                "Crucible block device configuration must always be present in the 'crucible' field when file_engine_type is 'Crucible'",
            )),
        }
    }
}
We renamed the previous DiskProperties::new function to DiskProperties#from_file, and added a new DiskProperties#from_crucible.
The latter now serves as the main entry point for building crucible-based volumes (both in-memory and our new network attached variant). Let’s take a look:
impl DiskProperties {
    pub fn from_crucible(crucible_config: &CrucibleConfig) -> Result<Self, VirtioBlockError> {
        // Firecracker doesn't use async rust or tokio, but crucible library operations
        // depend on an async runtime. We might want to push this up the stack at some
        // point.
        let rt =
            Arc::new(tokio::runtime::Runtime::new().expect("Could not construct a tokio runtime"));

        let (disk_size, crucible_engine) = match crucible_config {
            CrucibleConfig::Network {
                downstairs_targets,
                volume_generation,
            } => {
                let targets = downstairs_targets
                    .iter()
                    .map(|target| {
                        target.parse::<SocketAddr>().map_err(|err| {
                            error!(
                                "Error parsing crucible target: {}, error: {:?}",
                                target, err
                            );
                            VirtioBlockError::Config
                        })
                    })
                    .collect::<Result<Vec<SocketAddr>, VirtioBlockError>>()?;
                let (region_extent_info, disk_size) = Self::volume_size(&rt, &targets)?;
                let options = crucible_client_types::CrucibleOpts {
                    target: targets,
                    ..Default::default()
                };
                let crucible_engine = CrucibleEngine::with_network_volume(
                    rt,
                    options,
                    region_extent_info,
                    *volume_generation,
                )
                .map_err(|err| VirtioBlockError::FileEngine(BlockIoError::Crucible(err)))?;
                (disk_size, crucible_engine)
            }
            CrucibleConfig::InMemory {
                block_size,
                disk_size,
            } => {
                let crucible_engine =
                    CrucibleEngine::with_in_memory_volume(rt, *block_size, *disk_size)
                        .map_err(|err| VirtioBlockError::FileEngine(BlockIoError::Crucible(err)))?;
                (*disk_size as u64, crucible_engine)
            }
        };

        let image_id = [0; VIRTIO_BLK_ID_BYTES as usize];
        let engine = FileEngine::Crucible(crucible_engine);
        Ok(Self {
            file_path: "".to_string(), // TODO: Remove file path
            file_engine: engine,
            nsectors: disk_size >> SECTOR_SHIFT,
            image_id,
        })
    }
}
Breaking down the network case, the high level steps are:
- Look up the volume metadata (region extent info) from the given downstairs servers. This is used to determine block_size as well as the overall disk size (see the size calculation sketch after this list).
- Construct the underlying CrucibleEngine from the configuration options. This includes the downstairs target TCP servers, and the volume generation we configured before.
- Stub out an image_id. We’ll eventually update this, especially if we want to attach multiple crucible volumes to the same VM.
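The interesting part of Self::volume_size is the size math. Once the region extent info has been fetched from the downstairs repair port (the HTTP lookup is omitted here), the overall disk size falls out of the three region fields. A minimal sketch, assuming RegionExtentInfo exposes block_size, blocks_per_extent and extent_count as numeric fields:

/// Hypothetical helper: total disk size in bytes, derived from the region
/// metadata reported by the downstairs repair port.
/// disk_size = block_size * blocks_per_extent * extent_count
fn disk_size_from_extent_info(info: &RegionExtentInfo) -> u64 {
    info.block_size * info.blocks_per_extent * u64::from(info.extent_count)
}

For the volumes we’ll provision below (512-byte blocks, 2048 blocks per extent, 100 extents), that works out to 512 * 2048 * 100 = 104,857,600 bytes: the 100M disk we’ll see inside the guest.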
There’s still some lingering coupling to the file-based storage, though: the file_path field isn’t relevant in the case of crucible volumes.
Putting it Together
Let’s put it all together and fire up a VM. First, let’s configure our machine to talk over the network by modifying our previous firecracker VM configuration.
From firecracker/scripts/crucible_network.json:
{
  "boot-source": {
    "kernel_image_path": ".kernel/vmlinux-6.1.141.1",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
  },
  "logger": {
    "log_path": "test_machine.log",
    "level": "debug"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": ".kernel/ubuntu-24.04.ext4",
      "is_root_device": true,
      "is_read_only": false
    },
    {
      "drive_id": "storage",
      "path_on_host": "storage.ext4",
      "is_root_device": false,
      "is_read_only": false,
      "io_engine": "Crucible",
      "crucible": {
        "type": "Network",
        "downstairs_targets": [
          "127.0.0.1:3810",
          "127.0.0.1:3820",
          "127.0.0.1:3830"
        ],
        "volume_generation": 1
      }
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 1024
  }
}
This configuration lays out 3 downstairs servers to connect to in order to access volume block data, along with the volume generation. Volume generations always start at 1 and increment for each volume move / attachment event.
Note that crucible downstairs volumes are always replicated, and each replica runs in an isolated process with its own socket address. In a multi-host setup, the control plane would be responsible for starting / stopping these downstairs processes during volume provisioning.
Let’s manually provision each downstairs volume, and start 3 downstairs processes. We’ll make a 100MB volume replicated across all 3 downstairs servers. In your local crucible checkout:
# First, provision 3 100MB volumes in the data directory, each with their own unique UUID
# Overall volume size is calculated with: $block_size * $extent_size * $extent_count.
$ cargo run --bin crucible-downstairs -- create -d data/3810 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
$ cargo run --bin crucible-downstairs -- create -d data/3820 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
$ cargo run --bin crucible-downstairs -- create -d data/3830 -u $(uuidgen) --block-size 512 --extent-count 100 --extent-size 2048
# Now, in 3 separate terminal windows, start a server process for each volume downstairs.
$ cargo run --bin crucible-downstairs -- run -d data/3810 -p 3810
$ cargo run --bin crucible-downstairs -- run -d data/3820 -p 3820
$ cargo run --bin crucible-downstairs -- run -d data/3830 -p 3830
Our block storage backend / servers are ready. Let’s fire up our firecracker VM.
Back in the firecracker git checkout:
# Start a new firecracker VM, with the crucible_network.json VM configuration.
$ cargo run --bin firecracker -- --api-sock /tmp/fc0.sock --config-file ./scripts/crucible_network.json
Starting up the firecracker VM, we should see logs confirming a correct connection to the crucible downstairs servers:
From test_machine.log:
2025-10-17T08:22:29.246964209 [anonymous-instance:main] Looking up region extent information from: http://127.0.0.1:7810
2025-10-17T08:22:29.248154545 [anonymous-instance:main] starting new connection: http://127.0.0.1:7810/
2025-10-17T08:22:29.250181813 [anonymous-instance:main] Remote region extent info from http://127.0.0.1:7810 is: RegionDefinition { block_size: 512, extent_size: Block { value: 100, shift: 9 }, extent_count: 2048, uuid: 282337c4-851e-4e3d-9b78-9cd9847b0f53, encrypted: false, database_read_version: 1, database_write_version: 1 }
2025-10-17T08:22:29.253353020 [anonymous-instance:main] Successfully added volume from downstairs targets: [127.0.0.1:3810, 127.0.0.1:3820, 127.0.0.1:3830]
2025-10-17T08:22:29.253382907 [anonymous-instance:main] Activating crucible volume
2025-10-17T08:22:29.412650382 [anonymous-instance:main] Waiting to query the work queue before sending I/O
2025-10-17T08:22:29.412912033 [anonymous-instance:main] Upstairs is active. Volume built and ready for I/O
2025-10-17T08:22:29.782137041 [anonymous-instance:main] Crucible read. offset: 0, addr: GuestAddress(49954816), count: 4096
2025-10-17T08:22:31.053296057 [anonymous-instance:main] Crucible read. offset: 0, addr: GuestAddress(72134656), count: 4096
2025-10-17T08:22:31.056033575 [anonymous-instance:main] Crucible read. offset: 16384, addr: GuestAddress(62717952), count: 4096
2025-10-17T08:22:31.057059658 [anonymous-instance:main] Crucible read. offset: 32768, addr: GuestAddress(72376320), count: 4096
Notice that we connect to the “repair port” (7810 in the logs above) for the metadata lookup. Each downstairs server also serves this port, in addition to the main port for upstairs clients.
We’ve got good-looking log lines, but does the volume work?
Ubuntu 24.04.2 LTS ubuntu-fc-uvm ttyS0
ubuntu-fc-uvm login: root (automatic login)
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.1.141 x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
root@ubuntu-fc-uvm:~# lsblk
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda  254:0    0    1G  0 disk /
vdb  254:16   0  100M  0 disk
When we list block devices, we get the correct block device size (100M) even without the dummy .ext4 file; it’s now derived from the crucible region metadata. Let’s see if we can do some block operations:
root@ubuntu-fc-uvm:~# mkfs.ext4 /dev/vdb
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 25600 4k blocks and 25600 inodes
Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
root@ubuntu-fc-uvm:~# mount -t ext4 /dev/vdb /mnt/storage/
root@ubuntu-fc-uvm:~# ls -lah /mnt/storage/
total 24K
drwxr-xr-x 3 root root 4.0K Oct 17 16:17 .
drwxr-xr-x 3 root root 4.0K Oct 11 22:15 ..
drwx------ 2 root root 16K Oct 17 16:17 lost+found
root@ubuntu-fc-uvm:~# echo "Hello network attached crucible!" > /mnt/storage/hello
root@ubuntu-fc-uvm:~# cat /mnt/storage/hello
Hello network attached crucible!
root@ubuntu-fc-uvm:~#
Woohoo! Not only are our block operations working, but we’re sending them over the network using the very simple Crucible TCP protocol. Separating compute and storage gives us flexibility and mobility as we move VMs between underlying hosts in a larger VM infrastructure. In a production network, we’d want very high-speed networking for block data operations.
Wrapping Up
In this 2-part series, we went from a stock firecracker source checkout to plugging in crucible-based, network attached block devices. In crucible lingo, we connected our ‘upstairs’ firecracker VMM process to the ‘downstairs’ crucible TCP servers that manage the underlying durable storage on disks.
Where do we go from here? Here’s what’s on the top of my mind:
- Clean up additional FileEngine coupling: There’s still some lingering coupling in the firecracker code. In our new setup, we can’t always assume we have a backing local VM file (the disk might be remote over the network). There’s some more work to do to cleanly abstract these pieces away.
- Convert FileEngine to an open trait, rather than a closed enumeration. It would be easier to support pluggable disk backends with a trait that encapsulates all the operations a block device backend needs; as it stands, there are quite a few places scattered through the code that make assumptions based on these closed enumerations. A rough sketch of what that trait could look like follows this list.
- Extensive stress testing, especially for performance at high I/O rates.
- Wire this into a simple control plane, to support dynamically provisioning VMs and block volumes.
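To make the trait idea above concrete, here’s one possible shape for such a backend trait. This is purely a hypothetical sketch (the names and signatures are illustrative, nothing that exists in firecracker today); the point is that the virtio block device only needs a handful of operations and a size, not a backing file:

/// Hypothetical sketch of a pluggable block backend trait. The file-backed
/// engine, the crucible engine, and future backends could each implement this.
pub trait BlockBackend: Send {
    /// Total disk size in bytes, used to report capacity to the guest.
    fn disk_size(&self) -> u64;
    /// Read `buf.len()` bytes starting at byte `offset` into `buf`.
    fn read_at(&mut self, offset: u64, buf: &mut [u8]) -> Result<usize, BlockIoError>;
    /// Write `buf` at byte `offset`, returning the number of bytes written.
    fn write_at(&mut self, offset: u64, buf: &[u8]) -> Result<usize, BlockIoError>;
    /// Flush any buffered writes to durable storage.
    fn flush(&mut self) -> Result<(), BlockIoError>;
}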
I’d like to send some of these patches upstream to firecracker, so it’s easier to support pluggable disk backends. In the meantime, I’ll maintain a branch on GitHub that can track upstream.