Category: containers

Firecracker MicroVM Security

I wanted to get a better understanding of firecracker microVM security, from the bottom up. A few questions I had in mind –

a) how does firecracker design achieve a smaller threat surface than a typical vm/container ?

b) what mechanisms are available to secure code running in a microvm ?

c) and lastly, how can microvms change security considerations when deploying code for web services ?

The following design elements contribute to a smaller threat surface:

  • minimal design, in a memory safe, compact, readable rust language
  • minimal guest virtual device model: a network device, a block I/O device, a timer, a  KVM clock, a serial console, and a partial keyboard
  • minimal networking; from docs/vsock.md : “The Firecracker vsock device aims to provide full virtio-vsock support to software running inside the guest VM, while bypassing vhost kernel code on the host. To that end, Firecracker implements the virtio-vsock device model, and mediates communication between AF_UNIX sockets (on the host end) and AF_VSOCK sockets (on the guest end).”
  • static linking of the firecracker process limits dependancies
  • seccomp BPF limits the system calls to 35 allowed calls, 30 with simple filtering, 5 with advanced filtering that limits the call based on parameters (SeccompFilter::new call in vmm/src/default_syscalls/filters.rs, seccomp/src/lib.rs)

The production security setup recommends using jailer to apply isolation based on cgroups, namespaces, seccomp. These techniques are typical of container isolation and act in addition to KVM based isolation.

The Host Security Configuration recommends a series of checks to mitigate side-channel issues for a multi-tenant system:

  • Disable Simultaneous Multithreading (SMT)
  • Check Kernel Page-Table Isolation (KPTI) support
  • Disable Kernel Same-page Merging (KSM)
  • Check for speculative branch prediction issue mitigation
  • Apply L1 Terminal Fault (L1TF) mitigation
  • Apply Speculative Store Bypass (SSBD) mitigation
  • Use memory with Rowhammer mitigation support
  • Disable swapping to disk or enable secure swap

How is the firecracker process organized ? The docs/design.md has the following descriptions:

Internal Design: Each Firecracker process encapsulates one and only one microVM. The process runs the following threads: API, VMM and vCPU(s). The API thread is responsible for Firecracker’s API server and associated control plane. It’s never in the fast path of the virtual machine. The VMM thread exposes the machine model, minimal legacy device model, microVM metadata service (MMDS) and VirtIO device emulated Net and Block devices, complete with I/O rate limiting. In addition to them, there are one or more vCPU threads (one per guest CPU core). They are created via KVM and run the `KVM_RUN` main loop. They execute synchronous I/O and memory-mapped I/O operations on devices models. 

Threat Containment: From a security perspective, all vCPU threads are considered to be running malicious code as soon as they have been started; these malicious threads need to be contained. Containment is achieved by nesting several trust zones which increment from least trusted or least safe (guest vCPU threads) to most trusted or safest (host). These trusted zones are separated by barriers that enforce aspects of Firecracker security. For example, all outbound network traffic data is copied by the Firecracker I/O thread from the emulated network interface to the backing host TAP device, and I/O rate limiting is applied at this point.

As to question about mechanisms to secure code running in firecracker, the serverless environment, AWS Lambda, and its security best practices are a good place to start . A few resources on this are here, here, here and here. AWS API gateway supports input validation, as described here. However, while serverless reduces the attack surface, the web threats such as OWASP still apply and must be taken into account during design and testing.

For the last question – uVMs and serverless appear to offer a promising model to build a service incrementally from small secure building blocks – and this is something to explore further.

Linux Bridges and Firewalls for Containers

Hubs and Repeaters work on bits on Layer1. Bridges and Switches work on frames, the protocol data unit for Layer2. A Hub supports a star configuration – like a splitter it simply forwards traffic received on one segment to all the others. A Repeater amplifies signal for longer distances. A Bridge is a smarter type of Hub, usually implemented in software, which looks at the destination MAC address of the packet before forwarding to the segment with MAC matching that destination, thereby avoiding unnecessary network traffic on all the other segments. It does not look at IP addresses. A Switch is a reincarnation of a Bridge in hardware with several ports.  The basic operation of a Bridge/Switch is L2 forwarding, which includes ARL (Address Resolution Logic) table learning and ageing.  An L3 Router switches packets based on the IP address instead of the frame address.

A linux bridge from a VM to the Host, does not need an IP address, but it is a good idea if you want to talk to the host for management or set firewall rules on it.

To create a bridge br0 between two interfaces eth0 and eth1, one runs commands: brctl addbr br0, brctl stp br0 on, brctl addif br0 eth0, brctl addif br0 eth1.

Since a bridge is an L2 construct, how can one filter IP addresses with it. A quote: An Ethernet interface cannot usefully have an IP address if it is also attached to a bridge. However it is possible to achieve the same effect by binding an address to the bridge itself:

ifconfig br0 192.168.0.1

An example of firewall configuration on a bridge.

Troubleshooting bridges:

brctl showmacs br0

Docker creates a few bridges by default. A primer on Docker Container networking and the bridges it creates are here and  here. Here are three common ones.

bridge: docker0 bridge . This is the default, allowing all containers to talk to each other

host: container listens to host, without any isolation.

Some problems with host iptables rules reveal themselves in working with docker.

A look at firewalld, ebtables, ipset, nftables at https://www.slideshare.net/albertspijkers/red-hat-enterprise-linux-next-generation-firewall . The command line tools talk to the kernel over a netlink socket.

A more extensible approach is with Berkeley Packet Filter (BPF), the classical one now called cBPF, the extended successor called eBPF.   A very basic problem is filtering HTTP or DNS packets based on the destination hostname, rather than the IP address.  This is discussed in an example by Cloudflare here.