SmartOS Manifests

SmartOS manifests, also sometimes referred to as zone manifests, are JSON files that describe the resources and permissions to be allocated and granted to a given guest zone on SmartOS.  They are used by SmartOS global zones to instance discrete guest zones, either through Triton or directly on the global zone command-line through vmadm.

Since having the wrong manifest for a SmartOS zone can significantly impact its operation, it seems worth it to dedicate a full article to the topic.  We'll be looking at examples of the different types of SmartOS manifests, as well as exploring the specific properties of each.

Please note that the properties covered in this article are limited to ones I've found useful.  A more authoritative source of all of this information is probably the SmartOS wiki and the vmadm manual page.

I tend to keep my manifests on file in the global zone when running SmartOS as a stand-alone containerizor, usually under /usbkey/vmcfg/ or /usbkey/manifests/ for easy re-creation of zones.

Native Zones

Native SmartOS zones are Illumos-based isolated environments that pass everything straight through to the global zone kernel with no translation at all.  The only addition is the isolation provided by virtue of being a zone.

A slightly modified example manifest from the SmartOS wiki:

{
 "brand": "joyent",
 "image_uuid": "1d05e788-5409-11eb-b12f-037bd7fee4ee",
 "alias": "test-smartos",
 "hostname": "test-smartos",
 "cpu_cap": 200,
 "max_physical_memory": 1024,
 "quota": 20,
 "delegate_dataset": true,
 "resolvers": ["8.8.8.8", "208.67.220.220"],
 "nics": [
  {
   "nic_tag": "admin",
   "ips": ["dhcp"]
  }
 ]
}

For ease of reading, these properties tend to be grouped into the following general arrangements.  Properties marked with an asterisk (*) are required for each brand.

Administrative

  • brand*: String that must be set to joyent or joyent-minimal for this zone type.
  • image_uuid: String representing the image UUID that this zone should be instanced from.  Images are managed by imgadm.
  • alias: String used for display/lookup purposes from outside the guest zone.
  • hostname: String used to configure the guest zone's hostname on creation.

CPU/Process

  • cpu_cap: Integer representing the percentage of a single CPU core available to this zone, so values above 100 span multiple cores.  A value of 300 represents up to 3 full CPU cores.
  • cpu_shares: Integer representing the number of fair share scheduler (FSS) shares for this zone.  Only meaningful relative to other zones on the system and only applies when there is CPU contention between zones.  A value of 25 means this zone will only have access to \(\frac{1}{4}\) as much CPU time as another zone with the default value of 100.
  • max_lwps: Integer representing the maximum number of threads a zone is allowed to run.  The default value of 2000 should be pretty reasonable.
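
As a rough illustration of how these three fit together, the fragment below sketches a zone capped at three full cores.  The numbers are only illustrative, not recommendations: under contention a cpu_shares of 200 would get twice the CPU time of a neighbouring zone set to 100, and max_lwps is simply left at the default mentioned above.

```json
{
 "cpu_cap": 300,
 "cpu_shares": 200,
 "max_lwps": 2000
}
```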

Memory

  • max_locked_memory: Integer representing the number of MiB of memory this zone is allowed to lock.  Locked memory consists of pages that are explicitly marked as non-swappable.  Defaults to max_physical_memory, which it cannot exceed.
  • max_physical_memory: Integer representing the number of MiB of memory this zone is allowed to use.  The default value is 256.
  • max_swap: Integer representing the number of MiB of virtual memory this zone is allowed to use.  This value must be greater than or equal to max_physical_memory.  Defaults to max_physical_memory or 256, whichever is greater.
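
A minimal sketch of a memory section that respects the relationships described above, with made-up sizes: the locked memory limit stays at or below max_physical_memory, and max_swap is at least as large as max_physical_memory.

```json
{
 "max_physical_memory": 2048,
 "max_locked_memory": 1024,
 "max_swap": 4096
}
```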

Storage

  • quota: Integer representing the number of GiB that this zone's ZFS dataset should have its quota set to.
  • delegate_dataset: Boolean that determines if a ZFS dataset will be delegated to this zone on creation.  If set to true, this zone will get a dataset at <zoneroot dataset>/data (default of: zones/<uuid>/data.)  This dataset can be configured many different ways to optimize for databases, snapshots, etc.
  • indestructible_delegated: Boolean that determines if the delegated ZFS dataset should have a zfs hold set on it to enable two-step deletion.  Use this if you're really unsure about accidentally deleting your data.
  • indestructible_zoneroot: Same as above but for the entire guest zone.
  • filesystems: Array of JSON objects representing additional filesystems, beyond those a zone normally gets, to be mounted within the zone.  Each object takes the following parameters:
  • filesystems.*.type: String representing the type of filesystem to be mounted, lofs for a bind mount, pcfs for a pc filesystem, tmpfs etc.
  • filesystems.*.source: String representing the source directory from the scope of the global zone, primarily useful for lofs mounts.
  • filesystems.*.target: String representing the mountpoint from the scope of the guest zone.
  • filesystems.*.raw: String representing a raw device to be associated with the source filesystem.  Most often, this should be a device file for a drive.
  • filesystems.*.options: Array of strings representing the mount options for this filesystem when it is mounted into the zone.  Eg: ["ro", "nodevices"]
  • fs_allowed: String representing filesystem types this zone is allowed to mount.  If you're building SmartOS, you will want this as: "ufs,pcfs,tmpfs"
  • tmpfs: Integer representing the number of MiB this zone is allowed to use for its tmpfs mounted at /tmp.  Defaults to max_physical_memory, which it cannot exceed.
  • zfs_filesystem_limit, zfs_snapshot_limit: Integers representing the limits on the number of ZFS filesystems and snapshots a zone can have.  Useful when combined with delegate_dataset to prevent runaway resource consumption.
  • zfs_io_priority: Integer representing the zone's IO priority when operating on a system with IO contention.  Zones with values below the default of 100 will have their IO throttled, and zones with values above it will be prioritized, when several zones try to use all available storage IO at once.
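
To make the storage properties a little more concrete, here is a sketch of a zone with a delegated dataset protected by a hold, a snapshot limit, and a read-only lofs mount from the global zone.  The source and target paths and all of the numbers are hypothetical, picked purely for illustration.

```json
{
 "quota": 50,
 "delegate_dataset": true,
 "indestructible_delegated": true,
 "zfs_snapshot_limit": 100,
 "filesystems": [
  {
   "type": "lofs",
   "source": "/zones/shared/media",
   "target": "/media",
   "options": ["ro", "nodevices"]
  }
 ]
}
```

With something like this, /zones/shared/media in the global zone would show up read-only at /media inside the guest zone.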

Network

  • resolvers: Array of strings representing DNS resolvers that will be assigned to /etc/resolv.conf upon zone creation.
  • maintain_resolvers: Boolean that determines if vmadm should update guest zone resolvers when the above property is updated.  Default: false
  • nics: Array of JSON objects representing a guest zone's network interfaces.  Each object takes the following parameters:
  • nics.*.primary: Boolean representing which vnic should be used for this zone's default gateway and nameserver values.  Only useful with multiple nics.
  • nics.*.nic_tag: String representing which physical nic or etherstub that this vnic should be associated with.
  • nics.*.vlan_id: Integer representing what vlan tag should be used for this vnic.
  • nics.*.interface: String representing the interface name this zone will use for this interface.  Always in the format of netX where X is an integer \(\geq 0\).  This parameter is primarily useful for configuring zones with multiple nics.
  • nics.*.mac: String representing the MAC address of a vnic.  This is useful when interfacing with external systems expecting a specific MAC address.
  • nics.*.ips: Array of strings representing IPv4 CIDR or IPv6 CIDR addresses for a given vnic.  The special strings "dhcp" and "addrconf" can be used as well to represent the use of DHCPv4 and SLAAC or DHCPv6, respectively.
  • nics.*.gateways: Array of strings representing IPv4 addresses that this zone should use as network gateways.  If multiple gateways are specified, OS-specific behavior will apply (eg round robin on SmartOS).  Not required if using DHCP.
  • routes: JSON object, set at the top level of the manifest rather than inside a nics entry, that maps network destinations to gateways.  Destinations (keys) can be either IP addresses or IP subnetworks in CIDR notation.  Gateways can be either IP addresses or in the form of nics[0] or macs[aa:bb:cc:12:34:56].
  • nics.*.allow_dhcp_spoofing, nics.*.allow_ip_spoofing: Booleans that determine if this zone's vnic should be granted certain permissions.  DHCP spoofing is required for DHCP servers.  IP spoofing is required for routers.
  • nics.*.allowed_ips: Array of strings representing additional IP addresses from which this vnic is allowed to send traffic.  This is useful for IP address failover schemes between multiple zones.
  • nics.*.blocked_outgoing_ports: Array of integers representing port numbers to which this vnic is prevented from sending traffic.  Eg: [80, 443, 8080]

Additional Properties

  • limit_priv: String representing the list of privileges that will be available to this zone.  The default is normally fine, but some applications may require special permissions to run properly.  For instance, FreeSWITCH apparently needs "default,proc_clock_highres,proc_priocntl" to enable the use of high resolution timers with very small time values and for better control over its scheduling class, both probably important for low latency voice.  See man 5 privileges.
  • customer_metadata: JSON object representing metadata to be associated with this VM.  This data can be accessed from within the guest zone by using the mdata-get command, including from a user-script defined in this same object, eg:
"customer_metadata": {
 "root_authorized_keys": "ssh-ed25519 <key data>",
 "user-script": "/usr/sbin/mdata-get root_authorized_keys > /root/.ssh/authorized_keys"
}

Linux Branded Guest Manifests

Linux Branded SmartOS zones are a Linux user-space with an additional translation layer that converts Linux ABI calls from the user-space into Illumos ABI calls before passing them on to the Illumos kernel, effectively allowing Linux user applications to operate under an Illumos kernel.

A slightly modified example manifest from the SmartOS wiki:

{
 "brand": "lx",
 "kernel_version": "4.2.0",
 "image_uuid": "63d6e664-3f1f-11e8-aef6-a3120cf8dd9d",
 "alias": "test-debian9",
 "hostname": "test-debian9",
 "cpu_cap": 400,
 "max_physical_memory": 4096,
 "quota": 1000,
 "resolvers": ["192.168.180.1", "8.8.8.8"],
 "nics": [
  {
   "nic_tag": "external",
   "vlan_id": 180,
   "ips": ["192.168.180.182/24"],
   "gateways": ["192.168.180.1"]
  }
 ]
}

The properties of Linux branded zones are almost identical to those of native zones, with the following differences:

Administrative

  • brand*: String that must be set to lx for this zone type.
  • kernel_version: String representing the version of Linux to report/emulate.

As of January 2021, not all ABI functionality of the latest Linux kernels is supported by the Linux translation layer, meaning that many modern distributions fail to function correctly.  This is being worked on.

HVM Guest Manifests

Hardware Virtual Machine (HVM) Guest zones contain a hardware virtualization suite utilizing either KVM or Bhyve to emulate hardware for any operating system that can run as a guest.

A slightly modified example manifest from the SmartOS wiki:

{
 "brand": "bhyve",
 "alias": "test-debian10",
 "hostname": "test-debian10",
 "vcpus": 4,
 "ram": 4096,
 "disks": [
  {
   "image_uuid": "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e",
   "model": "virtio",
   "boot": true
  }
 ],
 "resolvers": ["208.67.222.222", "8.8.4.4"],
 "nics": [
  {
   "nic_tag": "admin",
   "ips": ["10.33.33.33/24"],
   "gateways": ["10.33.33.1"],
   "model": "virtio",
   "primary": true
  }
 ]
}

While the properties of OS zones (joyent and lx brands) and HVM zones (kvm and bhyve brands) diverge quite a bit, most of the OS properties still apply.  They just apply to the zone performing the virtualization rather than to the guest.

Please also note that some of these properties are specific to bhyve while others are specific to kvm.  I will try to indicate which is which below:

Administrative

  • brand*: String representing which hardware virtualization suite to use for this VM.  Must be either kvm or bhyve.
  • bhyve_extra_opts, qemu_extra_opts: Strings representing additional bhyve and kvm command-line parameters to be appended to the end of the commands.  While this was intended for debugging, it's also generally useful.
  • boot: String representing the boot order for kvm VMs.  Expected format is order=X* where each X is one of c for the hard drive, d for the first CD-ROM drive, or n for network boot.  eg: order=cdn would boot from the hard drive, CD-ROM drive, and network, in that order.
  • bootrom: String representing the bootrom to use under bhyve.  Values are either bios, uefi or a path to a custom bootrom binary relative to the guest zone root.
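
As a rough illustration of the boot order string, a kvm manifest fragment that tries the first CD-ROM drive before falling back to the hard drive (handy while installing from an ISO) might look like the following.  The choice of order is only an example.

```json
{
 "brand": "kvm",
 "boot": "order=dc"
}
```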

CPU/Process

  • vcpus: Integer representing the number of virtual CPUs the guest will see.  This property can be used with cpu_cap and cpu_shares to more closely control CPU utilization.

Memory

  • ram: Integer representing the number of MiB of memory that will be made available to the guest kernel.  Set this rather than max_physical_memory, since the zone needs to allocate additional memory on top of ram to handle the requirements of bhyve or qemu.

Storage

  • disks: Array of JSON objects representing disks that should be associated with this VM.
  • disks.*.block_size: Integer representing the block size of the disk.  This property can only be set during disk creation, and cannot be set when cloning a disk.
  • disks.*.boot: Boolean representing if this disk should be bootable.
  • disks.*.guest_block_size: String representing the device block size reported to the guest.  By default, the block size of the underlying device is reported to the guest.  This setting will override the default value.  It also supports reporting of both physical and logical block sizes using a string in the form of "logical size/physical size", eg: "512/4096" to look like a 512e drive.  Values must always be powers of 2.
  • disks.*.image_uuid: String representing the dataset from which to clone this disk.  These images are managed by imgadm.
  • disks.*.refreservation: Integer representing the size of this refreservation in MiB.
  • disks.*.size: Integer representing the size of this disk in MiB.  This property is mutually exclusive with image_uuid, and is useful for creating empty disks.
  • disks.*.media: String representing whether this disk is a "disk" or a "cdrom".
  • disks.*.model: String representing the driver that should be used by the guest to access this disk.  Should be one of "virtio", "ide" or "scsi".
  • disk_driver: String representing the default value for disks.*.model above.
  • flexible_disk_size: Integer representing the number of MiB of storage space that a bhyve instance may use for its disks and snapshots of those disks.  This value should be larger than the combined size of all of the VM's disks, \(\sum_{d \in \mathrm{disks}} \mathrm{size}_d\).
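
Here is a sketch of a bhyve storage section with a boot disk cloned from an image plus an empty 40 GiB data disk that presents itself to the guest as a 512e drive.  I'm assuming the image is 10240 MiB, so the two disks add up to 51200 MiB and flexible_disk_size is set above that to leave room for snapshots.  All of the sizes are illustrative.

```json
{
 "brand": "bhyve",
 "disk_driver": "virtio",
 "flexible_disk_size": 61440,
 "disks": [
  {
   "image_uuid": "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e",
   "boot": true
  },
  {
   "size": 40960,
   "guest_block_size": "512/4096"
  }
 ]
}
```

Neither disk sets model, so both would fall back to the disk_driver default of virtio.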

Network

  • nics.*.allow_unfiltered_promisc: Boolean representing if this guest should be able to utilize multiple MAC addresses, eg: running SmartOS with vnics.  Really only suitable for testing containerizors from within a VM.
  • nics.*.model: String representing the driver that should be used by the guest to access this vnic.  Should be one of "virtio", "e1000" or "rtl8139".
  • nic_driver: String representing the default value for nics.*.model above.

Additional Properties

  • vnc_port: Integer representing the TCP port that the VNC server attached to this VM will listen on.  0 (default) will choose a port at random, -1 will disable the VNC server.
  • vnc_password: String representing the password which will be required when authenticating to the VNC server.  This password will be visible from the global zone, and is limited to a maximum of 8 characters.
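
For example, pinning the VNC server to a fixed port with a password might look like the fragment below.  The port and password are made up, and the password is kept to 8 characters to stay within the limit noted above.

```json
{
 "vnc_port": 5901,
 "vnc_password": "s3cret99"
}
```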