SmartOS Manifests
SmartOS manifests, sometimes also referred to as zone manifests, are JSON files that describe the resources and permissions to be allocated and granted to a given guest zone on SmartOS. They are used by SmartOS global zones to instantiate discrete guest zones, either through Triton or directly on the global zone command line through vmadm.
Since having the wrong manifest for a SmartOS zone can significantly impact its operation, it seems worth it to dedicate a full article to the topic. We'll be looking at examples of the different types of SmartOS manifests, as well as exploring the specific properties of each.
Please note that the properties covered in this article are limited to ones I've found useful. More authoritative sources for all of this information are probably the SmartOS wiki and the vmadm manual page.
I tend to keep my manifests on file in the global zone when running SmartOS as a stand-alone containerizor, usually under /usbkey/vmcfg/ or /usbkey/manifests/ for easy re-creation of zones.
Native Zones
A native SmartOS zone is an Illumos-based isolated environment that passes everything straight through to the global zone kernel with no translation at all, apart from the isolation provided by virtue of being a zone.
A slightly modified example manifest from the SmartOS wiki:
{
  "brand": "joyent",
  "image_uuid": "1d05e788-5409-11eb-b12f-037bd7fee4ee",
  "alias": "test-smartos",
  "hostname": "test-smartos",
  "cpu_cap": 200,
  "max_physical_memory": 1024,
  "quota": 20,
  "delegate_dataset": true,
  "resolvers": ["8.8.8.8", "208.67.220.220"],
  "nics": [
    {
      "nic_tag": "admin",
      "ips": ["dhcp"]
    }
  ]
}
For ease of reading, the properties below are grouped into a few general categories. Properties marked with an asterisk (*) are required for each brand.
Administrative
brand*: String that must be set to joyent or joyent-minimal for this zone type.
image_uuid: String representing the UUID of the image that this zone should be instantiated from. Images are managed by imgadm.
alias: String used for display/lookup purposes from outside the guest zone.
hostname: String used to configure the guest zone's hostname on creation.
CPU/Process
cpu_cap: Integer representing the percentage of a single CPU core available to this zone. A value of 300 represents up to 3 full CPU cores.
cpu_shares: Integer representing the number of fair share scheduler (FSS) shares for this zone. Only meaningful relative to other zones on the system and only applies when there is CPU contention between zones. A value of 25 means that, under contention, this zone will only have access to \(\frac{1}{4}\) as much CPU time as another zone with the default value of 100.
max_lwps: Integer representing the maximum number of threads (lightweight processes) a zone is allowed to run. The default value of 2000 should be pretty reasonable.
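As a rough sketch, the CPU properties above might be combined like this in a manifest. The values are illustrative assumptions, not recommendations: the zone may burst up to two full cores, carries twice the scheduler weight of a zone with 100 shares when CPU is contended, and is limited to 2000 threads.

```json
{
  "cpu_cap": 200,
  "cpu_shares": 200,
  "max_lwps": 2000
}
```

Note that cpu_cap is a hard ceiling, while cpu_shares only matters when zones are competing for CPU time.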
Memory
max_locked_memory: Integer representing the number of MiB of memory this zone is allowed to lock. Locked memory consists of pages that are explicitly marked as non-swappable. This value cannot exceed max_physical_memory, which is also its default.
max_physical_memory: Integer representing the number of MiB of memory this zone is allowed to use. The default value is 256.
max_swap: Integer representing the number of MiB of virtual memory this zone is allowed to use. This value must be at least as large as its default value, which is max_physical_memory or 256, whichever is greater.
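To make the relationship between the three memory limits concrete, here is a small fragment with made-up values: the locked memory limit stays below the physical memory limit, and the virtual memory limit sits above it.

```json
{
  "max_physical_memory": 2048,
  "max_locked_memory": 1024,
  "max_swap": 4096
}
```

All three values are in MiB, so this zone would get 2 GiB of physical memory and 4 GiB of virtual memory in total.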
Storage
quota: Integer representing the quota, in GiB, to set on this zone's ZFS dataset.
delegate_dataset: Boolean that determines if a ZFS dataset will be delegated to this zone on creation. If set to true, this zone will get a dataset at <zoneroot dataset>/data (by default, zones/<uuid>/data). This dataset can be configured in many different ways to optimize for databases, snapshots, etc.
indestructible_delegated: Boolean that determines if the delegated ZFS dataset should have a zfs hold set on it to enable two-step deletion. Use this if you're worried about accidentally deleting your data.
indestructible_zoneroot: Same as above but for the entire guest zone.
filesystems: Array of JSON objects representing additional filesystems, beyond those of normal operation, to be mounted within the zone. Each object takes the following parameters:
  filesystems.*.type: String representing the type of filesystem to be mounted: lofs for a bind mount, pcfs for a PC (FAT) filesystem, tmpfs, etc.
  filesystems.*.source: String representing the source directory from the scope of the global zone, primarily useful for lofs mounts.
  filesystems.*.target: String representing the mountpoint from the scope of the guest zone.
  filesystems.*.raw: String representing a raw device to be associated with the source filesystem. Most often, this should be a device file for a drive.
  filesystems.*.options: Array of strings representing the mount options for this filesystem when it is mounted into the zone. Eg: ["ro", "nodevices"]
fs_allowed: String representing the filesystem types this zone is allowed to mount. If you're building SmartOS, you will want this as: "ufs,pcfs,tmpfs"
tmpfs: Integer representing the number of MiB this zone is allowed to use for its tmpfs mounted at /tmp. Cannot exceed its default value of max_physical_memory.
zfs_filesystem_limit, zfs_snapshot_limit: Integers representing the limits on the number of ZFS filesystems and snapshots a zone can have. Useful when combined with delegate_dataset to prevent runaway resource consumption.
zfs_io_priority: Integer representing the zone's IO priority when operating on a system with IO contention. Zones with values less than (or greater than) the default value of 100 will have their IO throttled (or prioritized) when several zones try to use all available storage IO.
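Here is a sketch of how several of the storage properties might look together. The source path /zones/shared/media and the target /media are hypothetical names chosen for illustration, and the numbers are arbitrary: a 50 GiB quota, a delegated dataset protected by a hold, a cap of 100 snapshots, and a read-only bind mount from the global zone.

```json
{
  "quota": 50,
  "delegate_dataset": true,
  "indestructible_delegated": true,
  "zfs_snapshot_limit": 100,
  "filesystems": [
    {
      "type": "lofs",
      "source": "/zones/shared/media",
      "target": "/media",
      "options": ["ro", "nodevices"]
    }
  ]
}
```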
Network
resolvers: Array of strings representing DNS resolvers that will be written to /etc/resolv.conf upon zone creation.
maintain_resolvers: Boolean that determines if vmadm should update guest zone resolvers when the above property is updated. Default: false
nics: Array of JSON objects representing a guest zone's network interfaces. Each object takes the following parameters:
  nics.*.primary: Boolean representing which vnic should be used for this zone's default gateway and nameserver values. Only useful with multiple nics.
  nics.*.nic_tag: String representing which physical nic or etherstub this vnic should be associated with.
  nics.*.vlan_id: Integer representing which vlan tag should be used for this vnic.
  nics.*.interface: String representing the interface name this zone will use for this interface. Always in the format of netX where X is an integer \(\geq 0\). This parameter is primarily useful for configuring zones with multiple nics.
  nics.*.mac: String representing the MAC address of a vnic. This is useful when interfacing with external systems expecting a specific MAC address.
  nics.*.ips: Array of strings representing IPv4 CIDR or IPv6 CIDR addresses for a given vnic. The special strings "dhcp" and "addrconf" can be used as well to represent the use of DHCPv4 and SLAAC or DHCPv6, respectively.
  nics.*.gateways: Array of strings representing IPv4 addresses that this zone should use as network gateways. If multiple gateways are specified, OS-specific behavior will apply (eg round robin on SmartOS). Not required if using DHCP.
  nics.*.allow_dhcp_spoofing, nics.*.allow_ip_spoofing: Booleans that determine if this zone's vnic should be granted certain permissions. DHCP spoofing is required for DHCP servers. IP spoofing is required for routers.
  nics.*.allowed_ips: Array of strings representing additional IP addresses from which this vnic is allowed to send traffic. This is useful for IP address failover schemes between multiple zones.
  nics.*.blocked_outgoing_ports: Array of integers representing port numbers to which this vnic is prevented from sending traffic. Eg: [80, 443, 8080]
routes: JSON object that maps network destinations to gateways. Note that this is a property of the zone as a whole rather than of an individual nic. Destinations (keys) can be either IP addresses or IP subnetworks in CIDR notation. Gateways can be either IP addresses or in the form of nics[0] or macs[aa:bb:cc:12:34:56].
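For zones with more than one interface, a sketch like the following may help. The addresses are made up for illustration and the nic tags assume that both external and admin exist on the host: net0 carries the default gateway because it is marked primary, while net1 sits on the admin network and is allowed to send from one extra failover address.

```json
{
  "resolvers": ["192.168.180.1"],
  "maintain_resolvers": true,
  "nics": [
    {
      "interface": "net0",
      "nic_tag": "external",
      "vlan_id": 180,
      "ips": ["192.168.180.50/24"],
      "gateways": ["192.168.180.1"],
      "primary": true
    },
    {
      "interface": "net1",
      "nic_tag": "admin",
      "ips": ["10.33.33.50/24"],
      "allowed_ips": ["10.33.33.60"]
    }
  ]
}
```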
Additional Properties
limit_priv: String representing the list of privileges that will be available to this zone. The default is normally fine, but some applications may require special permissions to run properly. For instance, FreeSWITCH apparently needs "default,proc_clock_highres,proc_priocntl" to enable the use of high resolution timers with very small time values and for better control over its scheduling class, both probably important for low-latency voice. See man 5 privileges.
customer_metadata: JSON object representing metadata to be associated with this VM. This data can be accessed from within the guest zone by using the mdata-get command, even from a script stored in this same object, eg:
"customer_metadata": {
  "root_authorized_keys": "ssh-ed25519 <key data>",
  "user-script": "/usr/sbin/mdata-get root_authorized_keys > /root/.ssh/authorized_keys"
}
Linux Branded Guest Manifests
Linux Branded SmartOS zones are a Linux user-space with an additional translation layer that converts Linux ABI calls from the user-space into Illumos ABI calls before passing them on to the Illumos kernel, effectively allowing Linux user applications to operate under an Illumos kernel.
A slightly modified example manifest from the SmartOS wiki:
{
  "brand": "lx",
  "kernel_version": "4.2.0",
  "image_uuid": "63d6e664-3f1f-11e8-aef6-a3120cf8dd9d",
  "alias": "test-debian9",
  "hostname": "test-debian9",
  "cpu_cap": 400,
  "max_physical_memory": 4096,
  "quota": 1000,
  "resolvers": ["192.168.180.1", "8.8.8.8"],
  "nics": [
    {
      "nic_tag": "external",
      "vlan_id": 180,
      "ips": ["192.168.180.182/24"],
      "gateways": ["192.168.180.1"]
    }
  ]
}
The properties of Linux branded zones are almost identical to those of native SmartOS zones, with the following differences:
Administrative
brand*: String that must be set to lx for this zone type.
kernel_version: String representing the version of the Linux kernel to report/emulate.
As of January 2021, not all ABI functionality of the latest Linux kernels is supported by the Linux translation layer, meaning that many modern distributions fail to function correctly. This is being worked on.
HVM Guest Manifests
Hardware Virtual Machine (HVM) Guest zones contain a hardware virtualization suite utilizing either KVM or Bhyve to emulate hardware for any operating system that can run as a guest.
A slightly modified example manifest from the SmartOS wiki:
{
"brand": "bhyve",
"alias": "test-debian10",
"hostname": "test-debian10",
"vcpus": 4,
"ram": 4096,
"disks": [
{
"image_uuid": "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e",
"model": "virtio",
"boot": true
}
],
"resolvers": ["208.67.222.222", "8.8.4.4"],
"nics": [
{
"nic_tag": "admin",
"ips": ["10.33.33.33/24"],
"gateways": ["10.33.33.1"],
"model": "virtio",
"primary": true
}
]
}
While there's quite a bit of divergence between the properties of OS zones (joyent and lx brands) and HVM zones (kvm and bhyve brands), most of the OS properties actually still apply. They just apply to the zone performing the virtualization, not to the guest.
Please also note that some of these properties are specific to bhyve while others are specific to kvm. I will try to point out which is which below:
Administrative
brand: String representing which hardware virtualization suite to use for this VM. Must be either kvm or bhyve.
bhyve_extra_opts, qemu_extra_opts: Strings representing additional bhyve and kvm command-line parameters, respectively, to be appended to the end of the commands. While this was intended for debugging, it's also generally useful.
boot: String representing the boot order for kvm VMs. Expected format is order=X*, where each X is one of c for the hard drive, d for the first CD-ROM drive, or n for network boot. Eg: order=cdn would try the hard drive, CD-ROM drive, and network, in that order.
bootrom: String representing the bootrom to use under bhyve. Values are either bios, uefi, or a path to a custom bootrom binary relative to the guest zone root.
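As a minimal sketch, a bhyve guest that needs UEFI firmware would carry something like the following in its manifest. This is only the relevant fragment and assumes the guest image itself is UEFI-bootable.

```json
{
  "brand": "bhyve",
  "bootrom": "uefi"
}
```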
CPU/Process
vcpus: Integer representing the number of virtual CPUs the guest will see. This property can be used with cpu_cap and cpu_shares to more closely control CPU utilization.
Memory
ram: Integer representing the number of MiB of memory that will be made available to the guest kernel. Use this in place of max_physical_memory, since the zone itself needs to allocate additional memory on top of the guest's RAM to handle the requirements of bhyve or qemu.
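Putting the HVM CPU and memory properties together, a fragment might look like the sketch below. The numbers are arbitrary examples: the guest sees two virtual CPUs and 2048 MiB of RAM, and the zone hosting it is capped at two full cores.

```json
{
  "brand": "bhyve",
  "vcpus": 2,
  "ram": 2048,
  "cpu_cap": 200
}
```

max_physical_memory is deliberately left out here so that it can be derived from ram.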
Storage
disks: Array of JSON objects representing disks that should be associated with this VM.
  disks.*.block_size: Integer representing the block size of the disk. This property can only be set during disk creation, and cannot be set when cloning a disk.
  disks.*.boot: Boolean representing if this disk should be bootable.
  disks.*.guest_block_size: String representing the device block size reported to the guest. By default, the block size of the underlying device is reported to the guest. This setting will override the default value. It also supports reporting of both physical and logical block sizes using a string in the form of "logical size/physical size", eg: "512/4096" to look like a 512e drive. Values must always be powers of 2.
  disks.*.image_uuid: String representing the dataset from which to clone this disk. These images are managed by imgadm.
  disks.*.refreservation: Integer representing the size of this refreservation in MiB.
  disks.*.size: Integer representing the size of this disk in MiB. This property is mutually exclusive with image_uuid, and is useful for creating empty disks.
  disks.*.media: String representing whether this disk is a "disk" or a "cdrom".
  disks.*.model: String representing the driver that should be used by the guest to access this disk. Should be one of "virtio", "ide" or "scsi".
disk_driver: String representing the default value for disks.*.model above.
flexible_disk_size: Integer representing the number of MiB of storage space that a bhyve instance may use for its disks and snapshots of those disks. This value should be larger than \(\sum_{d \in \text{disks}} \text{size}_d\), the combined size of all of this VM's disks.
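Since flexible_disk_size has to cover every disk plus any snapshots, it can help to work through the arithmetic with an example. The sketch below assumes the cloned image is 10240 MiB, which is a guess about that particular image rather than something I have verified, and adds an empty 20480 MiB data disk. That gives 30720 MiB of disks, so a flexible_disk_size of 61440 MiB leaves the same amount again for snapshots.

```json
{
  "brand": "bhyve",
  "flexible_disk_size": 61440,
  "disks": [
    {
      "image_uuid": "9bcfe5cc-007d-4f23-bc8a-7e7b4d0c537e",
      "model": "virtio",
      "boot": true
    },
    {
      "size": 20480,
      "model": "virtio"
    }
  ]
}
```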
Network
nics.*.allow_unfiltered_promisc: Boolean representing if this guest should be able to utilize multiple MAC addresses, eg: running SmartOS with vnics. Really only suitable for testing containerizors from within a VM.
nics.*.model: String representing the driver that should be used by the guest to access this vnic. Should be one of "virtio", "e1000" or "rtl8139".
nic_driver: String representing the default value for nics.*.model above.
Additional Properties
vnc_port: Integer representing the TCP port that the VNC server attached to this VM will listen on. 0 (default) will choose a port at random, -1 will disable the VNC server.
vnc_password: String representing the password which will be required when authenticating to the VNC server. This password will be visible from the global zone, and is limited to a maximum of 8 characters.
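A fragment pinning the VNC server to a fixed port might look like this. Both the port number and the password are placeholders I made up, with the password kept at exactly 8 characters to stay within the limit.

```json
{
  "vnc_port": 5901,
  "vnc_password": "s3cr3t99"
}
```

Given that the password is readable from the global zone and so short, it is probably best treated as a convenience rather than real access control.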