What started years ago as a wild goose chase for a scalable first-class hyper-converged infrastructure ultimately led to the fantastic discovery of Joyent SmartOS.

While I've barely looked back, the world has changed significantly since then, especially with the exodus of several high-profile engineers from Joyent.  I occasionally wonder how my conclusions would have changed had I been searching in today's technological climate.

In this article, we'll explore the strengths and weaknesses of SmartOS in the context of today's open-source hyper-convergence space.

SmartOS (Illumos)

Joyent SmartOS is an Illumos-based open-source hypervisor/containerizor that integrates Crossbow, DTrace, KVM, Bhyve, ZFS and Zones into a lightweight in-memory solution that can boot from a local USB drive, from a boot dataset embedded in the primary storage pool via piadm, or over PXE.

Pros:

  • Lightweight ephemeral in-memory global zone that is relatively immutable, improving security and enabling easy upgrades by simply re-deploying a boot image and rebooting.
  • Supports both containers (zones) for maximum performance and HVM (KVM or Bhyve) based guests for maximum flexibility.
  • Strong default isolation between guests without any additional configuration.  HVMs are run from within a zone, providing an additional layer of security between the guest and the host.
  • Zones can be Linux (lx) branded, which allows Linux user-spaces to run natively on Illumos.  Debian, Ubuntu, and CentOS zone images are readily available.
  • A ZFS dataset can be delegated to a zone, allowing the zone to define and configure its own child datasets.
  • DTrace is accessible from both inside and outside of zones, allowing for incredibly detailed instrumentation of production deployments.
  • Crossbow network virtualization allows for complex virtual networks to be configured between guests of a single hypervisor, or bridged between multiple hypervisors.
  • Guests can be tightly constrained to very specific CPU, memory, file system and network limits.
  • Scales up to thousands of hosts, or down to a single one.
  • Support for Docker, Kubernetes and object storage (Manta) through Triton DataCenter.
  • Rapid updates.  Joyent usually has new releases of SmartOS available every two weeks.
  • Strict local node storage architecture ensures low file system latency and node independence.
  • Uses NetBSD's pkgsrc for package management.
  • Optional management layers such as Triton DataCenter and Project Fifo allow for large clusters of hosts to be easily managed, as well as providing additional features.
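To make the delegated-dataset and resource-constraint points a bit more concrete, here is a minimal Python sketch that builds a vmadm-style JSON manifest for a native (joyent-branded) zone.  It assumes the property names documented in vmadm(1M): `max_physical_memory` in MiB, `quota` in GiB, `cpu_cap` as a percentage of one CPU, and `delegate_dataset`.  The helper `build_zone_manifest`, the alias, the image UUID and the limits are all hypothetical placeholders for illustration, not part of any SmartOS tooling.

```python
import json


# Hypothetical helper: builds a vmadm-style manifest for a joyent-branded zone.
# Property names follow vmadm(1M); every value passed in is a placeholder.
def build_zone_manifest(alias, image_uuid, memory_mib, quota_gib, cpu_cap_pct,
                        delegate_dataset=True):
    return {
        "brand": "joyent",                     # native SmartOS zone
        "alias": alias,
        "image_uuid": image_uuid,              # image must already be imported via imgadm
        "max_physical_memory": memory_mib,     # RAM cap, in MiB
        "quota": quota_gib,                    # ZFS quota on the zone's dataset, in GiB
        "cpu_cap": cpu_cap_pct,                # 100 == one full CPU
        "delegate_dataset": delegate_dataset,  # zone may manage its own child datasets
        "nics": [{"nic_tag": "admin", "ip": "dhcp"}],
    }


if __name__ == "__main__":
    manifest = build_zone_manifest(
        alias="web01",
        image_uuid="00000000-0000-0000-0000-000000000000",  # hypothetical UUID
        memory_mib=1024,
        quota_gib=20,
        cpu_cap_pct=200,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could then be saved and handed to `vmadm create -f manifest.json` in the global zone.  With `delegate_dataset` set, the zone gets its own `zones/<uuid>/data` dataset to carve up as it sees fit.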

While none of these pros are completely unique to SmartOS, I have yet to find a project that incorporates all of them so well, even today.  I'm probably biased.

Cons:

  • DTrace doesn't work across the HVM boundary.
  • Strict local node storage architecture means migration between compute nodes requires a ZFS send/recv to push guest data from the source node to the destination node, making instant migrations between hypervisors impossible.
  • SmartOS limits Crossbow configurations to specific conventions.
  • Does not support as wide an array of hardware or release new drivers as quickly as Linux does.
  • LX branded zones do not support the latest Linux kernel interfaces, making them ill suited for the latest versions of leading Linux distributions.
  • Illumos has far fewer active developers than Linux does.
  • Ships Illumos ZFS rather than OpenZFS.
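The storage-migration con is easiest to appreciate by looking at the commands involved.  Below is a minimal Python sketch that only assembles (and does not run) the usual `zfs snapshot` and `zfs send | ssh <host> zfs recv` pipeline for moving a guest's dataset to another node: a full send while the guest is still running, then an optional incremental send after it has been stopped.  The helper `migration_commands` and the dataset, snapshot and host names are hypothetical, it assumes plain dataset names with no shell-special characters, and it deliberately ignores the rest of a real migration (recreating the zone's configuration on the destination, HVM disk volumes that live outside the zone dataset, moving IPs, cleanup).

```python
import shlex


# Hypothetical helper: assembles (but does not run) the shell commands for
# copying a guest's ZFS dataset to another node with send/recv over ssh.
# shlex.quote protects the local shell only; names are assumed to be plain.
def migration_commands(dataset, dest_host, first_snap, final_snap=None,
                       dest_dataset=None):
    dest = shlex.quote(dest_dataset or dataset)
    host = shlex.quote(dest_host)
    snap1 = shlex.quote(f"{dataset}@{first_snap}")
    cmds = [
        # 1. Recursive point-in-time snapshot of the dataset and its children.
        f"zfs snapshot -r {snap1}",
        # 2. Full replication stream (-R) piped to the destination node.
        f"zfs send -R {snap1} | ssh {host} zfs recv {dest}",
    ]
    if final_snap:
        snap2 = shlex.quote(f"{dataset}@{final_snap}")
        # 3. After stopping the guest, send only what changed since first_snap;
        #    -F rolls back any stray changes on the destination first.
        cmds += [
            f"zfs snapshot -r {snap2}",
            f"zfs send -R -i @{first_snap} {snap2} "
            f"| ssh {host} zfs recv -F {dest}",
        ]
    return cmds


if __name__ == "__main__":
    for cmd in migration_commands("zones/abc", "node2", "mig1", final_snap="mig2"):
        print(cmd)
```

Splitting the copy into a full pass and a short incremental pass keeps downtime down, but the guest still has to stop while the final delta crosses the wire, which is why instant migration is off the table.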

While I don't give the first three points on this list of cons much attention, the latter ones have become much bigger deal-breakers in the past few years.  Linux has long been the focal point of performance improvements and technological innovation in this space, and any technological capital that may have been built up under Sun has almost certainly been eclipsed by Linux by now.

Maybe.

There are certainly important performance metrics in which Linux surpasses Illumos.  There are also technologies in the Linux ecosystem that completely fail to solve their intended problems.  The best example is probably epoll, which took over a decade for Linux to "get right", despite solid working examples in several earlier operating systems.

Bryan Cantrill puts it quite succinctly.

And that's fine.

While the number of developers contributing to a project doesn't necessarily determine the quality of that project, the continuing attrition of engineering and outreach talent from Illumos is very concerning for the future of the operating system.

Probably my most significant concern is the combination of the above point and what appears to be the divergence of Illumos from OpenZFS.  Illumos had enjoyed being the reference implementation for years, but that is no longer the case: early last year, with what was likely a self-inflicted wound on Illumos' part, the ZFS on Linux repository was renamed to openzfs/zfs.  Illumos will either need to keep its ZFS at feature parity, ideally with binary compatibility, with OpenZFS, or port OpenZFS into Illumos.

The bad news: both of these prospects require Illumos kernel developer time, which is at a premium right now.  The good news: this has clearly registered as a priority for the Illumos developers, and progress appears to have been made toward porting OpenZFS into Illumos.

Lastly: I'm not sure what exactly Samsung's intentions with Joyent are.  Since Samsung purchased Joyent in 2016, there has been a marked change in the way that Joyent does business, beginning with sweeping changes to the way it communicates with the public about its product offerings, and probably culminating in the 2019 closure of the Joyent Public Cloud.  Between the lack of any recent innovation and the exodus of top-tier talent, it generally feels like Joyent is just treading water, and that's not a good position to be in for the long run.

Alternatives

As of January 2021, there are numerous projects that overlap quite heavily with SmartOS.  Let's briefly review some of them.

TrueNAS Core (FreeBSD)

iXsystems' TrueNAS Core is a FreeBSD-based open-source Network Attached Storage (NAS) operating system that provides data accessibility through SMB, AFP, NFS, iSCSI, SSH, rsync, and FTP/TFTP, all managed through a nice shiny web interface.  While its main use case is as a NAS, it can also locally run FreeBSD Jails and Bhyve, giving it hypervisor and containerizor functionality.

TrueNAS includes DTrace and OpenZFS, both of which are well supported on FreeBSD-based operating systems.  The installation process, while quite straightforward, relies on installing to separate boot media which, unlike an ephemeral in-memory image, is continually written to during normal operation.  Given enough time, this will wear out SD cards and USB flash drives, meaning that your best bet will be to install directly to a hard drive or solid state drive.  In most of my configurations, such drive space comes at a premium since I'm usually using that space for my primary storage pool instead.  This also makes it a bit less convenient to upgrade.

While FreeBSD-based operating systems should have access to Open vSwitch, it is unclear how accessible this feature is through TrueNAS, meaning that custom network configurations may not be easily established without additional work.

In the past, iXsystems had been rather inconsistent in their release schedule, and some of their updates have been full of regressions (lookin' at you, FreeNAS 10).  This appears to have been completely ironed out since the TrueNAS rebranding.

TrueNAS Core is definitely worth looking back into, as they have advanced their platform by leaps and bounds since I last directly used FreeNAS.

Docker Engine (Linux)

Docker is the world's most widely adopted containerizor solution, and can be installed directly into a pre-existing Linux installation.  Docker originally built on Linux Containers (LXC) and, like LXC, relies on Linux namespaces and cgroups to isolate processes from each other and create virtual environments, similarly to FreeBSD's Jails and Illumos' Zones.

Having worked with the precursor to Linux Containers, I probably would have been at home with the feature set of LXC and the convenience and consistency of Docker.  However, when I was searching for a solution to my problem, there were no solidly integrated lightweight Docker-based solutions that included ZFS, which is how I ended up switching to SmartOS.

There are a few well-packaged and delivered solutions now, though.

Proxmox VE (Linux)

If Proxmox Virtual Environment had existed then as it does now, I would probably never have looked any further.  It's a hypervisor/containerizor with a pretty web-based interface that has buttons and graphs, making it visually appealing and generally easy to use.  It ships with OpenZFS, actually supports PCIe hardware passthrough to virtual machines, has what appears to be solid Open vSwitch integration and can scale up with its clustering support.  It looks to be generally adaptable to the various contortions that I'd be sure to put it into.  It's basically perfect.

Almost perfect.

Like TrueNAS, Proxmox VE needs to be installed onto a physical drive, both for the same reasons and with the same caveats.

While the UX is very nice, it is easier than it would seem to misconfigure things at times.  I know that's ridiculous coming from someone who primarily works on CLIs.  Docker containers can be run from the Proxmox VE host, but due to the differences between "application containers" (Docker) and "system containers" (pct), that practice is discouraged in the official documentation.  Yes, both build on the same kernel container primitives, but they differ above that layer and apparently don't coexist well under one management layer.

As with TrueNAS, Proxmox VE is definitely something that's worth looking into.

OpenStack, OpenQRM, OpenNebula, oVirt (Linux)

I don't know why, but I'm just not interested in any of these projects.  They present as generally less suitable than the options already listed above, usually because they appear too large and inflexible.  They're just not exciting.

I may end up exploring some of these projects in the future, but there are currently no foreseeable plans to do so.

Oxide

Technology is always moving, and while it has yet to release any products, the cloud technology company at the center of which are Bryan Cantrill and quite a few other former Joyent engineers deserves an honorable mention here.

If Cantrill's public talks around the time he left Joyent are any indication, Oxide will be an attempt at building a full cloud-scale operating system using the Rust programming language and all of the experience and expertise that team embodies.  It will definitely be worth keeping an eye out for any significant announcements.

Conclusion

While SmartOS has been a good fit over the last decade, there are looming uncertainties on the horizon which lead me to question if that will still be the case come 2030.

Fortunately, there are also a lot of options moving forward.

If all goes as planned, these options will be benchmarked against SmartOS and each other on bare metal running both simple and complex workloads.  Keep an eye out for that, hopefully sooner rather than later.