This is a continuation of the previous article on hardware updates to my homelab, focusing on the configuration of the major hardware components discussed.
No need for much of a preamble here, let's get into it.
There are a bunch of advanced features to experiment with on this Cisco switch, but for now just getting it up and running should be good enough.
I factory reset it, enabled setup mode and hopped onto the console. After running through the initial setup program and assigning some static IP addresses and passwords, I confirmed I could connect with the new addresses and proceeded with the following configuration steps:
EtherChannel (aka Link Aggregation)
The switch and hypervisor are physically connected via a pair of SFP+ DAC cables. It would be criminal not to configure them for balanced failover. Fortunately, Cisco's EtherChannel and Illumos' Link Aggregation talk to each other, so we'll be setting that up.
I set the following configuration on the switch:
```
#configure terminal
interface Te1/0/1
switchport mode trunk
channel-group 1 mode active
interface Te1/0/2
switchport mode trunk
channel-group 1 mode active
end
```
Since I had already configured link-aggregation on the SmartOS side, I also confirmed this was functional on the switch side:
```
#show etherchannel 1 summary
Flags:  D - down        P - bundled in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        U - in use      f - failed to allocate aggregator
        M - not in use, minimum links not met
        u - unsuitable for bundling
        w - waiting to be aggregated
        d - default port

Number of channel-groups in use: 1
Number of aggregators:           1

Group  Port-channel  Protocol    Ports
------+-------------+-----------+--------------------------
1      Po1(SU)       LACP        Te1/0/1(P)  Te1/0/2(P)
```
Port-channel 1 is currently in use and both 10GbE ports are bundled in it.
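For reference, the Illumos side of this pairing is created with dladm. A minimal sketch, assuming the two 10GbE interfaces enumerate as ixgbe0 and ixgbe1 (your device names will differ):

```
# Create an LACP aggregation over both 10GbE interfaces; active mode
# matches the "channel-group 1 mode active" setting on the switch
dladm create-aggr -L active -l ixgbe0 -l ixgbe1 aggr0

# Confirm the aggregation and its LACP state
dladm show-aggr -L
```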
Virtual LAN segments
This switch also supports VLANs, which opens up segmentation for a bunch of my pre-existing hardware, such as IP phones, wireless access points, and SmartOS itself. After roughly sketching some ideas out, I came to the following layout:
- Infrastructure network (vlan id 1, IPv4/27, IPv6/64): dedicated virtual network for network and infrastructure management interfaces (switches, access points, IP phones, iDRACs, hypervisors).
- Embedded network (vlan id 2, IPv4/27, IPv6/64): dedicated virtual network for embedded devices (Chromecasts, IoT devices).
- Internal network (internal etherstub, IPv4/27, IPv6/64): dedicated etherstub for internally facing VMs and zones.
- External network (external etherstub, IPv4/27, IPv6/64): dedicated etherstub for externally facing VMs and zones.
- Private network (vlan id 3, IPv4/27, IPv6/64): dedicated virtual network for physically secured devices (workstations, laptops, etc.).
- Guest network (vlan id 4, IPv4/27, IPv6/64): dedicated virtual network for guest wireless devices.
- Public network (vlan id 5, IPv4/DHCP): dedicated virtual network for upstream network connectivity.
With this in mind, I set the following configuration on the switch:
```
#configure terminal
vlan 2
name embedded
vlan 3
name private
vlan 4
name guest
vlan 5
name public
end
```
And then I confirmed with the following command:
```
#show vlan

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/1, Gi1/0/2, Gi1/0/3
                                                Gi1/0/4, Gi1/0/5, Gi1/0/6
                                                Gi1/0/7, Gi1/0/8, Gi1/0/9
                                                Gi1/0/10, Gi1/0/11, Gi1/0/12
                                                Gi1/0/13, Gi1/0/14, Gi1/0/15
                                                Gi1/0/16, Gi1/0/17, Gi1/0/18
                                                Gi1/0/19, Gi1/0/20, Gi1/0/21
                                                Gi1/0/22, Gi1/0/23, Gi1/0/24
2    embedded                         active
3    private                          active
4    guest                            active
5    public                           active
1002 fddi-default                     act/unsup
1003 token-ring-default               act/unsup
1004 fddinet-default                  act/unsup
1005 trnet-default                    act/unsup

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
1    enet  100001     1500  -      -      -        -    -        0      0
2    enet  100002     1500  -      -      -        -    -        0      0
3    enet  100003     1500  -      -      -        -    -        0      0
4    enet  100004     1500  -      -      -        -    -        0      0
5    enet  100005     1500  -      -      -        -    -        0      0
1002 fddi  101002     1500  -      -      -        -    -        0      0
1003 tr    101003     1500  -      -      -        -    -        0      0
1004 fdnet 101004     1500  -      -      -        ieee -        0      0
1005 trnet 101005     1500  -      -      -        ibm  -        0      0

Remote SPAN VLANs
------------------------------------------------------------------------------

Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------
```
Saving the Running Configuration
After ensuring that everything works as intended, the startup switch configuration needs to be overwritten by the current one. This is done with the following:
```
#copy running-config startup-config
```
While Cisco IOS has quite the learning curve over what I'm used to in switch configuration, it wasn't at all unpleasant to work with once I slowed down and took the time required to understand it.
While the above configuration steps were the bare minimum to get this network up and running, there's a bunch more stuff of interest in that Cisco switch that I would like to dig into in the future.
But as usual, we'll save that for another day.
SmartOS Network Configuration
As I wanted my administrative interface to function over a link aggregation, I set the following in the SmartOS system configuration file:
```
# Aggregation from Intel 2x10GBE Interfaces (e2,e3)
aggr0_aggr=00:00:00:00:00:00,00:00:00:00:00:00
aggr0_lacp_mode=active

# Administrative Interface
admin_nic=aggr0
admin_ip=172.22.1.8
admin_netmask=255.255.255.224
admin_network=172.22.1.0
admin_gateway=172.22.1.1

# Additional Etherstubs
etherstub=external0,internal0

# Common Configuration
hostname=gz-1
dns_domain=ewellnet
dns_resolvers=172.22.1.1
ntp_conf_file=ntp.conf
root_authorized_keys_file=authorized_keys
```
Some brief highlights:
- aggr0_aggr refers to the interfaces to use in the link aggregation, by hardware address.
- aggr0_lacp_mode ensures this side is actively participating in LACP, instead of passively waiting for another active party.
- admin_nic sets the administrative interface, in this case, to the link aggregation. The rest of the admin_ parameters are as set by the SmartOS installation.
- etherstub sets additional etherstubs to be configured upon boot.
- I'm setting my own custom ntp.conf so that I can use my global zone as a network time server across multiple subnets:
```
driftfile /var/ntp/ntp.drift
logfile /var/log/ntp.log

# Ignore all network traffic by default
restrict default ignore
restrict -6 default ignore

# Allow localhost to manage ntpd
restrict 127.0.0.1
restrict -6 ::1

# Allow servers to reply to our queries
restrict source nomodify noquery notrap

# Allow local subnets to query this server
restrict 172.22.1.0 mask 255.255.252.0 nomodify

# Time Servers
pool 0.smartos.pool.ntp.org burst iburst minpoll 4
pool 1.smartos.pool.ntp.org burst iburst minpoll 4
pool 2.smartos.pool.ntp.org burst iburst minpoll 4
pool 3.smartos.pool.ntp.org burst iburst minpoll 4
```
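The restrict 172.22.1.0 mask 255.255.252.0 line is what opens this server up to the neighboring subnets. Python's ipaddress module can show exactly which addresses that mask covers (the example addresses below are illustrative, not my real hosts):

```python
import ipaddress

# The ntp.conf restrict line: 172.22.1.0 mask 255.255.252.0
# strict=False lets ipaddress round the host bits down to the network
allowed = ipaddress.ip_network("172.22.1.0/255.255.252.0", strict=False)
print(allowed)  # 172.22.0.0/22

# A /22 spans 172.22.0.0 through 172.22.3.255, so clients on the
# adjacent /27 subnets can query this server
for addr in ("172.22.1.1", "172.22.3.200", "172.22.4.1"):
    print(addr, ipaddress.ip_address(addr) in allowed)
```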
SmartOS Zpool Configuration
After getting a hardware configuration together that worked for Illumos, I spent a few weeks testing various vdev configurations for performance and spatial efficiency. As the tests evolved over that time, I wasn't fully satisfied with the consistency of the methodology, so I will be rerunning them, both for my own information and to feature in a future article. Still, some pretty solid results shone through:
- ZFS pools based on three five-drive RAIDZ vdevs significantly outperformed pools based on two eight-drive RAIDZ2 vdevs in terms of sequential read performance (17.9% faster) and storage efficiency (7%).
- ZFS pools with special allocation class vdevs outperformed pools without them in terms of sequential read performance (18.3% faster).
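The storage-efficiency delta falls out of simple parity arithmetic. A back-of-the-envelope sketch, using the drive counts from the two layouts above and ignoring padding and metadata overhead:

```python
# Three 5-drive RAIDZ1 vdevs: 4 data drives per vdev
raidz1_data, raidz1_total = 3 * (5 - 1), 3 * 5

# Two 8-drive RAIDZ2 vdevs: 6 data drives per vdev
raidz2_data, raidz2_total = 2 * (8 - 2), 2 * 8

eff1 = raidz1_data / raidz1_total  # 12/15 = 0.80
eff2 = raidz2_data / raidz2_total  # 12/16 = 0.75

print(f"RAIDZ1 layout: {eff1:.0%} usable")
print(f"RAIDZ2 layout: {eff2:.0%} usable")
# Relative advantage of about 6.7%, in the ballpark of the 7% above
print(f"Relative advantage: {eff1 / eff2 - 1:.1%}")
```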
The performance advantages of RAIDZ outweigh the resiliency advantages of RAIDZ2 in my case, as this pool configuration also has a hot-spare, reducing temporal exposure to loss of the pool. As well, critical datasets are regularly replicated off-site.
The zones pool of the new server was manually created during SmartOS installation with the following command:
```
# zpool create \
    -o autotrim=on -O atime=off \
    -O checksum=edonr -O compression=lz4 \
    -O recordsize=1M -O special_small_blocks=128K \
    zones \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
    raidz c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
    raidz c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 \
    spare c1t15d0 \
    special mirror c3t1d0 c4t1d0 mirror c5t1d0 c6t1d0
```
Parameters of note:
- autotrim=on enables automatic TRIM for all TRIM-capable devices in the pool; in this case, all of the NVMe SSD based special vdev leaf devices.
- atime=off disables file access time updates, reducing metadata writes and improving throughput. This is a standard parameter for SmartOS zones pools.
- checksum=edonr uses the edonr checksum instead of fletcher for filesystem checksum calculations. I had found during previous testing that, of all the cryptographically strong checksum algorithms available to ZFS, edonr performed the best. This should be re-verified.
- compression=lz4 explicitly uses lz4 for block compression, which is almost universally a good idea. This could also be set to compression=on, which uses the ZFS default compression algorithm and attempts to balance compression speed with compression ratio.
- recordsize=1M raises the maximum record size from 128K to 1M. This improves performance for large sequential file access by keeping RAIDZ stripes on individual leaf devices relatively large (36K-256K) and ensures that a range of file sizes fit on the normal vdevs instead of the special vdevs, thanks to the next parameter.
- special_small_blocks=128K allows data blocks of up to 128K to also be stored on the NVMe SSDs instead of the hard drives. This should drastically improve random IO and overall throughput.
The combination of the last two parameters effectively creates a hybrid storage pool: all metadata and blocks of up to 128K go to the NVMe special vdevs, while blocks between 128K and 1M go to the RAIDZ hard drives. By adjusting special_small_blocks, different storage properties can be achieved for different datasets.
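The routing rule these two properties set up can be modeled in a few lines. A simplified sketch (real ZFS allocation also considers metadata, dedup tables, and remaining special vdev capacity):

```python
RECORDSIZE = 1 << 20               # 1M: the largest data block
SPECIAL_SMALL_BLOCKS = 128 << 10   # 128K: cutoff for the special vdevs

def storage_class(block_size: int) -> str:
    """Return which vdev class a data block of this size lands on.

    Blocks at or below special_small_blocks go to the special
    allocation class; larger blocks go to the normal vdevs.
    """
    if block_size <= SPECIAL_SMALL_BLOCKS:
        return "special (NVMe SSD mirrors)"
    return "normal (RAIDZ hard drives)"

for size in (4 << 10, 128 << 10, 256 << 10, RECORDSIZE):
    print(f"{size >> 10:>5}K -> {storage_class(size)}")
```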
Cache only metadata for swap zvol
I've never liked the idea of swap pages being cached in the ARC, which happens by default in SmartOS. Fortunately, it's easy to switch that behavior on and off at any time:
# zfs set primarycache=metadata zones/swap
The above command ensures that only metadata for zones/swap makes its way into the ARC, preserving it for normal file access. I can't really foresee a case where I would want to reverse this; perhaps with L2ARC devices installed in this pool? Either way, it's trivial to revert to the normal behavior with:
# zfs inherit primarycache zones/swap
Calming the Dragon (fans)
It was a bit of a surprise when I started this server up for the first time after adding a PCIe card not certified by Dell. If that sounds like a bit of a shakedown, it is. It also sounded like the building was about to take off. I discovered, as many before me had, that Dell is very conservative when it comes to cooling devices whose temperatures their firmware can't monitor. And by conservative, I mean liberal with the cooling.
Some people solve this problem by completely disabling the automatic thermal profiles and manually stepping the fan speed up and down via ipmitool and some cron scripts that run every minute.
That struck me as a horrible idea.
It would be so much better to let the dedicated firmware that monitors system temperature keep tracking component temperatures and adjusting airflow to compensate. If only there were a way to tell it not to worry about those PCIe devices behind the curtain.
Fortunately, at least someone at Dell agrees with me.
It turns out you can disable the third-party PCIe cooling response, preventing it from loudly complaining about additional PCIe devices in the system. The only utility required to change this behavior is ipmitool, which is already part of the SmartOS global zone.
To check the current cooling response status, run the following command:
# ipmitool raw 0x30 0xCE 0x01 0x16 0x05 0x00 0x00 0x00
The following response means the third-party cooling response is disabled. Quiet.
16 05 00 00 00 05 00 01 00 00
The following response means the third-party cooling response is enabled. Loud.
16 05 00 00 00 05 00 00 00 00
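Since the two responses differ in only a single byte, a tiny helper makes them harder to misread. A sketch; the byte offset is inferred from the two captures above, not from any Dell documentation:

```python
def third_party_cooling_disabled(raw_response: str) -> bool:
    """Interpret the ipmitool raw response shown above.

    In the captured responses, the eighth byte is 01 when the
    third-party PCIe cooling response is disabled and 00 when enabled.
    """
    data = raw_response.split()
    return data[7] == "01"

# The two captures from above
print(third_party_cooling_disabled("16 05 00 00 00 05 00 01 00 00"))  # True: quiet
print(third_party_cooling_disabled("16 05 00 00 00 05 00 00 00 00"))  # False: loud
```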
To disable the third-party cooling response, run the following command:
# ipmitool raw 0x30 0xCE 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00
To enable the third-party cooling response, run the following command:
# ipmitool raw 0x30 0xCE 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00
This setting appears to persist across power cycles, but if you find yourself needing to run cards that run hot (like NV1604s), it would be wise to re-enable the default behavior to avoid a potential fire hazard.