Package Caching on SmartOS

If you use the same package management software across two or more systems, network package caching may be of benefit to you. In this article, we will set up a package caching solution that serves multiple package management systems simultaneously.

Methodology

While most package management systems already cache packages in the local file system, sharing that local cache with other nearby systems is difficult to do without security implications or prohibitive complexity.

Instead, the preferred solution is to redirect requests to a local-network intermediary that provides the same interface as the original repository while managing its own cache. Most package management systems use HTTP or HTTPS and refer to their repositories by domain name or URL, and many can be reconfigured to reach their repositories through alternate means. Where this is not trivial, split-horizon DNS records can be used instead.

This technique is used extensively by projects such as Lancache for game management systems but can also easily be applied to other software package management systems such as Pkgsrc, APT, Yum, etc.

Architecture

There will be two independent services involved in implementing this solution: a DNS resolver and a caching HTTP server.

While I will be including notes on the hostnames that need to be configured, the specifics of implementing DNS will be outside the scope of this article. I'm assuming that you have access to your own DNS resolver such as dnsmasq or powerdns-recursor and know how to configure it for split-horizon DNS.

While I used to prefer redirecting the original domain names, I now replace the TLD with cache.ewellnet instead, for example:

  • pkgsrc.joyent.com becomes pkgsrc.joyent.cache.ewellnet.
  • archive.ubuntu.com becomes archive.ubuntu.cache.ewellnet.

While this does require reconfiguring each host that uses the cache, overall it's a cleaner approach and allows hosts to easily bypass the cache if need be. Additionally, the search directive in my /etc/resolv.conf is set to ewellnet, so I can use the shorter pkgsrc.joyent.cache or archive.ubuntu.cache instead.
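
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the cache_hostname helper is hypothetical, and cache.ewellnet is my site's domain, so substitute your own.

```python
def cache_hostname(upstream: str, cache_domain: str = "cache.ewellnet") -> str:
    """Replace the top-level domain of an upstream host with the cache domain."""
    # rpartition splits on the last dot: "pkgsrc.joyent.com" -> "pkgsrc.joyent", "com"
    name, _, _tld = upstream.rstrip(".").rpartition(".")
    if not name:
        raise ValueError(f"{upstream!r} has no top-level domain to replace")
    return f"{name}.{cache_domain}"


for host in ("pkgsrc.joyent.com", "archive.ubuntu.com"):
    print(host, "->", cache_hostname(host))
```

Passing cache_domain="cache" yields the short form that resolves through the search directive.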

For our caching HTTP server, we will be using Nginx within a single zone. It performs well, is easy to configure, and does its job efficiently. A single zone should be sufficient for most situations but may fall short in extreme circumstances. That can be remedied with a cache cluster, which is well outside the scope of today's article.

I will be using a common cache-optimized configuration with additional configuration files for each supported package management system. For this article, I will also set up a single cache shared by all package management systems, but the reader can just as easily partition their caches for sets of management systems.

It is also best that the cache zone be dual-stack. This benefits any IPv6-only hosts on your network that may otherwise not have access to some repositories.

Zone Manifest

We want to ensure that the zone containing the cache will have enough processing power and memory to not be starved, and enough storage space to handle all of the packages you would like to cache.

We will be using the following for our example:

{
  "image_uuid": "1d05e788-5409-11eb-b12f-037bd7fee4ee",
  "brand": "joyent",
  "alias": "cache",
  "hostname": "cache",
  "dns_domain": "ewellnet",
  "cpu_cap": 200,
  "max_physical_memory": 256,
  "quota": 1024,
  "delegate_dataset": true,
  "resolvers": [ "172.22.1.97" ],
  "nics": [{
    "nic_tag": "external0",
    "ips": [ "172.22.1.98/27", "addrconf", "2001:470::98/64" ],
    "gateways": [ "172.22.1.97" ],
    "primary": true
  }]
}

Create it and login.

# vmadm create -f cache.json
Successfully created VM 49ee67da-9e3c-c20a-d925-aa2aa284f95d
# zlogin 49ee67da-9e3c-c20a-d925-aa2aa284f95d

After doing any zone cleanup that you prefer, ensure that a dataset exists to hold your cache. I often set a ZFS quota slightly above the size limit that will be set later in Nginx, to enforce a hard upper bound.

# zfs create -o quota=982G -o mountpoint=/var/www/cache zones/<UUID>/data/cache

Installing & Configuring Nginx

Next, we're going to install Nginx.

# pkgin -y install nginx

Clear out all of the Nginx configuration files; we'll be doing something very specific.

# rm -rv /opt/local/etc/nginx/*

Instead of the default nginx.conf, we'll be using this one, optimized for caching:

/opt/local/etc/nginx/nginx.conf:

user www www;
worker_processes 2;

events { worker_connections 1024; multi_accept on; }

http {
  default_type  application/octet-stream;

  allow 172.22.1.0/24;
  allow 2001:470::/48;
  deny  all;

  server_tokens off;
  tcp_nopush  on;
  sendfile  on;
  gzip    on;

  proxy_buffering          on;
  proxy_buffers            32 8k;
  proxy_cache              default_cache;
  proxy_cache_lock         on;
  proxy_cache_lock_age     5m;
  proxy_cache_lock_timeout 30s;
  proxy_cache_path         /var/www/cache levels=2:2 use_temp_path=on keys_zone=default_cache:128m inactive=4y max_size=980G;
  proxy_cache_revalidate   on;
  proxy_cache_use_stale    error timeout invalid_header updating http_500 http_502 http_503 http_504 http_403 http_404 http_429;
  proxy_cache_valid        1d;
  proxy_cache_valid        any 1m;
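  proxy_http_version       1.1;
  # Clearing Connection is required for upstream keepalive to take effect.
  proxy_set_header         Connection "";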
  proxy_ignore_headers     X-Accel-Redirect X-Accel-Expires X-Accel-Limit-Rate X-Accel-Buffering X-Accel-Charset Expires Cache-Control Vary;
  proxy_temp_path          /var/www/cache/tmp 1;

  proxy_ssl_ciphers EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH;
  proxy_ssl_protocols TLSv1.2;
  proxy_ssl_server_name on;
  proxy_ssl_session_reuse on;
  proxy_ssl_verify_depth  4;

  log_format cache '$remote_addr - $remote_user [$time_local] $status $upstream_cache_status $server_name $request_time [$connection:$connection_requests] "$request_method $scheme://$http_host$request_uri $server_protocol" $body_bytes_sent "$http_referer" "$http_user_agent"';
  access_log  /var/log/nginx/cache.log cache;

  server {
    listen [::] default_server ipv6only=off;
    server_name default;
    return 421;
  }

  include cache/*.cache;
}

A quick breakdown of configuration parameters that you will want to tweak:

  • worker_processes limits Nginx to that number of worker processes. This is important on compute nodes with high core counts, and the value should match your cpu_cap / 100. In my case, with a cpu_cap of 200, this value should be 2.
  • The allow and deny directives will allow you to whitelist or blacklist IP prefixes. Use allow to whitelist your local network prefixes and then deny all to prevent all others from accessing your cache. In my case, my local network prefixes are 172.22.1.0/24 and 2001:470::/48.
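
As a small worked example of the cpu_cap rule, the sketch below derives worker_processes from a trimmed copy of the zone manifest. The worker_processes helper is hypothetical, and the floor of one worker for caps below 100 is my own assumption.

```python
import json


def worker_processes(cpu_cap: int) -> int:
    """Return cpu_cap / 100, rounded down, but never less than one worker."""
    return max(1, cpu_cap // 100)


# Trimmed copy of the zone manifest used in this article.
manifest = json.loads('{"alias": "cache", "cpu_cap": 200}')
print(f"worker_processes {worker_processes(manifest['cpu_cap'])};")
```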

At this point you can enable Nginx to confirm it's happy with its current configuration before adding specific management systems to the cache from the below sections:

# svcadm enable nginx

If you'd like to take a moment here, I recommend reading the Nginx documentation on proxy directives for a more complete understanding of what all has been specified here.

Also, if you want to make full use of all of the below sub-sections of this article, I recommend recompiling Nginx from a build host with sub and cache-purge enabled.

Package Caching

This section will start by demonstrating how to configure pkgsrc package caching and then illustrate the performance differences of caching vs not caching. It will then list additional configured package caching configurations for other operating systems and environments.

Each package management description will include any local changes that need to be made to the system being configured for caching, the DNS records that need to be configured along with the relevant Nginx configuration files. Please replace any references to ewellnet with your site's local domain name, or just omit that domain name entirely, your choice.

I also recommend keeping a terminal open to tail /var/log/nginx/cache.log while configuring this, to monitor for cache misses, hits and revalidations.
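
If you'd rather have a summary than watch the raw log, something like the following sketch could tally cache statuses per server. It assumes the cache log_format defined earlier, where the status, $upstream_cache_status and $server_name follow the bracketed timestamp. The sample lines are made up for illustration and are not real log output.

```python
import re
from collections import Counter

# Matches the start of a line in the "cache" log_format:
# $remote_addr - $remote_user [$time_local] $status $upstream_cache_status $server_name ...
LINE = re.compile(
    r"^\S+ - \S+ \[[^\]]+\] (?P<status>\d{3}) (?P<cache>\S+) (?P<server>\S+) "
)


def tally(lines):
    """Count log lines per (server_name, cache status) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[(match["server"], match["cache"])] += 1
    return counts


# Hypothetical sample lines, shaped like the log_format above.
sample = [
    '172.22.1.10 - - [01/Jan/2021:00:00:00 +0000] 200 MISS pkgsrc.joyent.cache 0.912 [1:1] "GET http://pkgsrc.joyent.cache/a.tgz HTTP/1.1" 100 "-" "pkgin"',
    '172.22.1.11 - - [01/Jan/2021:00:00:05 +0000] 200 HIT pkgsrc.joyent.cache 0.001 [2:1] "GET http://pkgsrc.joyent.cache/a.tgz HTTP/1.1" 100 "-" "pkgin"',
]
for (server, cache), count in sorted(tally(sample).items()):
    print(f"{server:30} {cache:12} {count}")
```

To run it against the real log, pass an open file object for /var/log/nginx/cache.log to tally() instead of the sample list.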

Pkgsrc (SmartOS)

The pkgsrc binary package manager is responsible for managing software packages in SmartOS zones. Like most distribution software package managers, it downloads, validates and installs in that order.

Let's get an idea of what sort of performance we can expect out of non-cached pkgsrc. Testing pkgin update in a freshly provisioned zone gives us the following results:

# time pkgin update
reading local summary...
processing local summary...
processing remote summary (https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz                                   100% 2374KB 791.4KB/s   00:03

real    0m6.941s
user    0m3.294s
sys     0m0.318s

That's not bad. Let's see how long it'll take to download a full upgrade:

# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.

17 packages to download:
  wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
  postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
  libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
  openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
  nghttp2-1.42.0nb1 curl-7.75.0
74M to download

wget-1.20.3nb10.tgz                     100% 1244KB   1.2MB/s   00:00
sudo-1.9.6p1.tgz                        100% 1866KB   1.8MB/s   00:01
rsyslog-8.38.0nb10.tgz                  100% 1165KB   1.1MB/s   00:01
python38-3.8.6nb1.tgz                   100%   27MB   3.9MB/s   00:07
postfix-3.5.10.tgz                      100% 2175KB   1.1MB/s   00:02
pkgin-20.12.1.tgz                       100%   98KB  98.4KB/s   00:01
pkg_install-20201218.tgz                100% 9201KB   3.0MB/s   00:03
openssl-1.1.1l.tgz                      100% 6488KB   2.1MB/s   00:03
libssh2-1.9.0nb1.tgz                    100%  389KB 389.0KB/s   00:01
libarchive-3.4.3.tgz                    100%  979KB 979.3KB/s   00:01
mozilla-rootcerts-1.0.20201102.tgz      100%  573KB 573.1KB/s   00:00
openldap-client-2.4.56.tgz              100% 1438KB   1.4MB/s   00:00
http-parser-2.9.4.tgz                   100%   48KB  47.8KB/s   00:00
npm-6.14.11.tgz                         100% 5352KB   2.6MB/s   00:02
nodejs-14.16.1.tgz                      100%   14MB   3.6MB/s   00:04
nghttp2-1.42.0nb1.tgz                   100%  296KB 295.9KB/s   00:01
curl-7.75.0.tgz                         100% 1664KB   1.6MB/s   00:01

real    0m40.083s
user    0m0.630s
sys     0m0.306s

40 seconds. Not horrible, but surely we can improve upon this.

DNS Records

Ensure that the following DNS records are configured:

  • pkgsrc.joyent.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/joyent.cache:

upstream pkgsrc.joyent.com {
  server pkgsrc.joyent.com:443;
  keepalive 2;
}

server {
  listen [::];
  server_name pkgsrc.joyent.cache pkgsrc.joyent.cache.ewellnet;

  location ~ pkg_summary\.(bz2|gz|xz)$ {
    proxy_pass https://pkgsrc.joyent.com;
    proxy_cache_valid any 1h;
  }

  location / {
    proxy_pass https://pkgsrc.joyent.com;
  }
}

This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summary will be revalidated every hour.

Refresh Nginx to enable pkgsrc package caching:

# svcadm refresh nginx

Client Configuration

Pkgsrc clients need to be configured to make use of the local cache. This can be done by altering the repository URL within the configuration of each client:

/opt/local/etc/pkgin/repositories.conf:

...
http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All

Run pkgin update to confirm that it's able to acquire the repository package summary:

# time pkgin update
cleaning database from https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All entries...
reading local summary...
processing local summary...
processing remote summary (http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz                          100% 2374KB 593.6KB/s   00:04

real    0m7.952s
user    0m4.332s
sys     0m0.581s

Note that the time is slightly longer. This is because the previous database had to be cleared out and the summary wasn't cached yet. We can also now re-test downloading an upgrade from the cache:

# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.

17 packages to download:
  wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
  postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
  libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
  openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
  nghttp2-1.42.0nb1 curl-7.75.0
74M to download

wget-1.20.3nb10.tgz                     100% 1244KB   1.2MB/s   00:01
sudo-1.9.6p1.tgz                        100% 1866KB 933.2KB/s   00:02
rsyslog-8.38.0nb10.tgz                  100% 1165KB   1.1MB/s   00:01
python38-3.8.6nb1.tgz                   100%   27MB   3.9MB/s   00:07
postfix-3.5.10.tgz                      100% 2175KB   2.1MB/s   00:01
pkgin-20.12.1.tgz                       100%   98KB  98.4KB/s   00:00
pkg_install-20201218.tgz                100% 9201KB   3.0MB/s   00:03
openssl-1.1.1l.tgz                      100% 6488KB   2.1MB/s   00:03
libssh2-1.9.0nb1.tgz                    100%  389KB 389.0KB/s   00:00
libarchive-3.4.3.tgz                    100%  979KB 979.3KB/s   00:01
mozilla-rootcerts-1.0.20201102.tgz      100%  573KB 573.1KB/s   00:01
openldap-client-2.4.56.tgz              100% 1438KB   1.4MB/s   00:01
http-parser-2.9.4.tgz                   100%   48KB  47.8KB/s   00:00
npm-6.14.11.tgz                         100% 5352KB   1.7MB/s   00:03
nodejs-14.16.1.tgz                      100%   14MB   4.7MB/s   00:03
nghttp2-1.42.0nb1.tgz                   100%  296KB 295.9KB/s   00:01
curl-7.75.0.tgz                         100% 1664KB   1.6MB/s   00:01

real    0m32.785s
user    0m0.553s
sys     0m0.370s

Not too much quicker, but again, none of this data had been cached yet. Switch the source repository back, re-update, and switch back to the cache again to test its real performance:

-- Switched back to upstream
# pkgin update
-- Switched back to cache
# time pkgin update
cleaning database from https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All entries...
reading local summary...
processing local summary...
processing remote summary (http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz                          100% 2374KB 791.4KB/s   00:03

real    0m6.833s
user    0m3.952s
sys     0m0.555s

Not a whole lot to be excited about with pkgin update. Let's check on upgrade:

# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.

17 packages to download:
  wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
  postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
  libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
  openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
  nghttp2-1.42.0nb1 curl-7.75.0
74M to download

wget-1.20.3nb10.tgz                     100% 1244KB   1.2MB/s   00:00
sudo-1.9.6p1.tgz                        100% 1866KB   1.8MB/s   00:00
rsyslog-8.38.0nb10.tgz                  100% 1165KB   1.1MB/s   00:00
python38-3.8.6nb1.tgz                   100%   27MB  27.3MB/s   00:00
postfix-3.5.10.tgz                      100% 2175KB   2.1MB/s   00:00
pkgin-20.12.1.tgz                       100%   98KB  98.4KB/s   00:00
pkg_install-20201218.tgz                100% 9201KB   9.0MB/s   00:00
openssl-1.1.1l.tgz                      100% 6488KB   6.3MB/s   00:00
libssh2-1.9.0nb1.tgz                    100%  389KB 389.0KB/s   00:00
libarchive-3.4.3.tgz                    100%  979KB 979.3KB/s   00:00
mozilla-rootcerts-1.0.20201102.tgz      100%  573KB 573.1KB/s   00:00
openldap-client-2.4.56.tgz              100% 1438KB   1.4MB/s   00:00
http-parser-2.9.4.tgz                   100%   48KB  47.8KB/s   00:00
npm-6.14.11.tgz                         100% 5352KB   5.2MB/s   00:00
nodejs-14.16.1.tgz                      100%   14MB  14.2MB/s   00:01
nghttp2-1.42.0nb1.tgz                   100%  296KB 295.9KB/s   00:00
curl-7.75.0.tgz                         100% 1664KB   1.6MB/s   00:00

real    0m1.322s
user    0m0.524s
sys     0m0.342s

That's more like it! We can see that package downloads were served from the cache as well with the HIT lines in our logs rather than MISS lines.
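
To put a number on that, the sketch below compares the "real" times reported by time(1) for the first uncached upgrade and the fully cached one. The seconds helper is hypothetical and only handles the XmY.YYYs format shown above.

```python
def seconds(real: str) -> float:
    """Convert a time(1) value such as '0m40.083s' into seconds."""
    minutes, _, rest = real.partition("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


uncached = seconds("0m40.083s")  # first upgrade, straight from upstream
cached = seconds("0m1.322s")     # same upgrade, served from the warm cache
speedup = uncached / cached
print(f"{speedup:.1f}x faster")
```

That works out to roughly a thirtyfold improvement for this particular set of 17 packages.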

Apt (Ubuntu)

The apt package management system is responsible for managing software packages in Ubuntu-based Linux distributions. Like most distribution software package managers, it downloads, validates and installs in that order.

DNS Records

Ensure that the following DNS records are configured:

  • apt.ubuntu.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/ubuntu.cache:

upstream archive.ubuntu.com {
  server archive.ubuntu.com;
  keepalive 2;
}

server {
  listen [::];
  server_name apt.ubuntu.cache apt.ubuntu.cache.ewellnet;

  location ~ \.deb$ {
    proxy_pass http://archive.ubuntu.com;
  }

  location / {
    proxy_pass http://archive.ubuntu.com;
    proxy_cache_valid any 1h;
  }
}

This configuration ensures that requests for .deb packages will only be revalidated after the default duration has passed, but requests for everything else, including the repository indexes, will be revalidated every hour.

Refresh Nginx to enable apt package caching:

# svcadm refresh nginx

Client Configuration

Ubuntu clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client:

/etc/apt/sources.list:

...
deb http://apt.ubuntu.cache/ubuntu/ focal main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal-updates main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal universe
deb http://apt.ubuntu.cache/ubuntu/ focal-updates universe
deb http://apt.ubuntu.cache/ubuntu/ focal multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-updates multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-backports main restricted universe multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-security main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal-security universe
deb http://apt.ubuntu.cache/ubuntu/ focal-security multiverse

Basically change every reference from archive.ubuntu.com or security.ubuntu.com to apt.ubuntu.cache. Run apt update to confirm that it's able to acquire the repository package summaries and enjoy!
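
If you have more than a couple of clients, the substitution can be scripted. The following is a minimal sketch that handles only the two upstream hosts named above. The ubuntu_sources_via_cache helper is hypothetical.

```python
import re

# Only the two upstream hosts mentioned in this section are rewritten.
UPSTREAM = re.compile(r"https?://(?:archive|security)\.ubuntu\.com")


def ubuntu_sources_via_cache(text: str, cache_url: str = "http://apt.ubuntu.cache") -> str:
    """Point every archive/security.ubuntu.com entry at the cache host."""
    return UPSTREAM.sub(cache_url, text)


original = (
    "deb http://archive.ubuntu.com/ubuntu/ focal main restricted\n"
    "deb http://security.ubuntu.com/ubuntu/ focal-security main restricted\n"
)
print(ubuntu_sources_via_cache(original), end="")
```

Feed it the contents of /etc/apt/sources.list and write the result back once you're happy with it, keeping a copy of the original so you can bypass the cache later.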

Apt (Debian)

As with Ubuntu, the apt package management system is responsible for managing software packages in Debian-based Linux distributions. Like most distribution software package managers, it downloads, validates and installs in that order.

DNS Records

Ensure that the following DNS records are configured:

  • apt.debian.cache.ewellnet points to your cache server.
  • security.debian.cache.ewellnet points to your cache server. Note that unlike Ubuntu, Debian does handle security updates separately.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/debian.cache:

upstream cdn-fastly.deb.debian.org {
  server cdn-fastly.deb.debian.org;
  keepalive 2;
}

server {
  listen [::];
  server_name apt.debian.cache apt.debian.cache.ewellnet;

  location ~ \.deb$ {
    proxy_pass http://cdn-fastly.deb.debian.org;
  }

  location / {
    proxy_pass http://cdn-fastly.deb.debian.org;
    proxy_cache_valid any 1h;
  }
}

upstream security.debian.org {
  server security.debian.org;
  keepalive 2;
}

server {
  listen [::];
  server_name security.debian.cache security.debian.cache.ewellnet;

  location ~ \.deb$ {
    proxy_pass http://security.debian.org;
  }

  location / {
    proxy_pass http://security.debian.org;
    proxy_cache_valid any 1h;
  }
}

This configuration ensures that requests for .deb packages will only be revalidated after the default duration has passed, but requests for everything else, including the repository indexes, will be revalidated every hour.

Refresh Nginx to enable apt package caching:

# svcadm refresh nginx

Client Configuration

Debian clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client:

/etc/apt/sources.list:

deb http://apt.debian.cache/debian stretch main
deb-src http://apt.debian.cache/debian stretch main

deb http://apt.debian.cache/debian stretch-updates main
deb-src http://apt.debian.cache/debian stretch-updates main

deb http://security.debian.cache/ stretch/updates  main
deb-src http://security.debian.cache/ stretch/updates  main

I could keep going with examples of apt caching on Debian-based distributions, but it's all pretty similar: configure Nginx to cache the upstream repository, then configure the client system to access Nginx through a local domain name. This works about the same with Kali, MX, Pop!_OS, basically anything.

With that out of the way, let's explore some package managers from different distributions.

DNF (CentOS)

DNF is the updated version of YUM, the default package management system used by CentOS and derivative Linux distributions.

DNS Records

Ensure that the following DNS records are configured:

  • yum.centos.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/centos.cache:

upstream mirror.centos.org {
  server mirror.centos.org;
  keepalive 2;
}

server {
  listen [::];
  server_name yum.centos.cache yum.centos.cache.ewellnet;

  location ~ \.rpm$ {
    proxy_pass http://mirror.centos.org;
  }

  location / {
    proxy_pass http://mirror.centos.org;
    proxy_cache_valid any 1h;
  }
}

This configuration ensures that requests for .rpm packages will only be revalidated after the default duration has passed, but requests for everything else, including the repository metadata, will be revalidated every hour.

Refresh Nginx to enable yum and dnf package caching:

# svcadm refresh nginx

Client Configuration

CentOS clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client. There are quite a few files involved, nearly everything in /etc/yum.repos.d:

/etc/yum.repos.d/*:

[appstream]
name=CentOS Linux $releasever - AppStream
baseurl=http://yum.centos.cache/$contentdir/$releasever/AppStream/$basearch/os/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial

References to mirrorlist should be removed, instead preferring baseurl. Also, don't change these settings for Debuginfo or Sources repos, as we want those to bypass our cache.
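
Since there are several repo files to touch, here is a rough sketch of the edit as a text transformation. It assumes the stock files carry a mirrorlist= line plus a commented-out baseurl= pointing at mirror.centos.org, so check your own files first. The repo_via_cache helper and the sample text are illustrative only.

```python
import re

# Matches an active or commented-out baseurl that points at mirror.centos.org.
BASEURL = re.compile(r"^#?\s*baseurl=https?://mirror\.centos\.org/")


def repo_via_cache(repo_text: str, cache_url: str = "http://yum.centos.cache/") -> str:
    """Drop mirrorlist lines and point baseurl at the cache host."""
    out = []
    for line in repo_text.splitlines():
        if line.startswith("mirrorlist="):
            continue  # removed so that baseurl is used instead
        out.append(BASEURL.sub(lambda m: "baseurl=" + cache_url, line))
    return "\n".join(out) + "\n"


# Assumed shape of a stock repo section, for illustration only.
stock = (
    "[appstream]\n"
    "name=CentOS Linux $releasever - AppStream\n"
    "mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=AppStream&infra=$infra\n"
    "#baseurl=http://mirror.centos.org/$contentdir/$releasever/AppStream/$basearch/os/\n"
    "gpgcheck=1\n"
)
print(repo_via_cache(stock), end="")
```

Apply it only to the repo files you want cached, and leave the Debuginfo and Sources files alone as noted above.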

Once you're done, run dnf update to confirm everything's working correctly.

XBPS (Void)

Void Linux uses the XBPS package manager. As before, we're going to configure DNS, then the cache, and then each client.

DNS Records

Ensure that the following DNS records are configured:

  • xbps.void.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/void.cache:

upstream alpha.de.repo.voidlinux.org {
  server alpha.de.repo.voidlinux.org:443;
  keepalive 2;
}

server {
  listen [::];
  server_name xbps.void.cache xbps.void.cache.ewellnet;

  location ~ repodata$ {
    proxy_pass https://alpha.de.repo.voidlinux.org;
    proxy_cache_valid any 1h;
  }

  location / {
    proxy_pass https://alpha.de.repo.voidlinux.org;
  }
}

This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summaries will be revalidated every hour.

Refresh Nginx to enable xbps package caching:

# svcadm refresh nginx

Client Configuration

Void Linux clients need to be configured to make use of the local cache. This can be done by setting a local repository URL within the configuration of each client:

/usr/share/xbps.d/00-repository-main.conf:

repository=http://xbps.void.cache/current

Once this is set, use xbps-install to ensure that everything is working correctly:

# xbps-install -Su
[*] Updating repository `http://xbps.void.cache/current/x86_64-repodata' ...

Node.js Package Manager (npm)

Other package managers can be configured to make use of a local package cache as well, for instance the Node.js Package Manager utility (npm).

DNS Records

Ensure that the following DNS record is configured:

  • npm.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/npm.cache:

upstream registry.npmjs.org {
  server registry.npmjs.org:443;
  keepalive 2;
}

server {
  listen [::];
  server_name npm.cache npm.cache.ewellnet;

  location / {
    proxy_pass https://registry.npmjs.org;
    proxy_cache_valid any 1h;
  }
}

This approach is revalidation heavy, but unfortunately that's the best we're going to get out of npm, due to how their repository is structured and queried.

Refresh Nginx to enable npm package caching:

# svcadm refresh nginx

Client Configuration

Set the local registry using the npm command-line utility:

# npm set registry http://npm.cache/

And now npm will install packages through your cache.

Python Package Index (pip)

The Python Package Index command-line utility (pip) can also be configured to use a cache. This is a bit more involved than the previous examples, as pip normally accesses two different base URIs: https://pypi.org/simple for the repository information, which links directly to files normally hosted at https://files.pythonhosted.org/.

However, through clever use of the Nginx ngx_http_sub_module, we should be able to make this work through a single base URI. Note, this module is not currently compiled by default in the build of Nginx distributed via SmartOS pkgsrc, and will need to be custom built to enable this functionality for now.

DNS Records

Ensure that the following DNS record is configured:

  • pip.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/pip.cache:

upstream pypi.org {
  server pypi.org:443;
  keepalive 2;
}

upstream files.pythonhosted.org {
  server files.pythonhosted.org:443;
  keepalive 2;
}

server {
  listen [::];
  server_name pip.cache pip.cache.ewellnet;

  location / {
    proxy_pass https://pypi.org;
    proxy_cache_valid 200 301 302 1h;

    sub_filter_once off;
    sub_filter "https://files.pythonhosted.org" "http://pip.cache";
  }

  location /packages {
    proxy_pass https://files.pythonhosted.org;
  }
}
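
To make the effect of the two sub_filter directives concrete, the sketch below applies the same substitution to a made-up index page in Python. The page content is invented for illustration. Only the two URLs come from the configuration above.

```python
UPSTREAM = "https://files.pythonhosted.org"
CACHE = "http://pip.cache"


def sub_filter(body: str, once: bool = False) -> str:
    """Mimic sub_filter: replace only the first match when once is True."""
    return body.replace(UPSTREAM, CACHE, 1 if once else -1)


# Invented index page with two download links.
page = (
    '<a href="https://files.pythonhosted.org/packages/aa/example-1.0.tar.gz">example-1.0.tar.gz</a>\n'
    '<a href="https://files.pythonhosted.org/packages/bb/example-1.1.tar.gz">example-1.1.tar.gz</a>\n'
)
print(sub_filter(page), end="")
```

With sub_filter_once left at its default of on, only the first link would be rewritten and the rest would still bypass the cache, which is why the configuration turns it off.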

Refresh Nginx to enable pip package caching:

# svcadm refresh nginx

Client Configuration

Configure clients to use the cache registry by using the pip command-line utility:

# pip config set global.index-url http://pip.cache/simple
Writing to /root/.config/pip/pip.conf
# pip config set global.trusted-host pip.cache
Writing to /root/.config/pip/pip.conf

And now pip will install packages through your cache. You can also set this globally if you prefer, as an exercise I leave up to you.

RubyGems (gem)

The RubyGems package management command-line utility (gem) can also be configured to use a cache.

DNS Records

Ensure that the following DNS record is configured:

  • rubygems.cache.ewellnet points to your cache server.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/rubygems.cache:

upstream rubygems.org {
  server rubygems.org:443;
  keepalive 2;
}

server {
  listen [::];
  server_name rubygems.cache rubygems.cache.ewellnet;

  location ~ \.gem$ {
    proxy_pass https://rubygems.org;
  }

  location / {
    proxy_pass https://rubygems.org;
    proxy_cache_valid 200 301 302 1h;
  }
}

Refresh Nginx to enable gem package caching:

# svcadm refresh nginx

Client Configuration

Set the local sources using the gem command-line utility:

# gem sources --add http://rubygems.cache/
http://rubygems.cache/ added to sources
# gem sources --remove https://rubygems.org/
https://rubygems.org/ removed from sources

And now gem will install packages through your cache.

As should be clear by now, many different package management systems can be tweaked to reach their upstream repositories through a cache. Additional package managers that should work include Cargo, Hex, and Go, the package managers for Rust, Erlang/Elixir, and Go. While it would be fun to walk through those and stand them up here as examples, I don't use those languages enough to justify the work. Yet.

Steam

A now-classic example of using Nginx to cache software resources is configuring a Steam cache, which treads directly into the operational space of Lancache.

Steam has gotten a lot nicer to use with caching in the last few years, but I'd still qualify this section as experimental, as it doesn't yield the performance that I'd like to see out of a Steam cache. I will likely be revising this in the near future, but that will need better visibility into cache performance, which will be available at a later time.

DNS Records

Ensure that the following DNS record is configured:

  • lancache.steamcontent.com points to your cache server.

Steam is nice enough to recognize when this record points to a private routing prefix and will redirect all resource requests to this host. Very nice.

Cache Configuration

Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:

/opt/local/etc/nginx/cache/steam.cache:

upstream steam-upstream {
  server cache1-sea1.steamcontent.com;
  server cache2-sea1.steamcontent.com;
  server cache3-sea1.steamcontent.com;
  server cache4-sea1.steamcontent.com;
  server cache1-lax1.steamcontent.com;
  server cache2-lax1.steamcontent.com;
  server cache3-lax1.steamcontent.com;
  server cache4-lax1.steamcontent.com;
  server cache5-lax1.steamcontent.com;
  server cache6-lax1.steamcontent.com;
  keepalive 32;
}

server {
  listen [::];
  server_name lancache.steamcontent.com *.steamcontent.com;
  slice 8m;

  proxy_cache_key lancache.steamcontent.com$uri$is_args$args$slice_range;
  proxy_set_header Range $slice_range;

  location / {
    proxy_pass http://steam-upstream;
  }
}

In my case, I'm using upstream caches located in Seattle and Los Angeles, the two closest locations.
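
The slice and proxy_cache_key directives above mean each 8 MB piece of a file is fetched and cached under its own key. The sketch below shows how a byte offset maps to a $slice_range value and the resulting cache key. The URI is hypothetical, and the helpers only imitate how the variables are combined.

```python
SLICE = 8 * 1024 * 1024  # matches "slice 8m"


def slice_range(offset: int, size: int = SLICE) -> str:
    """Return the aligned byte range that contains the given offset."""
    start = (offset // size) * size
    return f"bytes={start}-{start + size - 1}"


def cache_key(uri: str, args: str, offset: int) -> str:
    """Assemble the key the same way proxy_cache_key concatenates its variables."""
    is_args = "?" if args else ""
    return f"lancache.steamcontent.com{uri}{is_args}{args}{slice_range(offset)}"


# Hypothetical depot URI. A request touching byte 10,000,000 lands in the second slice.
print(cache_key("/depot/1/chunk/abc", "", 10_000_000))
```

Because the key starts with a fixed host name rather than the requested host, the same content is stored once regardless of which steamcontent.com name the client asked for.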

Refresh Nginx to enable Steam caching:

# svcadm refresh nginx

Client Configuration

No client configuration is required. Steam will automatically recognize and make use of this cache.

Cache Partitioning

While all of the above examples use a single shared cache, an approach I prefer, you can also set independent caches per service.

First, disable Nginx.

# svcadm mark maintenance nginx

Create additional ZFS datasets per cache that you'd like to create:

# zfs create -o quota=200G -o mountpoint=/var/www/cache-2 zones/<UUID>/data/cache-2

Register the cache paths in nginx under the http context. Note that the size has been adjusted to reflect the size set for the ZFS dataset:

/opt/local/etc/nginx/nginx.conf:

...
proxy_cache_path /var/www/cache-2 levels=2:2 use_temp_path=on keys_zone=second_cache:128m inactive=4y max_size=190G;
...

Adjust the cache configuration to make use of that cache instead of default_cache, for example with Ubuntu:

/opt/local/etc/nginx/cache/ubuntu.cache:

upstream archive.ubuntu.com {
  server archive.ubuntu.com;
  keepalive 2;
}

server {
  listen [::];
  server_name apt.ubuntu.cache apt.ubuntu.cache.ewellnet;
  proxy_cache second_cache;

  location ~ \.deb$ {
    proxy_pass http://archive.ubuntu.com;
  }

  location / {
    proxy_pass http://archive.ubuntu.com;
    proxy_cache_valid 200 301 302 1h;
  }
}

Once you're done, restart Nginx and enjoy having separate caches:

# svcadm clear nginx

Conclusion

While I'm generally happy with the results of this project, there's clearly room for improvement, specifically around Steam and potentially additional game service caching.

Digging further into Lancache and investigating what problems they experienced and how they overcame them is probably the best move from here, as well as investing further into visibility tools to determine how and why Nginx is slowing down requests instead of accelerating them.

But for everything else, this is definitely a good first step.