Package Caching on SmartOS
If you use package management software across two or more systems, network package caching may benefit you. In this article, we will set up a package caching solution that services multiple package management systems simultaneously.
Methodology
While caching packages locally in the file system is already common practice for most package management systems, sharing that local cache with other nearby systems is difficult to do without introducing security implications or prohibitive complexity.
Instead, the preferred solution is to redirect requests to a local-network intermediary that presents the same interface as the original repository while managing its own cache: most package management systems use HTTP or HTTPS and refer to their repositories by domain name or URL, and many can be reconfigured to reach their repositories through alternate means. Where that is not trivial, split-horizon DNS records can be used instead.
This technique is used extensively by projects such as Lancache for game management systems but can also easily be applied to other software package management systems such as Pkgsrc, APT, Yum, etc.
Architecture
There will be two independent services involved in implementing this solution: a DNS resolver and a caching HTTP server.
While I will be including notes on the hostnames that need to be configured, the specifics of implementing DNS will be outside the scope of this article. I'm assuming that you have access to your own DNS resolver, such as dnsmasq or powerdns-recursor, and know how to configure it for split-horizon DNS.
While I used to prefer redirecting the original domain names, I now change the TLD to cache.ewellnet instead. For example: pkgsrc.joyent.com becomes pkgsrc.joyent.cache.ewellnet, and archive.ubuntu.com becomes archive.ubuntu.cache.ewellnet.
While this does require reconfiguring each host that uses the cache, overall it's a cleaner approach and allows hosts to easily bypass the cache if need be. Additionally, my search directive in /etc/resolv.conf is set to ewellnet, so I can use the relatively shorter pkgsrc.joyent.cache or archive.ubuntu.cache instead.
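As a sketch, if your resolver is dnsmasq (one of the options mentioned above), the rewritten names can be answered with simple address entries. The address here matches the cache zone provisioned later in this article; adjust the names and address for your own site:

```
# Hypothetical dnsmasq entries: answer the rewritten cache hostnames
# with the cache zone's address (172.22.1.98 in this article's example).
address=/pkgsrc.joyent.cache.ewellnet/172.22.1.98
address=/archive.ubuntu.cache.ewellnet/172.22.1.98
```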
For our caching HTTP server, we will be using Nginx within a single zone. It's high-performance, easy to configure, and does its job very efficiently. This should be perfectly adequate for most situations, but may be unsuitable in extreme circumstances; that can be remedied by utilizing a cache cluster, which is well outside the scope of today's article.
I will be using a common cache-optimized configuration with additional configuration files for each supported package management system. Additionally, for this article I will set up a common package cache shared by all management systems, but the reader can just as easily partition their caches across sets of management systems.
It is also best that the cache zone be dual-stack. This benefits any IPv6-only hosts on your network that may otherwise not have access to some repositories.
Zone Manifest
We want to ensure that the zone containing the cache will have enough processing power and memory to not be starved, and enough storage space to handle all of the packages you would like to cache.
We will be using the following for our example:
{
  "image_uuid": "1d05e788-5409-11eb-b12f-037bd7fee4ee",
  "brand": "joyent",
  "alias": "cache",
  "hostname": "cache",
  "dns_domain": "ewellnet",
  "cpu_cap": 200,
  "max_physical_memory": 256,
  "quota": 1024,
  "delegate_dataset": true,
  "resolvers": [ "172.22.1.97" ],
  "nics": [{
    "nic_tag": "external0",
    "ips": [ "172.22.1.98/27", "addrconf", "2001:470::98/64" ],
    "gateways": [ "172.22.1.97" ],
    "primary": true
  }]
}
Create it and log in.
# vmadm create -f cache.json
Successfully created VM 49ee67da-9e3c-c20a-d925-aa2aa284f95d
# zlogin 49ee67da-9e3c-c20a-d925-aa2aa284f95d
After doing any zone cleanup that you prefer, ensure that a dataset exists to hold your cache. I will often set a quota to enforce a hard upper limit on size, slightly beyond the limit that will be set later in Nginx.
# zfs create -o quota=982G -o mountpoint=/var/www/cache zones/<UUID>/data/cache
Installing & Configuring Nginx
Next, we're going to install Nginx.
# pkgin -y install nginx
Clear out all of the Nginx configuration files; we'll be doing something very specific.
# rm -rv /opt/local/etc/nginx/*
Instead of the default nginx.conf, we'll be using this one, optimized for caching:
/opt/local/etc/nginx/nginx.conf:
user www www;
worker_processes 2;

events { worker_connections 1024; multi_accept on; }

http {
    default_type application/octet-stream;

    allow 172.22.1.0/24;
    allow 2001:470::/48;
    deny all;

    server_tokens off;
    tcp_nopush on;
    sendfile on;
    gzip on;

    proxy_buffering on;
    proxy_buffers 32 8k;
    proxy_cache default_cache;
    proxy_cache_lock on;
    proxy_cache_lock_age 5m;
    proxy_cache_lock_timeout 30s;
    proxy_cache_path /var/www/cache levels=2:2 use_temp_path=on keys_zone=default_cache:128m inactive=4y max_size=980G;
    proxy_cache_revalidate on;
    proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504 http_403 http_404 http_429;
    proxy_cache_valid 1d;
    proxy_cache_valid any 1m;
    proxy_http_version 1.1;
    # Clear the Connection header so keepalive connections to upstreams are actually reused.
    proxy_set_header Connection "";
    proxy_ignore_headers X-Accel-Redirect X-Accel-Expires X-Accel-Limit-Rate X-Accel-Buffering X-Accel-Charset Expires Cache-Control Vary;
    proxy_temp_path /var/www/cache/tmp 1;
    proxy_ssl_ciphers EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH;
    proxy_ssl_protocols TLSv1.2;
    proxy_ssl_server_name on;
    proxy_ssl_session_reuse on;
    proxy_ssl_verify_depth 4;

    log_format cache '$remote_addr - $remote_user [$time_local] $status $upstream_cache_status $server_name $request_time [$connection:$connection_requests] "$request_method $scheme://$http_host$request_uri $server_protocol" $body_bytes_sent "$http_referer" "$http_user_agent"';
    access_log /var/log/nginx/cache.log cache;

    server {
        listen [::] default_server ipv6only=off;
        server_name default;
        return 421;
    }

    include cache/*.cache;
}
A quick breakdown of configuration parameters that you will want to tweak:
- worker_processes will limit Nginx to spawning that number of worker processes. This is important on compute nodes with high core counts, and should match your cpu_cap/100. In my case, with a cpu_cap of 200, this value should be 2.
- The allow and deny directives will allow you to whitelist or blacklist IP prefixes. Use allow to whitelist your local network prefixes and then deny all to prevent all others from accessing your cache. In my case, my local network prefixes are 172.22.1.0/24 and 2001:470::/48.
At this point you can enable Nginx to confirm it's happy with its current configuration before adding specific management systems to the cache in the sections below:
# svcadm enable nginx
If you'd like to take a moment here, I recommend reading the Nginx documentation on proxy directives for a more complete understanding of everything that has been specified here.
Also, if you want to make full use of all of the below sub-sections of this article, I recommend recompiling Nginx on a build host with the sub and cache-purge modules enabled.
Package Caching
This section will start by demonstrating how to configure pkgsrc package caching and then illustrate the performance differences of caching vs. not caching. It will then list caching configurations for other operating systems and environments.
Each package management description will include any local changes that need to be made to the system being configured for caching, and the DNS records that need to be configured, along with the relevant Nginx configuration files. Please replace any references to ewellnet with your site's local domain name, or omit that domain name entirely; your choice.
I also recommend keeping a terminal open to tail /var/log/nginx/cache.log while configuring this, to monitor for cache misses, hits and revalidations.
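As a quick aid while watching that log, a one-liner can tally the cache statuses seen so far. This is a sketch against the cache log_format defined above: $time_local occupies two whitespace-separated fields, so $upstream_cache_status lands in field seven. The two log lines here are illustrative samples, not real output:

```shell
# Tally $upstream_cache_status values (HIT, MISS, REVALIDATED, ...) from
# the "cache" log_format. Demonstrated on two sample lines here; point
# it at /var/log/nginx/cache.log in practice.
summarize() {
    awk '{ count[$7]++ } END { for (s in count) print s, count[s] }' "$@"
}

printf '%s\n' \
    '172.22.1.50 - - [01/Mar/2021:12:00:01 +0000] 200 MISS pkgsrc.joyent.cache 0.420 [1:1] "GET http://pkgsrc.joyent.cache/a.tgz HTTP/1.1" 1024 "-" "pkgin"' \
    '172.22.1.51 - - [01/Mar/2021:12:00:05 +0000] 200 HIT pkgsrc.joyent.cache 0.002 [2:1] "GET http://pkgsrc.joyent.cache/a.tgz HTTP/1.1" 1024 "-" "pkgin"' \
    | summarize | sort
# Prints "HIT 1" and "MISS 1".
```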
Pkgsrc (SmartOS)
The pkgsrc binary package manager is responsible for managing software packages in SmartOS zones. Like most distribution software package managers, it downloads, validates and installs, in that order.
Let's get an idea of what sort of performance we can expect out of non-cached pkgsrc. Testing pkgin update in a freshly provisioned zone gives us the following results:
# time pkgin update
reading local summary...
processing local summary...
processing remote summary (https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz 100% 2374KB 791.4KB/s 00:03
real 0m6.941s
user 0m3.294s
sys 0m0.318s
That's not bad. Let's see how long it'll take to download a full upgrade:
# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.
17 packages to download:
wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
nghttp2-1.42.0nb1 curl-7.75.0
74M to download
wget-1.20.3nb10.tgz 100% 1244KB 1.2MB/s 00:00
sudo-1.9.6p1.tgz 100% 1866KB 1.8MB/s 00:01
rsyslog-8.38.0nb10.tgz 100% 1165KB 1.1MB/s 00:01
python38-3.8.6nb1.tgz 100% 27MB 3.9MB/s 00:07
postfix-3.5.10.tgz 100% 2175KB 1.1MB/s 00:02
pkgin-20.12.1.tgz 100% 98KB 98.4KB/s 00:01
pkg_install-20201218.tgz 100% 9201KB 3.0MB/s 00:03
openssl-1.1.1l.tgz 100% 6488KB 2.1MB/s 00:03
libssh2-1.9.0nb1.tgz 100% 389KB 389.0KB/s 00:01
libarchive-3.4.3.tgz 100% 979KB 979.3KB/s 00:01
mozilla-rootcerts-1.0.20201102.tgz 100% 573KB 573.1KB/s 00:00
openldap-client-2.4.56.tgz 100% 1438KB 1.4MB/s 00:00
http-parser-2.9.4.tgz 100% 48KB 47.8KB/s 00:00
npm-6.14.11.tgz 100% 5352KB 2.6MB/s 00:02
nodejs-14.16.1.tgz 100% 14MB 3.6MB/s 00:04
nghttp2-1.42.0nb1.tgz 100% 296KB 295.9KB/s 00:01
curl-7.75.0.tgz 100% 1664KB 1.6MB/s 00:01
real 0m40.083s
user 0m0.630s
sys 0m0.306s
40 seconds. Not horrible, but surely we can improve upon this.
DNS Records
Ensure that the following DNS records are configured:
pkgsrc.joyent.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/joyent.cache:
upstream pkgsrc.joyent.com {
    server pkgsrc.joyent.com:443;
    keepalive 2;
}

server {
    listen [::];
    server_name pkgsrc.joyent.cache pkgsrc.joyent.cache.ewellnet;

    location ~ pkg_summary\.(bz2|gz|xz)$ {
        proxy_pass https://pkgsrc.joyent.com;
        proxy_cache_valid any 1h;
    }

    location / {
        proxy_pass https://pkgsrc.joyent.com;
    }
}
This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summary will be revalidated every hour.
Refresh Nginx to enable pkgsrc package caching:
# svcadm refresh nginx
Client Configuration
Pkgsrc clients need to be configured to make use of the local cache. This can be done by altering the repository URL within the configuration of each client:
/opt/local/etc/pkgin/repositories.conf:
...
http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All
Run pkgin update to confirm that it's able to acquire the repository package summary:
# time pkgin update
cleaning database from https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All entries...
reading local summary...
processing local summary...
processing remote summary (http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz 100% 2374KB 593.6KB/s 00:04
real 0m7.952s
user 0m4.332s
sys 0m0.581s
Note that the time is slightly longer; this is due to the clearing out of the previous database and the fact that we didn't have the summary cached yet. We can also now re-test downloading an upgrade from the cache:
# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.
17 packages to download:
wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
nghttp2-1.42.0nb1 curl-7.75.0
74M to download
wget-1.20.3nb10.tgz 100% 1244KB 1.2MB/s 00:01
sudo-1.9.6p1.tgz 100% 1866KB 933.2KB/s 00:02
rsyslog-8.38.0nb10.tgz 100% 1165KB 1.1MB/s 00:01
python38-3.8.6nb1.tgz 100% 27MB 3.9MB/s 00:07
postfix-3.5.10.tgz 100% 2175KB 2.1MB/s 00:01
pkgin-20.12.1.tgz 100% 98KB 98.4KB/s 00:00
pkg_install-20201218.tgz 100% 9201KB 3.0MB/s 00:03
openssl-1.1.1l.tgz 100% 6488KB 2.1MB/s 00:03
libssh2-1.9.0nb1.tgz 100% 389KB 389.0KB/s 00:00
libarchive-3.4.3.tgz 100% 979KB 979.3KB/s 00:01
mozilla-rootcerts-1.0.20201102.tgz 100% 573KB 573.1KB/s 00:01
openldap-client-2.4.56.tgz 100% 1438KB 1.4MB/s 00:01
http-parser-2.9.4.tgz 100% 48KB 47.8KB/s 00:00
npm-6.14.11.tgz 100% 5352KB 1.7MB/s 00:03
nodejs-14.16.1.tgz 100% 14MB 4.7MB/s 00:03
nghttp2-1.42.0nb1.tgz 100% 296KB 295.9KB/s 00:01
curl-7.75.0.tgz 100% 1664KB 1.6MB/s 00:01
real 0m32.785s
user 0m0.553s
sys 0m0.370s
Not too much quicker, but again, none of this data had been cached yet. Switch the source repository back, re-update, and switch back to the cache again to test its real performance:
-- Switched back to upstream
# pkgin update
-- Switched back to cache
# time pkgin update
cleaning database from https://pkgsrc.joyent.com/packages/SmartOS/2020Q4/x86_64/All entries...
reading local summary...
processing local summary...
processing remote summary (http://pkgsrc.joyent.cache/packages/SmartOS/2020Q4/x86_64/All)...
pkg_summary.xz 100% 2374KB 791.4KB/s 00:03
real 0m6.833s
user 0m3.952s
sys 0m0.555s
Not a whole lot to be excited about with pkgin update. Let's check on upgrade:
# pkgin clean
# time pkgin -dy upgrade
calculating dependencies...done.
17 packages to download:
wget-1.20.3nb10 sudo-1.9.6p1 rsyslog-8.38.0nb10 python38-3.8.6nb1
postfix-3.5.10 pkgin-20.12.1 pkg_install-20201218 openssl-1.1.1l
libssh2-1.9.0nb1 libarchive-3.4.3 mozilla-rootcerts-1.0.20201102
openldap-client-2.4.56 http-parser-2.9.4 npm-6.14.11 nodejs-14.16.1
nghttp2-1.42.0nb1 curl-7.75.0
74M to download
wget-1.20.3nb10.tgz 100% 1244KB 1.2MB/s 00:00
sudo-1.9.6p1.tgz 100% 1866KB 1.8MB/s 00:00
rsyslog-8.38.0nb10.tgz 100% 1165KB 1.1MB/s 00:00
python38-3.8.6nb1.tgz 100% 27MB 27.3MB/s 00:00
postfix-3.5.10.tgz 100% 2175KB 2.1MB/s 00:00
pkgin-20.12.1.tgz 100% 98KB 98.4KB/s 00:00
pkg_install-20201218.tgz 100% 9201KB 9.0MB/s 00:00
openssl-1.1.1l.tgz 100% 6488KB 6.3MB/s 00:00
libssh2-1.9.0nb1.tgz 100% 389KB 389.0KB/s 00:00
libarchive-3.4.3.tgz 100% 979KB 979.3KB/s 00:00
mozilla-rootcerts-1.0.20201102.tgz 100% 573KB 573.1KB/s 00:00
openldap-client-2.4.56.tgz 100% 1438KB 1.4MB/s 00:00
http-parser-2.9.4.tgz 100% 48KB 47.8KB/s 00:00
npm-6.14.11.tgz 100% 5352KB 5.2MB/s 00:00
nodejs-14.16.1.tgz 100% 14MB 14.2MB/s 00:01
nghttp2-1.42.0nb1.tgz 100% 296KB 295.9KB/s 00:00
curl-7.75.0.tgz 100% 1664KB 1.6MB/s 00:00
real 0m1.322s
user 0m0.524s
sys 0m0.342s
That's more like it! We can see that package downloads were served from the cache as well, indicated by HIT rather than MISS lines in our logs.
Apt (Ubuntu)
The apt package management system is responsible for managing software packages in Ubuntu-based Linux distributions. Like most distribution software package managers, it downloads, validates and installs, in that order.
DNS Records
Ensure that the following DNS records are configured:
apt.ubuntu.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/ubuntu.cache:
upstream archive.ubuntu.com {
    server archive.ubuntu.com;
    keepalive 2;
}

server {
    listen [::];
    server_name apt.ubuntu.cache apt.ubuntu.cache.ewellnet;

    location ~ \.deb$ {
        proxy_pass http://archive.ubuntu.com;
    }

    location / {
        proxy_pass http://archive.ubuntu.com;
        proxy_cache_valid any 1h;
    }
}
This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summary will be revalidated every hour.
Refresh Nginx to enable apt package caching:
# svcadm refresh nginx
Client Configuration
Ubuntu clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client:
/etc/apt/sources.list:
...
deb http://apt.ubuntu.cache/ubuntu/ focal main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal-updates main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal universe
deb http://apt.ubuntu.cache/ubuntu/ focal-updates universe
deb http://apt.ubuntu.cache/ubuntu/ focal multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-updates multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-backports main restricted universe multiverse
deb http://apt.ubuntu.cache/ubuntu/ focal-security main restricted
deb http://apt.ubuntu.cache/ubuntu/ focal-security universe
deb http://apt.ubuntu.cache/ubuntu/ focal-security multiverse
Basically, change every reference from archive.ubuntu.com or security.ubuntu.com to apt.ubuntu.cache. Run apt update to confirm that it's able to acquire the repository package summaries, and enjoy!
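That substitution can be scripted. This is a sketch using sed, shown against a sample line rather than editing /etc/apt/sources.list in place:

```shell
# Rewrite mirror hostnames to the cache name. The sed expressions are the
# part you'd apply to /etc/apt/sources.list (e.g. with sed -i).
rewrite() {
    sed -e 's/archive\.ubuntu\.com/apt.ubuntu.cache/g' \
        -e 's/security\.ubuntu\.com/apt.ubuntu.cache/g'
}

echo 'deb http://archive.ubuntu.com/ubuntu/ focal main restricted' | rewrite
# Prints: deb http://apt.ubuntu.cache/ubuntu/ focal main restricted
```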
Apt (Debian)
Like Ubuntu, the apt
package management system is responsible for managing software packages in Debian based Linux distributions. Like most distribution software package managers, it downloads, validates and installs in that order.
DNS Records
Ensure that the following DNS records are configured:
apt.debian.cache.ewellnet points to your cache server.
security.debian.cache.ewellnet points to your cache server. Note that unlike Ubuntu, Debian handles security updates separately.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/debian.cache:
upstream cdn-fastly.deb.debian.org {
    server cdn-fastly.deb.debian.org;
    keepalive 2;
}

server {
    listen [::];
    server_name apt.debian.cache apt.debian.cache.ewellnet;

    location ~ \.deb$ {
        proxy_pass http://cdn-fastly.deb.debian.org;
    }

    location / {
        proxy_pass http://cdn-fastly.deb.debian.org;
        proxy_cache_valid any 1h;
    }
}

upstream security.debian.org {
    server security.debian.org;
    keepalive 2;
}

server {
    listen [::];
    server_name security.debian.cache security.debian.cache.ewellnet;

    location ~ \.deb$ {
        proxy_pass http://security.debian.org;
    }

    location / {
        proxy_pass http://security.debian.org;
        proxy_cache_valid any 1h;
    }
}
This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summaries will be revalidated every hour.
Refresh Nginx to enable apt package caching:
# svcadm refresh nginx
Client Configuration
Debian clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client:
/etc/apt/sources.list:
deb http://apt.debian.cache/debian stretch main
deb-src http://apt.debian.cache/debian stretch main
deb http://apt.debian.cache/debian stretch-updates main
deb-src http://apt.debian.cache/debian stretch-updates main
deb http://security.debian.cache/ stretch/updates main
deb-src http://security.debian.cache/ stretch/updates main
I could keep going with examples of apt caching on Debian-based distributions, but it's all pretty similar: configure Nginx to cache the upstream repository, then configure the client system to access Nginx through a local domain name. This works about the same with Kali, MX, Pop!OS, basically anything.
With that out of the way, let's explore some package managers from different distributions.
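As a sketch, the apt-style Nginx snippets in this article all follow one template; the hostnames below are placeholders to substitute for each repository:

```
# Generic template: substitute the real upstream repository hostname
# and your chosen cache hostnames.
upstream repo.example.org {
    server repo.example.org;
    keepalive 2;
}

server {
    listen [::];
    server_name pkgs.example.cache pkgs.example.cache.ewellnet;

    # Package files: cache with the long default validity.
    location ~ \.deb$ {
        proxy_pass http://repo.example.org;
    }

    # Metadata and indexes: revalidate hourly.
    location / {
        proxy_pass http://repo.example.org;
        proxy_cache_valid any 1h;
    }
}
```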
DNF (CentOS)
DNF is the updated version of YUM, the default package management system used by CentOS and derivative Linux distributions.
DNS Records
Ensure that the following DNS records are configured:
yum.centos.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/centos.cache:
upstream mirror.centos.org {
    server mirror.centos.org;
    keepalive 2;
}

server {
    listen [::];
    server_name yum.centos.cache yum.centos.cache.ewellnet;

    location ~ \.rpm$ {
        proxy_pass http://mirror.centos.org;
    }

    location / {
        proxy_pass http://mirror.centos.org;
        proxy_cache_valid any 1h;
    }
}
This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summaries will be revalidated every hour.
Refresh Nginx to enable yum and dnf package caching:
# svcadm refresh nginx
Client Configuration
CentOS clients need to be configured to make use of the local cache. This can be done by altering the repository URLs within the configuration of each client. There are quite a few files involved, nearly everything in /etc/yum.repos.d:
/etc/yum.repos.d/*:
[appstream]
name=CentOS Linux $releasever - AppStream
baseurl=http://yum.centos.cache/$contentdir/$releasever/AppStream/$basearch/os/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
References to mirrorlist should be removed, preferring baseurl instead. Also, don't change these settings for the Debuginfo or Sources repos, as we want those to bypass our cache.
Once you're done, run dnf update to confirm everything's working correctly.
XBPS (Void)
Void Linux uses the XBPS package manager. As before, we're going to configure DNS, then the cache, and then each client.
DNS Records
Ensure that the following DNS records are configured:
xbps.void.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/void.cache:
upstream alpha.de.repo.voidlinux.org {
    server alpha.de.repo.voidlinux.org:443;
    keepalive 2;
}

server {
    listen [::];
    server_name xbps.void.cache xbps.void.cache.ewellnet;

    location ~ repodata$ {
        proxy_pass https://alpha.de.repo.voidlinux.org;
        proxy_cache_valid any 1h;
    }

    location / {
        proxy_pass https://alpha.de.repo.voidlinux.org;
    }
}
This configuration ensures that requests for normal packages will only be revalidated after the default duration has passed, but requests for the package summaries will be revalidated every hour.
Refresh Nginx to enable xbps package caching:
# svcadm refresh nginx
Client Configuration
Void Linux clients need to be configured to make use of the local cache. This can be done by setting a local repository URL within the configuration of each client:
/usr/share/xbps.d/00-repository-main.conf:
repository=http://xbps.void.cache/current
Once this is set, use xbps-install to ensure that everything is working correctly:
# xbps-install -Su
[*] Updating repository `http://xbps.void.cache/current/x86_64-repodata' ...
Node.js Package Manager (npm)
Other package managers can be configured to make use of a local package cache as well. For instance, the Node.js Package Manager utility (npm).
DNS Records
Ensure that the following DNS record is configured:
npm.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/npm.cache:
upstream registry.npmjs.org {
    server registry.npmjs.org:443;
    keepalive 2;
}

server {
    listen [::];
    server_name npm.cache npm.cache.ewellnet;

    location / {
        proxy_pass https://registry.npmjs.org;
        proxy_cache_valid any 1h;
    }
}
This approach is revalidation heavy, but unfortunately that's the best we're going to get out of npm, due to how their repository is structured and queried.
Refresh Nginx to enable npm package caching:
# svcadm refresh nginx
Client Configuration
Set the local registry using the npm command-line utility:
# npm set registry http://npm.cache/
And now npm will install packages through your cache.
Python Package Index (pip)
The Python Package Index command-line utility (pip) can also be configured to use a cache. This is a bit more involved than the previous examples, as pip normally accesses two different base URIs: https://pypi.org/simple for the repository information, which links directly to files normally hosted at https://files.pythonhosted.org/.
However, through clever use of the Nginx ngx_http_sub_module, we should be able to make this work through a single base URI. Note, this module is not currently compiled by default in the build of Nginx distributed via SmartOS pkgsrc, and will need to be custom built to enable this functionality for now.
DNS Records
Ensure that the following DNS record is configured:
pip.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/pip.cache:
upstream pypi.org {
    server pypi.org:443;
    keepalive 2;
}

upstream files.pythonhosted.org {
    server files.pythonhosted.org:443;
    keepalive 2;
}

server {
    listen [::];
    server_name pip.cache pip.cache.ewellnet;

    location / {
        proxy_pass https://pypi.org;
        proxy_cache_valid 200 301 302 1h;
        # sub_filter cannot rewrite compressed responses, so ask the
        # upstream for uncompressed pages.
        proxy_set_header Accept-Encoding "";
        sub_filter_once off;
        sub_filter "https://files.pythonhosted.org" "http://pip.cache";
    }

    location /packages {
        proxy_pass https://files.pythonhosted.org;
    }
}
Refresh Nginx to enable pip package caching:
# svcadm refresh nginx
Client Configuration
Configure clients to use the cache registry by using the pip command-line utility:
# pip config set global.index-url http://pip.cache/simple
Writing to /root/.config/pip/pip.conf
# pip config set global.trusted-host pip.cache
Writing to /root/.config/pip/pip.conf
And now pip will install packages through your cache. You can also set this globally if you prefer; I leave that as an exercise for you.
RubyGems (gem)
The RubyGems package management command-line utility (gem) can also be configured to use a cache.
DNS Records
Ensure that the following DNS record is configured:
rubygems.cache.ewellnet points to your cache server.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/rubygems.cache:
upstream rubygems.org {
    server rubygems.org:443;
    keepalive 2;
}

server {
    listen [::];
    server_name rubygems.cache rubygems.cache.ewellnet;

    location ~ \.gem$ {
        proxy_pass https://rubygems.org;
    }

    location / {
        proxy_pass https://rubygems.org;
        proxy_cache_valid 200 301 302 1h;
    }
}
Refresh Nginx to enable gem package caching:
# svcadm refresh nginx
Client Configuration
Set the local sources using the gem command-line utility:
# gem sources --add http://rubygems.cache/
http://rubygems.cache/ added to sources
# gem sources --remove https://rubygems.org/
https://rubygems.org/ removed from sources
And now gem will install packages through your cache.
As should be clear by now, many different package management systems can be tweaked to interface with their upstream repositories through a cache. Additional examples that should work include Cargo, Hex, and Go (the package managers for Rust, Erlang/Elixir, and Go). While it would be fun to walk through and stand those up here as examples, I don't use those languages enough to justify the work. Yet.
Steam
A now-classic example of using Nginx to cache software resources, one that treads directly into the operational space of Lancache, is configuring a Steam cache.
Steam has gotten a lot nicer to cache in the last few years, but I'd still qualify this section as experimental, as it doesn't yield the performance that I'd like to see out of a Steam cache. I will likely revise this in the near future, once I have better visibility into cache performance.
DNS Records
Ensure that the following DNS record is configured:
lancache.steamcontent.com points to your cache server.
Steam is nice enough to recognize when this record points to a private routing prefix and will redirect all resource requests to this host. Very nice.
Cache Configuration
Ensure that the following configuration file has been added to your Nginx configuration directory in your cache:
/opt/local/etc/nginx/cache/steam.cache:
upstream steam-upstream {
    server cache1-sea1.steamcontent.com;
    server cache2-sea1.steamcontent.com;
    server cache3-sea1.steamcontent.com;
    server cache4-sea1.steamcontent.com;
    server cache1-lax1.steamcontent.com;
    server cache2-lax1.steamcontent.com;
    server cache3-lax1.steamcontent.com;
    server cache4-lax1.steamcontent.com;
    server cache5-lax1.steamcontent.com;
    server cache6-lax1.steamcontent.com;
    keepalive 32;
}

server {
    listen [::];
    server_name lancache.steamcontent.com *.steamcontent.com;

    slice 8m;
    proxy_cache_key lancache.steamcontent.com$uri$is_args$args$slice_range;
    proxy_set_header Range $slice_range;
    # Sliced range requests return 206 responses, which the status-less
    # proxy_cache_valid default does not cover, so cache them explicitly.
    proxy_cache_valid 200 206 1d;

    location / {
        proxy_pass http://steam-upstream;
    }
}
In my case, I'm using upstream caches located in Seattle and Los Angeles, the two closest locations.
Refresh Nginx to enable Steam caching:
# svcadm refresh nginx
Client Configuration
No client configuration is required. Steam will automatically recognize and make use of this cache.
Cache Partitioning
While all of the above examples use a single shared cache, an approach I prefer, you can also set up independent caches per service.
First, disable Nginx.
# svcadm mark maintenance nginx
Create additional ZFS datasets per cache that you'd like to create:
# zfs create -o quota=200G -o mountpoint=/var/www/cache-2 zones/<UUID>/data/cache-2
Register the cache paths in Nginx under the http context. Note that the size has been adjusted to reflect the size set for the ZFS dataset:
/opt/local/etc/nginx/nginx.conf:
...
proxy_cache_path /var/www/cache-2 levels=2:2 use_temp_path=on keys_zone=second_cache:128m inactive=4y max_size=190G;
...
Adjust the cache configuration to make use of that cache instead of default_cache, for example with Ubuntu:
/opt/local/etc/nginx/cache/ubuntu.cache:
upstream archive.ubuntu.com {
    server archive.ubuntu.com;
    keepalive 2;
}

server {
    listen [::];
    server_name apt.ubuntu.cache apt.ubuntu.cache.ewellnet;

    proxy_cache second_cache;

    location ~ \.deb$ {
        proxy_pass http://archive.ubuntu.com;
    }

    location / {
        proxy_pass http://archive.ubuntu.com;
        proxy_cache_valid 200 301 302 1h;
    }
}
Once you're done, restart Nginx and enjoy having separate caches:
# svcadm clear nginx
Conclusion
While I'm generally happy with the results of this project, there's clearly room for improvement, specifically around Steam and potentially additional game service caching.
Digging further into Lancache and investigating what problems they experienced and how they overcame them is probably the best move from here, as well as investing further into visibility tools to determine how and why Nginx is slowing down requests instead of accelerating them.
But for everything else, this is definitely a good first step.