necheff.net

The Ultimate Management Network, 2024 Edition

In this article I will provide some high-level guidance on how to set up a management network using WireGuard, including some DNS and SSH modifications to account for it. The design is based on the evolution of my network over the past several years and what I consider to be state of the art for 2024.

A short list of things to do with a management network:

Support interactive SSH sessions without exposing TCP/22 to the Internet.
Run Ansible to centralize node configuration management.
Funnel data from node exporters to a central dashboard built on Prometheus and Grafana.
Simplify the use of Tang and Clevis for protecting data at rest in remote nodes.

But the fun and possibilities are endless.

All snippets assume a Debian 12 system and that you are logged in to a root shell.

Rationale or Mo' Computers, Mo' Problems

I have a collection of Debian 12 systems to wrangle. Some are workstations, some are servers. Some are at home, some are at a friend's or family member's home, yet others are in the cloud. Keeping all these systems up to date and configured for a consistent user experience _is_ a burden. Not to mention doing health checks on each system and being proactive about failing hardware or software.

Common hurdles to overcome:

Configuring port forwarding with a NAT gateway.
Dealing with dynamically allocated globally routable IP addresses.
Exposing SSH to the Internet gets noisy even with a hardened config.

Changing the SSH listen port does nothing to actually harden a system but it sure does make diagnosing problems with legitimate connections far more difficult than it needs to be.
Running Fail2Ban on a single core VPS instance is tantamount to DoS'ing yourself. Besides, have we learned nothing from the Log4Shell debacle?

The WireGuard management network solves all these problems. For a continuous point-to-point connection, WireGuard only needs to have a "fixed" endpoint address to establish an initial connection; the protocol will then gracefully handle nodes moving _without_ compromising security. Services that don't need a direct Internet connection, like SSH, can be hidden behind the VPN, minimizing attack surface and quieting the logs.

WireGuard Backbone

WireGuard is a VPN that builds on lessons learned from the headaches of IPsec, OpenVPN, and others. It is lightweight, uses modern cryptography, and avoids much of the complexity of prior VPN implementations, resulting in saner network configuration. For my management network, I'll use a star/hub configuration with all nodes/peers connected to a single router/gateway that will itself be a peer. WireGuard does support mesh architectures, but I have chosen the star for its less complex configuration.

On each peer, including the gateway, generate a key-pair:


# don't want file-perms too loosey-goosey, especially on multi-user systems
~$ umask 0077

~$ mkdir -p /crypt/wireguard
~$ cd /crypt/wireguard
~$ wg genkey > $(hostname).key
~$ wg pubkey < $(hostname).key > $(hostname).pub
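To see what the umask buys you, the following sketch creates a throwaway directory and file under the same umask and prints their modes. It assumes GNU stat (standard on Debian); the demo_umask name and the temporary paths are purely illustrative.

```shell
# Show the permissions that new directories and files receive under umask 0077.
# demo_umask is a hypothetical helper; it touches nothing outside a temp dir.
demo_umask() (
    # the body runs in a subshell, so the umask change does not leak to the caller
    umask 0077
    tmpdir=$(mktemp -d) || exit 1
    mkdir "$tmpdir/keys"
    : > "$tmpdir/keys/example.key"
    # GNU stat: %a prints the octal mode (directory first, then the file)
    stat -c '%a' "$tmpdir/keys" "$tmpdir/keys/example.key"
    rm -rf "$tmpdir"
)

demo_umask
```

The directory comes out as 700 and the file as 600, so only the owner can read the private key.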

Now, on the gateway only, generate a pre-shared key for each peer. Technically, this is an optional step. If a high-level explanation of digital key exchange startles you, skip the next paragraph but generate a pre-shared key anyway. As a best practice, you should generate a pre-shared key per peer connection rather than sharing the same key between all peers. In my case, using a star topology, each peer with the exception of the star hub itself has a single peer connection.

WireGuard uses Curve25519 as the curve for Elliptic Curve Diffie-Hellman key exchange (ECDH). Hosts connected on a potentially hostile network use a key exchange protocol like ECDH to negotiate a shared key without publicly disclosing the key and without requiring prior knowledge of the key (i.e. sneakernetting the key). WireGuard then uses this key to encrypt data using a symmetric cipher called ChaCha20-Poly1305. Curve25519 is a high-quality elliptic curve that I would trust over most other curves and especially anything in the family of P-256, which was likely tampered with and certainly lacks a compelling cryptanalysis. However, being based around the discrete logarithm problem as all elliptic curves are, it is vulnerable in a post-quantum world with sufficiently large and stable quantum computers. We aren't there yet in 2024, at least not with what is known publicly, but there isn't a remotely noticeable computational overhead for beefing up security; this isn't the 1960s anymore, where every CPU cycle counts. What the pre-shared key does for WireGuard is make the key exchange resistant to post-quantum attacks. ChaCha20-Poly1305 itself is a highly respected AEAD cipher which is on par with, if not better than, 256-bit AES in GCM mode. In fact, when executed on hardware without accelerated instructions, ChaCha20 is often _faster_ than AES given equal key sizes.


~$ wg genpsk > ${PEER_NAME}.psk
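Since each peer connection gets its own pre-shared key, a small loop on the gateway saves some typing. This is only a sketch: gen_psks is a hypothetical helper, and host1 and host2 are the example peer names used in the rest of the article.

```shell
# Generate one pre-shared key per peer name given as an argument.
# Existing files are left alone so a re-run never silently rotates a key.
gen_psks() {
    (
        umask 0077                      # key files must not be readable by others
        for peer in "$@"; do
            if [ -e "${peer}.psk" ]; then
                echo "skipping ${peer}: ${peer}.psk already exists" >&2
                continue
            fi
            wg genpsk > "${peer}.psk"
        done
    )
}

# usage, from /crypt/wireguard on the gateway:
#   gen_psks host1 host2
```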

With keying material generated, it is time to configure WireGuard itself. The following example assumes a three-peer network with the router/hub host name being "star" and the two client peers being "host1" and "host2".

On the "star" host, add the following to /etc/wireguard/wg0.conf and be sure to set a restrictive umask to avoid creating overly permissive file permissions that could leak keys to other users on the system.


[Interface]
PrivateKey = # contents of star.key
ListenPort = 99999 # this is UDP; 99999 is not a valid port (max 65535), substitute your own, e.g. 51820

[Peer]
# host1
PublicKey = # contents of host1.pub
PresharedKey = # contents of host1.psk
AllowedIPs = 10.211.0.3/32 # the IP(s) configured on the VPN interface of host1

[Peer]
# host2
PublicKey = # contents of host2.pub
PresharedKey = # contents of host2.psk
AllowedIPs = 10.211.0.4/32 # the IP(s) configured on the VPN interface of host2
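Pasting keys by hand is error-prone, so the [Peer] stanzas can be rendered from the key files instead. A minimal sketch, assuming each peer's .pub file has been copied into the gateway's key directory next to its .psk file; peer_stanza is a hypothetical helper, not part of WireGuard.

```shell
# Print a [Peer] stanza for the gateway's wg0.conf.
#   $1 = peer name (expects $1.pub and $1.psk in the current directory)
#   $2 = the peer's VPN address, without a prefix length
peer_stanza() {
    name=$1 addr=$2
    [ -r "${name}.pub" ] && [ -r "${name}.psk" ] || {
        echo "missing key material for ${name}" >&2
        return 1
    }
    printf '[Peer]\n# %s\n' "$name"
    printf 'PublicKey = %s\n' "$(cat "${name}.pub")"
    printf 'PresharedKey = %s\n' "$(cat "${name}.psk")"
    printf 'AllowedIPs = %s/32\n\n' "$addr"
}

# usage: append both example peers to the gateway configuration
#   peer_stanza host1 10.211.0.3 >> /etc/wireguard/wg0.conf
#   peer_stanza host2 10.211.0.4 >> /etc/wireguard/wg0.conf
```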

And now each client will need a WireGuard configuration too. For brevity only the configuration of "host1" is shown below. Again, make sure umask is restrictive to avoid allowing other users on the system to view this file which is located at /etc/wireguard/wg0.conf too.


[Interface]
PrivateKey = # contents of host1.key

[Peer]
# star
PublicKey = # contents of star.pub
PresharedKey = # contents of host1.psk
AllowedIPs = 10.211.0.0/24 # we'll allow ALL peers on the VPN to speak to us.
Endpoint = star.example.com:99999
PersistentKeepalive = 20 # keep NAT and other weird networking equipment happy
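The port in Endpoint has to match ListenPort on the gateway, and both have to be real UDP port numbers. Ports are 16-bit values, so anything above 65535 (99999, for instance) must be swapped for a valid one before the tunnel can come up. A tiny sanity check, with valid_port as a hypothetical helper and 51820 chosen only because it is commonly seen in WireGuard examples:

```shell
# Succeed only when $1 is a decimal integer in the valid port range 1-65535.
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;        # empty, or contains a non-digit
    esac
    [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

for port in 51820 99999; do
    if valid_port "$port"; then
        echo "$port: ok"
    else
        echo "$port: not a valid UDP port"
    fi
done
```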

We also need to create a network interface on each peer and assign the VPN configuration to it. This will basically look the same on each host but for brevity, only "host1" is shown below. The configuration will use a Debian-style ifupdown configuration located at /etc/network/interfaces.d/wg0.


iface wg0 inet static
    address 10.211.0.3/24
    pre-up ip link add $IFACE type wireguard
    pre-up wg setconf $IFACE /etc/wireguard/$IFACE.conf
    post-down ip link del $IFACE

I intentionally did not include an "auto wg0" statement in the interface stanza as this could result in the wg0 interface being brought up before the physical interface connecting me to the Internet is brought up which could cause problems. Instead, I include a "post-up ifup wg0" statement in the interface stanza of my Internet-facing interface to ensure wg0 is only brought up after the rest of my network on the system is brought up.
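For illustration, the Internet-facing stanza might look like the output of the sketch below. The interface name enp1s0 and the use of DHCP are assumptions; only the post-up line comes from the setup described above, and uplink_stanza is a hypothetical helper.

```shell
# Print an ifupdown stanza for the Internet-facing NIC that brings wg0 up
# once the uplink itself is up.
#   $1 = interface name (defaults to enp1s0, which is an assumption)
uplink_stanza() {
    iface=${1:-enp1s0}
    cat <<EOF
auto $iface
iface $iface inet dhcp
    post-up ifup wg0
EOF
}

uplink_stanza enp1s0
```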

To start capping off the base WireGuard configuration, we need to set up the firewall on the "star" host. I won't provide a comprehensive ruleset, just enough to get things working if you put the rules in the right spot of your existing ruleset.


# allow client peers to establish a connection with the router/gateway (--dport must match ListenPort)
~$ iptables -A INPUT -p udp --dport 99999 -j ACCEPT -m comment --comment "WireGuard"

# remember, the "star" host is a router now...
~$ iptables -P FORWARD DROP
~$ iptables -A FORWARD -i wg0 -o wg0 -m conntrack --ctstate INVALID -j DROP -m comment --comment "the usual"
~$ iptables -A FORWARD -i wg0 -o wg0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -m comment --comment "the usual"
~$ iptables -A FORWARD -i wg0 -o wg0 -p icmp --icmp-type 3 -j ACCEPT -m comment --comment "Destination Unreachable"
~$ iptables -A FORWARD -i wg0 -o wg0 -p icmp --icmp-type 8 -j ACCEPT -m comment --comment "Echo Request"
~$ iptables -A FORWARD -i wg0 -o wg0 -p icmp --icmp-type 11 -j ACCEPT -m comment --comment "Time Exceeded"
~$ iptables -A FORWARD -i wg0 -o wg0 -s 10.211.0.0/24 -d 10.211.0.0/24 -p tcp --dport 22 -j ACCEPT -m comment --comment "SSH"
~$ iptables -A FORWARD -i wg0 -o wg0 -s 10.211.0.0/24 -d 10.211.0.0/24 -p tcp --dport 53 -j ACCEPT -m comment --comment "DNS"
~$ iptables -A FORWARD -i wg0 -o wg0 -s 10.211.0.0/24 -d 10.211.0.0/24 -p udp --dport 53 -j ACCEPT -m comment --comment "DNS"
~$ iptables -A FORWARD -i wg0 -j LOG --log-level 6 --log-prefix ":::VPN:FORWARD:::DROP:::" -m comment --comment "WTF"

# don't forget to save the ruleset for next boot! if you have iptables-persistent installed that is.
~$ iptables-save > /etc/iptables/rules.v4
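The per-service FORWARD rules all follow one pattern, so opening another service between peers is a matter of repeating it. The sketch below only prints the command so it can be reviewed before being run. forward_rule is a hypothetical helper, and TCP 9100 is used as the example because it is the usual default port of the Prometheus node exporter.

```shell
# Print (do not execute) an iptables command that lets VPN peers reach
# one another on a given service.
#   $1 = protocol (tcp or udp), $2 = destination port, $3 = rule comment
forward_rule() {
    proto=$1 port=$2 comment=$3
    vpn_net=10.211.0.0/24
    printf 'iptables -A FORWARD -i wg0 -o wg0 -s %s -d %s -p %s --dport %s -j ACCEPT -m comment --comment "%s"\n' \
        "$vpn_net" "$vpn_net" "$proto" "$port" "$comment"
}

forward_rule tcp 9100 "node exporter"
```

Note that a rule appended after the LOG rule still matches, because LOG does not terminate processing, but the packet is logged first. To keep the log quiet, insert the new rule above the LOG rule instead.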

I have included a logging rule to help diagnose any connectivity issues on the management network, but once the dust settles, this should probably be removed.

We don't need to do any additional routing because the static routes that get configured when the wg0 interface is brought up on each peer will get the job done. But we are not done just yet: we need to tell the kernel on the "star" host that we want to allow it to forward traffic. So edit /etc/sysctl.d/local_vpn.conf to contain the following:


# allow IPv4 traffic to be forwarded
net.ipv4.ip_forward = 1
# not strictly needed, but using a smarter traffic queuing algorithm on a router makes sense
net.core.default_qdisc = fq_codel

All that is left is to apply the changes to the kernel. You can either reboot, or run the following command to apply changes without rebooting.


# a bare "sysctl -p" only reads /etc/sysctl.conf; --system also loads /etc/sysctl.d/*.conf
~$ sysctl --system
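To confirm the kernel picked up the setting, the live value can be read back from /proc. A minimal sketch; forwarding_enabled is a hypothetical helper, and its optional path argument exists only to make it easy to test.

```shell
# Succeed when IPv4 forwarding is switched on.
#   $1 optionally overrides the file to read (defaults to the live kernel value)
forwarding_enabled() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is on"
else
    echo "IPv4 forwarding is off" >&2
fi
```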

At this point, bringing the wg0 interface up on all peers should allow for the flow of IP traffic over the management network. But no one wants to remember IP addresses, so the next section touches on DNS.

DNS

This section assumes you already have a DNS server listening on 10.211.0.2, the wg0 address of the star host, so I won't go into detail on how to set up forward and reverse lookup zones. What I will cover is how to configure the local resolver of each peer so that queries about records within the VPN zones stay on the VPN, VPN records don't need to be published in a public DNS directory, and each peer can still do general-purpose DNS queries for Internet access without funneling all DNS traffic over the VPN.

To accomplish this, we'll use dnsmasq and the resolvconf implementation provided by the openresolv package. Install both packages on all peers.

Add the following to /etc/resolvconf.conf on each peer:


resolv_conf=/etc/resolv.conf
name_servers="::1 127.0.0.1"
private_interfaces=wg0 # keeps VPN queries on the VPN
public_interfaces=$PHY_INTERFACE # the name of your Internet-facing NIC
interface_order="lo lo[0-9]* enp[0-9]*s[0-9]* eno[0-9]* wlo[0-9]* wlp[0-9]*s[0-9]*" # ensure physical interfaces are configured before wg0
dnsmasq_conf=/etc/dnsmasq.d/resolvconf_dynamic.conf
dnsmasq_resolv=/run/dnsmasq/resolv.conf

Add the following to /etc/dnsmasq.d/resolvconf.conf on each peer:


interface=lo
# really listen only on lo, don't just listen on every interface and discard non-lo traffic...
bind-interfaces
domain-needed
enable-dbus
resolv-file=/run/dnsmasq/resolv.conf

Add the following to the stanza for wg0 located in /etc/network/interfaces.d/wg0 on each peer:


dns-nameservers 10.211.0.2
dns-domain wg.example.com
dns-search wg.example.com

Take a moment to double-check that apt properly configured /etc/nsswitch.conf. It should contain a hosts line with files (that is, /etc/hosts) ahead of dns; other sources may appear in between:


hosts:          files dns

And finish things off with:


# force resolvconf to reprocess its configuration
~$ resolvconf -u

# have dnsmasq pickup the changes we just made
~$ systemctl restart dnsmasq
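A quick way to check the split-DNS wiring is to look at what resolvconf generated for dnsmasq. The sketch assumes openresolv writes per-domain server=/domain/address lines into the file named by dnsmasq_conf. That is the expected behaviour of its dnsmasq integration, but treat the exact format as an assumption and inspect the file yourself. vpn_zone_routed is a hypothetical helper.

```shell
# Succeed when the generated dnsmasq fragment forwards a domain to a server.
#   $1 = domain, $2 = name server address, $3 = file to inspect (optional)
vpn_zone_routed() {
    domain=$1 server=$2
    conf=${3:-/etc/dnsmasq.d/resolvconf_dynamic.conf}
    # -x: whole-line match, -F: treat the pattern as a fixed string
    grep -qxF "server=/${domain}/${server}" "$conf" 2>/dev/null
}

if vpn_zone_routed wg.example.com 10.211.0.2; then
    echo "wg.example.com queries go to 10.211.0.2"
else
    echo "no forwarding rule for wg.example.com found" >&2
fi
```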

Done! Now DNS should be working for a private, VPN-only domain while allowing peers to still browse YouTube or whatever.

SSH

In this last section I will outline how to tighten up OpenSSH with an emphasis on allowing root logins over the VPN network. This will facilitate management tools like Ansible or backup scripts that need root permissions. From there, you can either choose to continue forbidding root logins from the Internet or you can technically just prevent SSH from listening on public interfaces with a firewall policy to enforce this.

The default Debian configuration for sshd uses an Include directive at the very top of /etc/ssh/sshd_config so that local configurations can supersede the Debian maintainer's configuration without requiring manual intervention when applying updates to the OpenSSH package. The Include directive looks like this:


Include /etc/ssh/sshd_config.d/*.conf

On each peer, edit /etc/ssh/sshd_config.d/local.conf to contain the following:


#
# These settings are applied globally to all incoming SSH connections.
#
PasswordAuthentication no
KbdInteractiveAuthentication no
HostbasedAuthentication no
KerberosAuthentication no
GSSAPIAuthentication no

PubkeyAuthentication yes

# Don't allow root logins in general.
PermitRootLogin no

# One-stop-shop for saying "no" to all the *Forwarding options.
DisableForwarding yes

PermitTunnel no

# Force stronger crypto at the expense of supporting older clients.
# Protects against downgrade attacks.
KexAlgorithms sntrup761x25519-sha512@openssh.com
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com

On each peer, edit /etc/ssh/sshd_config.d/mgmt.conf to contain the following:


#
# Match blocks can be used to conditionally apply sshd settings to
# incoming connections. In this case, LocalAddress matches the IP
# the incoming connection was destined for. So this will need to
# be changed for each peer receiving this configuration.
#
# For each keyword, sshd uses the first value it obtains, so when several
# Match blocks apply to a connection the earliest one setting a keyword wins.
#
Match LocalAddress 10.211.0.2/32
    # Allow root logins, but only with a key
    PermitRootLogin prohibit-password

    # Non-interactive sessions like scripts don't want to see a Banner.
    Banner none

    # Don't allow forwarding and only allow public key authentication.
    PermitTunnel no
    DisableForwarding yes
    PubkeyAuthentication yes
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    HostbasedAuthentication no
    KerberosAuthentication no
    GSSAPIAuthentication no
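Because LocalAddress differs on every peer, the Match line can be derived from the address actually assigned to wg0 rather than edited by hand. A sketch, assuming iproute2's one-line output format (ip -o -4 addr show dev wg0), where the third field is "inet" and the fourth is the address with its prefix length; match_line is a hypothetical helper.

```shell
# Read `ip -o -4 addr show dev wg0` output on stdin and print the
# corresponding Match line for mgmt.conf (first IPv4 address only).
match_line() {
    awk '$3 == "inet" { split($4, a, "/"); print "Match LocalAddress " a[1] "/32"; exit }'
}

# usage on a peer:
#   ip -o -4 addr show dev wg0 | match_line
```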

Now, before restarting sshd and making the walk of shame to the console because you locked yourself out (could be a long walk in the case of a remote server!), test the configuration. sshd supports the -T option to parse the configuration and print the settings it would apply without actually binding to an interface and starting the daemon. sshd also supports the -C option to simulate a number of connection conditions so that Match blocks can be tested too.


# test the base configuration
~$ sshd -T

# test the management network Match block
~$ sshd -T -C laddr=10.211.0.2

# restart the daemon to apply changes
~$ systemctl restart sshd
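The -T dump is long, so before restarting it can help to pull out just the keywords that matter and compare the global view with the management-network view. A sketch; effective_setting is a hypothetical helper that filters sshd -T output, which prints keywords in lower case.

```shell
# Filter `sshd -T` output (read from stdin) down to one keyword's value.
# The keyword match is case-insensitive.
effective_setting() {
    awk -v key="$1" 'tolower($1) == tolower(key) { sub(/^[^ ]+ /, ""); print }'
}

# usage:
#   sshd -T | effective_setting PermitRootLogin
#   sshd -T -C laddr=10.211.0.2 | effective_setting PermitRootLogin
```

The first command should report no. The second should report the key-only policy; some OpenSSH versions may print it under the older synonym without-password rather than prohibit-password.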

That is it for now! This lays the foundation for a low-maintenance, low-latency, and secure management network that other projects can build upon.