A WireGuard-based VPN with user authentication

This post doesn’t describe specific implementation details, because the intellectual property for the system described here belongs to my employer. Instead, it describes the basic idea behind the WireGuard-based VPN I built.

Names and IP addresses referred to here are not the ones used in my implementation.

WireGuard is a good encrypted tunnelling protocol. It runs way down in network driver space, so it’s pretty efficient. But running at the network driver level means it doesn’t really know about users; it works with devices. It also has no concept of “client” and “server”; it just works with peers.

So how do you build a VPN with proper user authentication on a tunnelling technology that doesn’t understand things like users, or even servers and clients? By combining WireGuard with a good firewall and some form of authentication and user management.

The basic idea is to have the firewall on the “server” configured so users are only able to see a web interface on the “server” over the tunnel until they have logged in to that web interface. Once they have authenticated, the firewall on the “server” then expands the rules to allow the user to see the resources they are allowed to access over the VPN.

Some basics on the WireGuard level

The WireGuard internal network is a relatively large RFC 1918 address space, sized according to the number of user devices expected: maybe a /22 (1,022 usable addresses) if you expect fewer than about 1,000 devices, or a /21 (2,046 usable addresses) if you expect fewer than about 2,000. Let’s say you decide to use a chunk of the 10.0.0.0/8 private network; you could assign 10.100.0.0/21 for the VPN. This would give your VPN server 10.100.0.1/21 and each user device a static IP in the range from 10.100.0.2 to 10.100.7.254.
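To check the arithmetic for a candidate range, Python’s standard ipaddress module is enough. This is only an illustrative sketch using the example range from this post; the variable names are made up for the example.

```python
import ipaddress

# Example range from this post; not the range used in the real deployment.
vpn_net = ipaddress.ip_network("10.100.0.0/21")

hosts = list(vpn_net.hosts())  # usable addresses (network and broadcast excluded)
server_ip = hosts[0]           # first usable address, reserved for the VPN "server"
device_ips = hosts[1:]         # everything else, one static IP per user device

print(server_ip, device_ips[0], device_ips[-1], len(device_ips))
# 10.100.0.1 10.100.0.2 10.100.7.254 2045
```

Swapping in "10.100.0.0/22" shows the smaller option: 1,022 usable addresses, so 1,021 left for devices once the server has taken one.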

The VPN “server” would have a [Peer] entry for each user device, with the assigned IP as the AllowedIPs, so the first user device would have AllowedIPs = 10.100.0.2/32 in the [Peer] entry on the server.

Each user device would have its assigned IP address in the [Interface] section of its WireGuard config, so the first user’s device would have Address = 10.100.0.2/21. The device’s config has only one [Peer] entry, for the VPN “server”, and its AllowedIPs lists the server plus the IP ranges of any resources the user should have access to. So assuming the first user should be able to see web sites on a private network with range 10.0.0.0/24 over the VPN, their AllowedIPs would look like AllowedIPs = 10.100.0.1/32, 10.0.0.0/24.
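As a rough illustration of the two config fragments just described, here is a small Python sketch that renders them as text. The function names, the key placeholders, the endpoint vpn.example.com:51820 and the keepalive value of 25 seconds are all assumptions made for the example, not details of the real system.

```python
def server_peer_block(device_public_key: str, device_ip: str) -> str:
    """[Peer] entry added to the VPN "server" config for one user device."""
    return (
        "[Peer]\n"
        f"PublicKey = {device_public_key}\n"
        f"AllowedIPs = {device_ip}/32\n"
    )


def device_config(private_key: str, device_ip: str, prefix_len: int,
                  server_public_key: str, endpoint: str, server_ip: str,
                  resources: list[str], keepalive: int = 25) -> str:
    """Full config for a user device: one [Interface], one [Peer]."""
    # The server itself plus every resource range the user may reach.
    allowed_ips = ", ".join([f"{server_ip}/32", *resources])
    return (
        "[Interface]\n"
        f"PrivateKey = {private_key}\n"
        f"Address = {device_ip}/{prefix_len}\n"
        "\n"
        "[Peer]\n"
        f"PublicKey = {server_public_key}\n"
        f"Endpoint = {endpoint}\n"
        f"AllowedIPs = {allowed_ips}\n"
        f"PersistentKeepalive = {keepalive}\n"
    )


print(server_peer_block("<device-public-key>", "10.100.0.2"))
print(device_config("<device-private-key>", "10.100.0.2", 21,
                    "<server-public-key>", "vpn.example.com:51820",
                    "10.100.0.1", ["10.0.0.0/24"]))
```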

Some basics on the firewall level

Essentially, the firewall should allow any IP in the 10.100.0.0/21 range to see the web interface on the VPN “server” before being authenticated, so fairly high up in the firewall rule set you would have something like:

iptables -A INPUT -i wg-vpn -s 10.100.0.0/21 -p tcp --dport 443 -j ACCEPT

Most of the fancy stuff happens in the FORWARD firewall rules. The way we arrange it is to have a chain for each user (so for example a chain called u_user1) with the rules defining what the user can access. The main FORWARD chain then has a bunch of rules along the lines of iptables -A FORWARD -s 10.100.0.2/32 -j u_user1 in it. The system arranges for the first rule in the user chain to be a DROP rule until the user authenticates.

Just in case it actually needs to be said, always (always) change the default policy of your INPUT and FORWARD chains to DROP once you have a working ruleset.

So looking at user 1, their user chain would look like this before they authenticate:

-j DROP
-d 10.0.0.0/24 -p tcp --dport 80 -j ACCEPT
-d 10.0.0.0/24 -p tcp --dport 443 -j ACCEPT

Once they authenticate, the system runs something like iptables -D u_user1 1 and it changes to this:

-d 10.0.0.0/24 -p tcp --dport 80 -j ACCEPT
-d 10.0.0.0/24 -p tcp --dport 443 -j ACCEPT

And once they log out or their connection times out, that DROP rule at the start of the chain is put back in again by running something like iptables -I u_user1 1 -j DROP.
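The whole life cycle of a user chain can be sketched as a few Python helpers that build the iptables command lines as argument lists (nothing is executed here). The helper names are hypothetical; the iptables options used (-N, -A, -D, -I, -s, -d, -p, --dport, -j) are the standard ones already shown in this post.

```python
def chain_name(username: str) -> str:
    return f"u_{username}"


def create_user_chain(username: str, device_ip: str,
                      rules: list[tuple[str, int]]) -> list[list[str]]:
    """Commands to create a locked user chain and hook it into FORWARD."""
    chain = chain_name(username)
    cmds = [
        ["iptables", "-N", chain],
        # Rule 1 is DROP until the user authenticates.
        ["iptables", "-A", chain, "-j", "DROP"],
    ]
    for dest, port in rules:
        cmds.append(["iptables", "-A", chain, "-d", dest,
                     "-p", "tcp", "--dport", str(port), "-j", "ACCEPT"])
    # Send traffic from this device's VPN address through the user chain.
    cmds.append(["iptables", "-A", "FORWARD", "-s", f"{device_ip}/32",
                 "-j", chain])
    return cmds


def enable_access(username: str) -> list[str]:
    """Remove the leading DROP rule after a successful login."""
    return ["iptables", "-D", chain_name(username), "1"]


def disable_access(username: str) -> list[str]:
    """Put the DROP rule back at position 1 on logout or timeout."""
    return ["iptables", "-I", chain_name(username), "1", "-j", "DROP"]


for cmd in create_user_chain("user1", "10.100.0.2",
                             [("10.0.0.0/24", 80), ("10.0.0.0/24", 443)]):
    print(" ".join(cmd))
```

One thing to watch: iptables -D u_user1 1 deletes whatever happens to be the first rule, so running it twice would remove an ACCEPT rule. Deleting by rule specification instead (iptables -D u_user1 -j DROP) is one way to make the enable step safe to repeat.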

Of course, if user 1 is also the server administrator for the VPN server, you would also have a rule -d 10.100.0.1/32 -p tcp --dport 22 -j ACCEPT in the u_user1 chain and a -s 10.100.0.2/32 -j u_user1 rule in the INPUT chain, placed below the port 443 rule so the DROP can’t block the login page. That way user 1 can SSH to the VPN server once they have logged in to the VPN.

Some not-so basics on the authentication layer

This is where a lot of moving parts come in.

User interface

In my implementation, I have a PHP-driven web site sitting on port 443 and bound to IP address 10.100.0.1 on the VPN server. There is also a DNS A record (say vpnauth.example.com) that points to 10.100.0.1 and a Let’s Encrypt TLS certificate (issued using the DNS-01 challenge, obviously, because the HTTP-01 challenge can’t reach 10.100.0.1). Once their WireGuard tunnel is enabled, users browse to https://vpnauth.example.com/ to log in to the VPN.

In my implementation, this login process also involves a time-based one-time password (TOTP) system for two-factor authentication.

Keep admin access to a minimum, even in your own code

At the core of the system are a message queue, a message consumer with root access to change iptables rules and the WireGuard configuration, and a bunch of timed scripts, in addition to the web site users log in to.

The main reason for this is to keep the amount of code that has root access to the bare minimum.
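To show why this split keeps the privileged code small, here is a sketch of the only part that would need root: turning a queue message into one of a fixed set of commands. The JSON message format, the action names and the username pattern are all assumptions for the example; the post doesn’t describe the real message format.

```python
import json
import re

# Hypothetical constraint: short, lowercase usernames only.
USERNAME_RE = re.compile(r"[a-z0-9_]{1,24}")

# The complete list of things the privileged consumer is willing to do.
ACTIONS = {
    "enable": lambda chain: ["iptables", "-D", chain, "1"],
    "disable": lambda chain: ["iptables", "-I", chain, "1", "-j", "DROP"],
}


def command_for(raw_message: str) -> list[str]:
    """Validate a queue message and return the argv to run, or raise."""
    try:
        msg = json.loads(raw_message)
    except json.JSONDecodeError as exc:
        raise ValueError("message is not valid JSON") from exc
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    action, user = msg.get("action"), msg.get("user")
    if not isinstance(action, str) or action not in ACTIONS:
        raise ValueError("unknown action")
    if not isinstance(user, str) or not USERNAME_RE.fullmatch(user):
        raise ValueError("invalid username")
    return ACTIONS[action](f"u_{user}")


print(command_for('{"action": "enable", "user": "user1"}'))
```

The web site never builds a command line itself. It can only ask for an action from the list, and the consumer would pass the returned list to something like subprocess.run without a shell, so nothing in the message is ever interpreted as shell syntax.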

Login process

When a user logs in to the web interface, the web site does some checks first. It checks that the user is logging in from their assigned VPN IP address and it checks that the user’s password hasn’t expired. If both these checks pass, it marks the user as logged in and then places a message on the queue to enable the user’s access in the firewall.

If the user is trying to log in from an IP that is not their assigned one, an alert message is sent to the Security team and the user is logged out. Remember that this site is only visible when connected to the WireGuard network, so if a user is trying to log in from an IP that is not assigned to them it means there’s a mismatch between the WireGuard configuration on the device and the person trying to log in to the VPN from that device.

If the user’s password has expired, they are redirected to the password change page instead of being marked as logged in, and no message is put on the queue to enable their access in the firewall.
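The decision logic above fits in one small function. This sketch assumes the password (and TOTP code) have already been verified, and the class, field and outcome names are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Outcome(Enum):
    ALERT_AND_LOGOUT = "alert the Security team and log the user out"
    CHANGE_PASSWORD = "redirect to the password change page"
    ENABLE_ACCESS = "mark as logged in and queue a firewall enable message"


@dataclass
class User:
    name: str
    assigned_ip: str             # static VPN address of the user's device
    password_expires: datetime


def login_outcome(user: User, source_ip: str, now: datetime) -> Outcome:
    """Decide what happens after the credentials themselves checked out."""
    # Wrong tunnel address: device config and person don't match.
    if source_ip != user.assigned_ip:
        return Outcome.ALERT_AND_LOGOUT
    # Expired password: no firewall access until it has been changed.
    if user.password_expires <= now:
        return Outcome.CHANGE_PASSWORD
    return Outcome.ENABLE_ACCESS


now = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
alice = User("user1", "10.100.0.2", datetime(2024, 3, 1, tzinfo=timezone.utc))
print(login_outcome(alice, "10.100.0.2", now).name)  # ENABLE_ACCESS
```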

Logging out

When a user clicks the button to log out, the web site marks them as logged out and places a message on the queue to disable access in the firewall.

Because users very seldom actually click the button to log out, we have a timed script that runs every 60 seconds or so. This script looks at when a data packet was last seen in WireGuard from each logged-in user’s device (each user device has the PersistentKeepalive parameter set in WireGuard). If the last packet seen from a logged-in user device was more than 5 minutes ago, we assume they disconnected from the Internet or disabled the tunnel in WireGuard, and they are logged out as if they had clicked the button to log out.

Also, if their last login was more than a configurable number of hours ago, the same thing happens. This is to avoid users logging in and then leaving their devices connected to the VPN for days at a time.
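A sketch of what such a timed script might compute is below. It makes one notable assumption: it uses the per-peer timestamps printed by wg show wg-vpn latest-handshakes (one public key and one Unix timestamp per line) as a stand-in for “last packet seen”, which may not be how the real script measures activity. The function names and the 12-hour session limit are also just example values.

```python
def parse_latest_handshakes(output: str) -> dict[str, int]:
    """Parse `wg show <interface> latest-handshakes` output.

    Each line is "<public key><TAB><unix timestamp>"; a timestamp of 0
    means no handshake has been seen for that peer.
    """
    seen = {}
    for line in output.splitlines():
        if line.strip():
            public_key, timestamp = line.split()
            seen[public_key] = int(timestamp)
    return seen


def sessions_to_end(sessions: dict[str, int], last_seen: dict[str, int],
                    now: int, max_idle: int = 300,
                    max_session: int = 12 * 3600) -> list[str]:
    """Return the public keys of logged-in devices that should be logged out.

    sessions maps each logged-in device's public key to its login time.
    """
    ended = []
    for public_key, login_time in sessions.items():
        idle_for = now - last_seen.get(public_key, 0)
        session_age = now - login_time
        if idle_for > max_idle or session_age > max_session:
            ended.append(public_key)
    return ended
```

Each key returned would then be treated exactly like a click on the log out button: mark the user as logged out and queue a disable message.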

Expiring accounts

We have a way to set an expiry date and time on a user account. This is mainly for third-party service providers who need temporary access to systems they support, and to schedule automatic expiry of the VPN account when a user is off-boarded. Our off-boarding process is pretty involved and starts well before the user actually departs, so this timed expiry allows us to set-and-forget the process of turning off their VPN access before the actual scheduled departure time.

To handle this, we have a timed script that runs once per minute and looks for enabled accounts that have an expiry time in the past. These accounts are then marked as disabled and, if they are logged in, logged out, and a message is put in the queue to remove their [Peer] entry from the WireGuard configuration.

Another timed script looks for accounts that have not been logged in to in a specific (configurable) number of days and disables them in the same way as expired accounts. It’s a good idea to automatically disable VPN accounts that have not been used in a while (but make it more than a month, because some people only need to access the VPN once a month for stuff like month-end tasks).
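Both checks come down to a simple selection over the account list. In this sketch the Account fields and the 45-day inactivity default are assumptions for the example, and accounts that have never been logged in to are left alone, which is a choice you may want to make differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Account:
    name: str
    enabled: bool
    expires_at: Optional[datetime]   # None means no scheduled expiry
    last_login: Optional[datetime]   # None means never logged in


def accounts_to_disable(accounts: list[Account], now: datetime,
                        max_inactive_days: int = 45) -> list[str]:
    """Names of enabled accounts that have expired or gone unused too long."""
    cutoff = now - timedelta(days=max_inactive_days)
    due = []
    for account in accounts:
        if not account.enabled:
            continue
        expired = account.expires_at is not None and account.expires_at <= now
        inactive = account.last_login is not None and account.last_login < cutoff
        if expired or inactive:
            due.append(account.name)
    return due
```

For every name returned, the script would mark the account disabled, log the user out if needed, and queue the message that removes the [Peer] entry.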

Expiring passwords

We also have a password expiry system, because it’s a good idea regardless of what NIST says. We have a timed script that runs once a day, at the end of the business day, that checks for passwords expiring in the next X days and emails those users to remind them that their VPN password is about to expire and they need to change it.

Next it looks for passwords that expired today and emails those users to tell them their VPN password has expired and now they absolutely have to change their password before the VPN server will let them see anything other than the password change page.
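The two lookups in that daily script can be sketched as a single pass over the expiry dates. The function name, the dictionary input and the 7-day default for X are assumptions for the example; sending the actual emails is left out.

```python
from datetime import date, timedelta


def password_notices(expiry_dates: dict[str, date], today: date,
                     warn_days: int = 7) -> tuple[list[str], list[str]]:
    """Split users into (expiring soon, expired today) for the daily emails."""
    expiring_soon, expired_today = [], []
    for user, expires in expiry_dates.items():
        if expires == today:
            expired_today.append(user)
        elif today < expires <= today + timedelta(days=warn_days):
            expiring_soon.append(user)
    return expiring_soon, expired_today


soon, expired = password_notices(
    {"user1": date(2024, 6, 4), "user2": date(2024, 6, 1), "user3": date(2024, 9, 1)},
    today=date(2024, 6, 1),
)
print(soon, expired)  # ['user1'] ['user2']
```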

It’s always good to do sanity checks

There is a final timed script that runs at a configurable interval and does sanity checking by comparing what the database says with the running firewall and WireGuard configuration and fixes any discrepancies. This process is there as a safety net in case the message queue consumer misses a message.

Because Security Is Important, this sanity check always errs on the side of marking users as logged out and disabling their access in the firewall if the database and the running configuration disagree on whether a user is logged in.
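The “err on the side of logged out” rule is easy to express with sets. In this sketch, whether a user chain is open is read from the output of iptables -S for that chain (open means the first rule is not the DROP rule); the function names are made up for the example, and the real script may gather this information differently.

```python
def chain_is_open(iptables_s_output: str) -> bool:
    """True if the chain's first rule is something other than the DROP rule.

    Expects the output of `iptables -S <chain>`, where rules appear as
    lines starting with "-A <chain> ...". An empty chain counts as closed.
    """
    rules = [line for line in iptables_s_output.splitlines()
             if line.startswith("-A ")]
    return bool(rules) and not rules[0].endswith("-j DROP")


def reconcile(db_logged_in: set[str],
              firewall_open: set[str]) -> tuple[set[str], set[str]]:
    """Return (users to mark logged out, users whose chain must be closed).

    Any disagreement is resolved towards "logged out, access disabled".
    """
    mark_logged_out = db_logged_in - firewall_open  # DB says in, firewall closed
    close_firewall = firewall_open - db_logged_in   # firewall open, DB says out
    return mark_logged_out, close_firewall


locked = "-N u_user1\n-A u_user1 -j DROP\n-A u_user1 -d 10.0.0.0/24 -p tcp -m tcp --dport 80 -j ACCEPT\n"
print(chain_is_open(locked))  # False
print(reconcile({"user1", "user2"}, {"user2", "user3"}))
```

Users that appear in both sets are left alone; everyone else ends up logged out in the database and locked in the firewall.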