Some observations while implementing nginx-ultimate-bad-bot-blocker

I run two small web sites, and I don’t like AI stealing my stuff.

I recently spotted a Mastodon post by @uastronomer pointing out the existence of nginx-ultimate-bad-bot-blocker, which goes some way toward blocking a lot of misbehaving clients right in the nginx configuration.

After implementing it on my tiny web server, I thought I’d have a bit of a dig through my nginx access_log to see what is getting blocked by it. While doing this I was reminded of a couple of common-sense tips.

Tip 1 - Keep it simple

Try to avoid software that “makes it easy” by putting something really complicated on your web server. In this case I’m referring to WordPress.

WordPress is really complex and, with the large number of add-ons available for it, a big potential security threat if not very carefully secured and maintained. It is also very commonly probed for on all web sites: I’m seeing many attempts to access paths under /wp-admin/ on my sites. This stuff is pop-u-lar. Some of the requests do sneaky things like passing a faked Referer header, and the vast majority of them use the GRequests User-Agent header.

Note that I’m not saying you should block the GRequests User-Agent. It’s just an HTTP requests library written in Go, but it does seem to be pretty popular with the things trying to break into the non-existent WordPress on my server.

Tip 2 - If you must run WordPress, add security layers

If you absolutely can’t get away with not running WordPress, please add extra safety measures to your /wp-admin/ path. Something I did back when I ran WordPress was proxy the site through Cloudflare and put /wp-admin/ behind a Zero Trust Access login. That way I had to log in to Access before I could even see the WordPress admin interface login page.

There are other things you could do too, if you don’t use something like Cloudflare, but that’s beyond the scope of this article.

Tip 3 - Keep it even simpler

If at all possible, don’t even install PHP on your web server. I recently posted about how I made this blog static HTML, completely removing the need to have PHP on my web server at all.

So why this, you say? I see very many attempts to access all manner of random (and some not so random, looking at you /xmlrpc.php and /.env.php) PHP files on my sites. Because my web server doesn’t have PHP on it at all, all those requests get a 404 response.
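
If you want to make that explicit rather than rely on the files simply not existing, something like the following could go inside the server block. This is only a sketch, and it assumes nothing on the site legitimately uses a .php URL:

```nginx
# Sketch: answer every request for a .php path with 404.
# Assumes no legitimate content on this site uses a .php URL.
location ~* \.php(/|$) {
    return 404;
}
```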

Tip 4 - For all that is fluffy, keep git out of your web root

Using git to deploy updates to your web server is pretty cool and can be very useful, but put some thought into it first.

Don’t put your public content in the root of the directory structure you have in git. That puts the .git directory right inside your web server’s document root. Look at some of the (many) paths that things are trying to access on my sites:

  • /.git/config
  • /.git/credentials
  • /ci/.git/config
  • /deploy/.git/config
  • /src/.git/config

Take a hint from me: put your public content in a sub-directory and point your web root at that. For example, the Static Site Generator I use generates the site in a directory called public, so that’s where I point root in the nginx config.

Even if you are literally editing static HTML and CSS, put it in a directory like public. Don’t put the files in the same directory that contains your .git directory.
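
As a rough sketch of what that looks like in nginx: the server name and the /var/www/example path below are made up for illustration, and the extra location block is an optional belt-and-braces addition rather than something the layout itself needs.

```nginx
server {
    listen 80;
    server_name example.com;            # hypothetical host name

    # The git checkout lives in /var/www/example (hypothetical path),
    # so .git sits at /var/www/example/.git, outside the document root.
    root /var/www/example/public;
    index index.html;

    # Optional extra layer: refuse anything inside a .git directory at
    # any depth, e.g. /.git/config or /deploy/.git/config.
    location ~ /\.git(/|$) {
        return 404;
    }
}
```

With root pointing at public, a request for /.git/config is looked up as /var/www/example/public/.git/config, which does not exist.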

Tip 5 - Block l9explore

(Yes, this is whack-a-mole, I know, but I think this one is new-ish)

If you choose to implement nginx-ultimate-bad-bot-blocker, I suggest adding this to the bottom of /etc/nginx/bots.d/blacklist-user-agents.conf:

"~*(?:\b)l9explore(?:\b)" 3;

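The reason being, close to 100% of the requests I see looking for stuff in .git directories have a User-Agent of l9explore/1.2.2. The ~* prefix makes the pattern a case-insensitive regex match, and 3 is the value the blocker uses for user agents it should block.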
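
If you are not running the blocker at all, a similar effect can be sketched with a plain nginx map. This is an assumption-laden standalone example, not part of nginx-ultimate-bad-bot-blocker: the variable name $block_l9explore and the server name are invented here, and the choice of 444 is just one option.

```nginx
# In the http context: set a flag when the User-Agent contains l9explore.
# $block_l9explore is a made-up variable name for this sketch.
map $http_user_agent $block_l9explore {
    default                         0;
    "~*(?:\b)l9explore(?:\b)"       1;
}

server {
    listen 80;
    server_name example.com;        # hypothetical host name

    # 444 is nginx-specific: close the connection without sending a response.
    if ($block_l9explore) {
        return 444;
    }
}
```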