HAProxy 'home' Load Balancer
Introduction:
I've done a few posts in the past about using nginx as a reverse proxy / load balancer, however I thought I'd look into HAProxy as a possible alternative that might solve some of the issues I was facing. Such issues include:
- nginx failing to start if downstream services are not online.
- Site configuration is tricky; it's spread across lots of files (one per site).
- LetsEncrypt integration - I was using a docker image provided by linuxserver.io (a fantastic site), but the container did a lot more than I really wanted it to. There will be another post on my LetsEncrypt setup after this one.
- I wanted some sort of status page for internal services.
While this isn't exactly an extensive list, and I may have been able to solve some of these issues without switching to HAProxy, I like keeping my knowledge fresh in these areas.
Existing Setup:
It makes little sense to explain a new solution without first explaining the old one.
First and foremost, I am not actually load balancing. I am routing traffic to different servers based on the incoming domain name. Why, you might ask?
Well, as a home environment I have a single IP address, but I run multiple web-based services. How do you share the single pair of ports (80 / 443) between all of the different machines?
I could put all (or most) of these services on a single webserver, but this is painful and requires a complete re-work of a lot of things I run. Many of them are individual docker containers running their own instance of a webserver, so integrating them together would create a single large, complex beast which I don't want to have to manage.
So, nginx to the rescue! By using nginx in reverse proxy mode I can use multiple domain names (blog.dchidell.com, site-a.dchidell.com, site-b.dchidell.com etc.) and, based on these domains, send requests off to their corresponding webservers running as docker containers. I've been doing this for a while.
Recently (about 6 months ago) I switched to the above-mentioned LetsEncrypt linuxserver.io docker image as a base. This allowed me to terminate SSL on nginx and provide HTTPS access to my externally facing sites with only a single certificate on a single machine. This is excellent - it really saves on management, and to be honest I wouldn't have bothered to use HTTPS if it hadn't been that easy.
New Setup:
Now that I've de-coupled the LetsEncrypt certbot from the load balancer, things get much simpler.
Here's the docker-compose configuration I am using within my 'master' docker-compose file:
haproxy_loadbalancer:
  image: haproxy:alpine
  restart: always
  ports:
    - '443:443'
    - '80:80'
  volumes:
    - /real-configuration/path/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    - /real-configuration/path/live/dchidell.com/haproxy.pem:/etc/haproxy/haproxy.pem:ro
Nice and simple! We have our configuration file as well as the .pem file, which is our certificate for HTTPS. This is built using the certbot container, which is covered in a later post.
Now, the HAProxy configuration is a little more complex. I've removed quite a lot of my actual config as it contains a lot of duplicated material, but this is enough to show the various constructs and concepts used:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers !EDH:!RC4:!ADH:!DSS:HIGH:+AES128:+AES256-SHA256:+AES128-SHA256:+SHA:!3DES:!aNULL:!eNULL
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option forwardfor
    option http-server-close
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    mode http
    # Internal Hosts
    acl acl_internal_1 hdr(host) -i internal-site-1.dchidell.com
    <removed for brevity>
    acl acl_local_subnet src 10.0.0.0/8
    # Redirect all others
    redirect scheme https code 301 if !acl_local_subnet
    redirect scheme https code 301 if !acl_internal_1 <+ all removed ACL entries> !{ ssl_fc }
    # Internal Redirects
    use_backend internal_1_server if acl_internal_1
    <removed for brevity>
    default_backend blog_server

frontend https-in
    reqadd X-Forwarded-Proto:\ https
    rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains
    rspadd X-Frame-Options:\ DENY
    bind *:443 ssl crt /etc/haproxy/haproxy.pem
    # Define hosts
    acl acl_blog hdr(host) -i blog.dchidell.com
    acl acl_external_1 hdr(host) -i external-1.dchidell.com
    acl acl_redirect hdr(host) -i redirect-server.dchidell.com
    acl acl_letsencrypt path_beg /.well-known/acme-challenge/
    ## figure out which one to use
    http-request redirect location https://redirected-location.example.com if acl_redirect
    use_backend web_server if acl_letsencrypt
    use_backend external_1_server if acl_external_1
    default_backend blog_server

backend web_server
    server webserver webserver:80 check

backend blog_server
    option httpchk
    server ghost ghost:2368 check
Let's break this configuration down into chunks so we can more easily understand what each section is doing.
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers !EDH:!RC4:!ADH:!DSS:HIGH:+AES128:+AES256-SHA256:+AES128-SHA256:+SHA:!3DES:!aNULL:!eNULL
    tune.ssl.default-dh-param 2048
I don't claim to be an expert on HAProxy by any means - most of this configuration I have taken from examples found online. For reference, the configuration guide for HAProxy should be consulted: https://cbonte.github.io/haproxy-dconv/1.7/configuration.html - I am simply providing an example of a working configuration within my setup.
Firstly we've got some logging options. These were actually in the configuration defaults, and I'll probably change them at some point. I don't really need any logging, but I do run a splunk instance (which I don't look at too often), so I'll likely revisit this later.
The real bulk here is the SSL parameters. I've taken these from the 'best practice' resources I could find online. I've since run an SSL test using the following tool: https://www.ssllabs.com/ssltest/ which gives me an A+ rating, so I'll take it that they're pretty good!
defaults
    log global
    mode http
    option httplog
    option dontlognull
    option forwardfor
    option http-server-close
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
Here we've got a few options which make specific changes to the way forwarding is performed. There are also timeout values here; I noticed that if no timeouts are specified, HAProxy complains with warnings at startup.
frontend http-in
    bind *:80
    mode http
    # Internal Hosts
    acl acl_internal_1 hdr(host) -i internal-site-1.dchidell.com
    <removed for brevity>
    acl acl_local_subnet src 10.0.0.0/8
    # Redirect all others
    redirect scheme https code 301 if !acl_local_subnet
    redirect scheme https code 301 if !acl_internal_1 <+ all removed ACL entries> !{ ssl_fc }
    # Internal Redirects
    use_backend internal_1_server if acl_internal_1
    <removed for brevity>
    default_backend blog_server
Here's some of the real meat - and this section I understand better! HAProxy works using the concepts of 'frontend' and 'backend'. The frontend deals with anything that faces the client, and the backend deals with configuration associated with the servers (surprise surprise, it's all rather logical!)
acl acl_internal_1 hdr(host) -i internal-site-1.dchidell.com
This is the ACL we use to classify incoming requests. You'll need one of these for each internal site you have. External sites are covered in the HTTPS section, since we're going to redirect those to HTTPS first. I've removed most of my entries for privacy reasons - there are quite a lot of them. This particular ACL matches an incoming Host header of 'internal-site-1.dchidell.com'; any request made by visiting that URL will be matched by this ACL.
acl acl_local_subnet src 10.0.0.0/8
This defines my local subnet, which is important for preventing external access to internal sites. For example, I have an internal-only site running on internal-1.dchidell.com, with no external DNS record for that URL. If someone is probing for internal sites they could send an HTTP request towards my IP directly (or towards any other external subdomain) and simply modify the 'Host' field within the HTTP header. The load balancer would then direct them to the internal site, which may pose a security risk.
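If you wanted to reject such spoofed requests outright rather than just bouncing them to HTTPS, HAProxy can also deny them directly. A sketch (not something I run myself, but it uses only the ACLs defined above):

```
# Refuse requests for internal hostnames arriving from outside the local subnet
http-request deny if acl_internal_1 !acl_local_subnet
```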
redirect scheme https code 301 if !acl_local_subnet
This ties into the previous section. Anything arriving from outside the local subnet is immediately redirected to HTTPS. If such a request was made for an internal-only host, it will then simply fall through to the default_backend defined in the https-in frontend section.
redirect scheme https code 301 if !acl_internal_1 <+ all removed ACL entries> !{ ssl_fc }
This does the same thing as above, but checks against the previously created ACLs - essentially anything that doesn't match the internal ACLs needs to go to HTTPS and be processed as an external site. Every ACL entry you create has to go in here, so the line can get pretty long. Alternatively you could use multiple redirect lines instead of one long one.
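If you went the multiple-lines route, note that separate redirect lines are evaluated independently (effectively OR'd), so you'd want to match the external hostnames positively rather than negating the internal ones. A sketch with a hypothetical external ACL:

```
# One redirect per external site, matched positively
acl acl_external_1 hdr(host) -i external-1.dchidell.com
redirect scheme https code 301 if acl_external_1 !{ ssl_fc }
```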
use_backend internal_1_server if acl_internal_1
This is the really interesting line: if we match the 'acl_internal_1' ACL we'll use the backend defined as 'internal_1_server'. This represents the core functionality of our load balancer - directing requests aimed at certain domain names to different backend services. One of these will exist for each site we have, essentially a 1:1 mapping of 'use_backend' statements to ACL entries.
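So with two internal sites, the pattern looks like this (the second site here is hypothetical):

```
acl acl_internal_1 hdr(host) -i internal-site-1.dchidell.com
acl acl_internal_2 hdr(host) -i internal-site-2.dchidell.com

use_backend internal_1_server if acl_internal_1
use_backend internal_2_server if acl_internal_2
```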
default_backend blog_server
This simply says that if we don't match any of the other hosts, we divert to a default backend.
Now onto the HTTPS section! It's pretty similar...
frontend https-in
    reqadd X-Forwarded-Proto:\ https
    rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains
    rspadd X-Frame-Options:\ DENY
    bind *:443 ssl crt /etc/haproxy/haproxy.pem
    # Define hosts
    acl acl_blog hdr(host) -i blog.dchidell.com
    acl acl_external_1 hdr(host) -i external-1.dchidell.com
    acl acl_redirect hdr(host) -i redirect-server.dchidell.com
    acl acl_letsencrypt path_beg /.well-known/acme-challenge/
    ## figure out which one to use
    http-request redirect location https://redirected-location.example.com if acl_redirect
    use_backend web_server if acl_letsencrypt
    use_backend external_1_server if acl_external_1
    default_backend blog_server
So first we have a few options, which I'll highlight in bulk:
reqadd X-Forwarded-Proto:\ https
rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains
rspadd X-Frame-Options:\ DENY
bind *:443 ssl crt /etc/haproxy/haproxy.pem
The particularly important one I had to add myself was this:
reqadd X-Forwarded-Proto:\ https
This makes a wordpress site function correctly. Without it, some items were treated as HTTP (I don't really know why, I didn't bother to look into it) and the browser gave mixed-content warnings since some of the content was not encrypted.
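As an aside, 'reqadd' and 'rspadd' have since been deprecated (and removed entirely in HAProxy 2.x), so if you're running a newer release than I was, the equivalent 'http-request' / 'http-response' directives look like this:

```
# Modern equivalents of the reqadd / rspadd lines above
http-request set-header X-Forwarded-Proto https
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
http-response set-header X-Frame-Options DENY
```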
The final 'bind' option is also important: this is where we specify the location of the SSL certificate & private key, combined within a single file. You can see that this file path corresponds to the volume mounted in the docker-compose.yml file above, so a valid cert can be used.
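The single-file format that 'crt' expects is nothing magic: it's just the full certificate chain followed by the private key, concatenated. A minimal sketch with stand-in file contents (in practice the two inputs are certbot's fullchain.pem and privkey.pem from /etc/letsencrypt/live/<domain>/):

```shell
# Stand-in files; with certbot these come from /etc/letsencrypt/live/<domain>/
echo 'CERTIFICATE CHAIN CONTENTS' > fullchain.pem
echo 'PRIVATE KEY CONTENTS' > privkey.pem

# HAProxy's 'crt' option wants chain + key in one file, chain first
cat fullchain.pem privkey.pem > haproxy.pem
```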
Next we have a few ACLs similar to the HTTP section above. We have a few more here, and these ACLs should be for external sites. We can also set up 3xx redirects instead of backend servers, so there's an ACL here for that too. Syntactically it's the same as the others, but in the next couple of sections we'll see how it's used differently.
acl acl_letsencrypt path_beg /.well-known/acme-challenge/
This ACL is particularly special. It allows the LetsEncrypt agent to confirm ownership of domains, by routing challenge requests to a special backend server based on the URL requested. My certbot setup uses a web directory and places files in this location, which are then served by a static webserver I have in place for miscellaneous bits and pieces.
http-request redirect location https://redirected-location.example.com if acl_redirect
This is the 3xx redirect I was speaking about earlier. This is used for some sites I 'host' but which simply redirect to other places entirely.
use_backend web_server if acl_letsencrypt
use_backend external_1_server if acl_external_1
default_backend blog_server
We're already familiar with the functionality of these 3 lines from the HTTP section - this is where we actually send clients to their respective backends, but for HTTPS. These are external sites only; internal ones are handled in the HTTP section.
Now for the backend sections we've seen referenced, but not yet defined:
backend web_server
    server webserver webserver:80 check

backend blog_server
    option httpchk
    server ghost ghost:2368 check
This is where those backends are defined. Multiple 'server' lines can be used, and load balancing across those servers will occur when requests are made to the backend. As I said initially though, I'm not actually doing load balancing, so I only have a single server defined in each section. I do like the monitoring abilities however, so I add the 'check' option, which allows HAProxy to poll the status of the servers. The web_server entry does not have 'option httpchk' because the server would return a 404 error and be marked as offline; a plain TCP check is fine in this case. I could specify a path to check instead, but then I might have to change this config later on and I'd likely have forgotten by then.
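For completeness, if I ever did want real load balancing it would just mean extra 'server' lines in a backend, optionally with an explicit algorithm and an httpchk path. A hypothetical sketch (the second ghost instance and the /healthcheck path don't exist in my setup):

```
backend blog_server
    balance roundrobin
    option httpchk GET /healthcheck
    server ghost1 ghost1:2368 check
    server ghost2 ghost2:2368 check
```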
So, some of you may have looked at this and thought: well, that's all fine, but why are you using HTTP only for internal sites and HTTPS for the external ones? Surely it would be more secure (and easier config-wise) to simply punt everything to HTTPS and process it there?!
I have a perfectly good explanation! SSL certificates are generated by LetsEncrypt. In order to generate a certificate for a domain you must prove you own it, which requires an external DNS record to be in place. Since I have an internal DNS server and do not wish for some websites to be externally facing, I do not create external DNS records for them. This means that LetsEncrypt will not be able to resolve those internal DNS entries and will therefore fail to validate the domains. If I pushed these sites to HTTPS without valid certificates I would get certificate errors in my browser! I could probably solve this using some sort of internal CA, but I really don't want to go there, and HTTP is fine for my setup.
Final Bits:
Since I run this on docker, I thought I would share a few things I noticed.
You can use the following syntax to test your HAProxy configuration:
docker run -it --rm -v /path/to/config:/usr/local/etc/haproxy/haproxy.cfg haproxy:alpine haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg
That's assuming of course you're happy to use the haproxy:alpine image!
You can also reload the HAProxy configuration by issuing a SIGHUP to the container; this won't restart the container (but it will clear the statistics if you use the stats page).
Issue the SIGHUP using the following command:
docker kill -s HUP haproxy_container_name
I mentioned in the introduction that I wanted some sort of status page - well, HAProxy has this functionality built in! I didn't cover it above because it's not directly related to the load balancing side of things, and it's really easy to enable.
Simply add the following 3 lines into the defaults section:
defaults
    stats enable
    stats auth username:password
    stats uri /mystatsurl
You'll want to modify the auth line to use a username and password of your choosing, as well as the uri line for a URL path of your choice. Then you can simply hit the load balancer at that path on either protocol and it'll serve a live statistics page. The statistics will be lost after a restart / reload of HAProxy. If you want something more in depth, refer to the configuration guide.