02/04/2022 Outage

There was an extended outage from approximately 01:36 AM EST on 02/04 to approximately 8 hours later, and subsequent connection interruptions that degraded performance.

This was a regrettable series of events that arose from a firewall misconfiguration; in the interest of security, I enabled a default-deny rule. As the server was running for well over a month, I assumed there had been no problem. When I paid for a server upgrade, the server subsequently restarted; the rule then came into effect after a reboot. 

After struggling with support for hours, I was able to determine the root cause.

What will change in the future

  • I will consider the state of the outage. If the hosting provider reports a server as Online, but is unreachable, I will look to the network, and not assume the machine is dysfunctional.
  • I will use the VNC system when SSH does not work, as it does not rely on a network connection to the individual server, but rather the server host.
  • I will set alerts that trigger an audible alarm at more safe hours. 1:30 AM would probably not be included, definitely not on a Saturday, but definitely alert me by the morning.
  • I will add a month to all subscriptions. I do not know yet how I will do this. I've contacted Stripe about skipping a billing cycle for monthly subscribers. (However, there is no automatic ending of service in the case of cancellation.)
    • I have determined that there is no way to “add” a month to an ongoing subscription; only stopping an active subscription, which would interrupt the ongoing subscription. I will instead double the storage capacity of each user.
  • I will block all services from binding to 0.0.0.0, so that a firewall is not necessary (only HTTP/SSH ports would be listening).
  • I will find out why ufw blocks all connections when explicitly allowing certain ports.
  • I will research more options to migrate off of my current provider. They have been stellar so far, but, unfortunately, it would be preferable to use a hoster with a more complete set of features. For example, GCP/AWS allow attaching a disk to a brand-new machine; that would allow me to have created a new server within minutes and switched over.