zach walton


Home WAN Failover with UDM Pro

If recurring cloud service fees for hobby projects cause death by a thousand cuts, I’m feeling like I did the first few times Ashina Elite - Jinsuke Saze kicked my ass in Sekiro:

…but my SRE day job long ago scared me off single points of failure, so I’ve been hesitant to host services that real users depend on–mostly https://life4ddr.com and https://truebpm.dance–at home.

But then we moved to our new home, where the prior owners had installed a couple of 13.5 kWh Tesla Powerwalls in the garage, and I realized I had power redundancy for the first time. This got me thinking about what it would take to build a sufficiently HA environment at home for hosting community projects and, well, there’s work left to do to answer that question.

Might as well start with…

Redundant ISPs!

Step 1: ISPx2

Step 1 is easy: We pay $80/mo for AT&T Gigabit fiber as our primary ISP. I’ve had a (deactivated) Starlink RV dish and have been waiting for the chance to use it.

The primary downsides of Starlink for RVs are a) the slightly higher cost ($135/mo vs. $110/mo for residential), and b) RV users are throttled during periods of high demand. In practice, I've never noticed any throttling; I’m on the waitlist for a residential plan.

Starlink is fairly fast:

image

And the latency to google.com is eehhh, ok:

image

So why not another fiber or broadband provider?

  1. I’ve been enjoying not paying Comcast too much
  2. The Starlink dish was sitting in my closet
  3. They’re hobby projects, not payment processors that need 5+ 9s of availability and sub-10 ms latency
  4. (the primary line kinda never goes down anyway)
  5. Off the griiiiidddd
  6. etc.
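For a sense of scale on item 3, here's a quick sketch of how much downtime per year a given availability target allows. The function name and the 365-day year are my own assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// downtimePerYear returns the downtime budget for a 365-day year
// at the given availability (e.g. 0.999 for "three nines").
func downtimePerYear(availability float64) time.Duration {
	const year = 365 * 24 * time.Hour
	return time.Duration((1 - availability) * float64(year))
}

func main() {
	for _, a := range []float64{0.99, 0.999, 0.99999} {
		budget := downtimePerYear(a).Round(time.Second)
		fmt.Printf("%.3f%% available -> %s of downtime per year\n", a*100, budget)
	}
}
```

Two nines leaves a few days per year of slack, while five nines leaves only a few minutes, which is why a satellite backup link is plenty for hobby projects.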

Installation is a cinch if you cut corners:

Put it on the roof

image

Do a great job running and hiding the cables, definitely no eyesores here (not pictured)

(sorry to devon)

Through the wall and into the router

First time using this stuff to seal the hole drilled for the cable, plus some Sikaflex concrete sealant since, unlike regular silicone caulk, it can be painted once dried.

There may be an embarrassing part omitted here involving drilling into an “electrical wire”, panicking, and in the end discovering that it was chicken wire & part of the stucco…

image

Not pictured: cable grommet for the 1" hole that has not yet arrived from Amazon

image

“That’s a giant hole”

Yeah, because of this thing :(

image

The official routing kit comes with a ¾" drillbit. I used the 1" drillbit I had on hand.

Final result:

image

Step 2: Automated failover

At this point, I had a separate SSID that I could manually switch devices to, but I didn’t want to do that by hand when I’m away from home. I might not always be available, and I don’t want users waiting until I am…

Enter the Ubiquiti Dream Machine (UDM) Pro!

Ok, so I really just copied a friend here without doing a ton of research:

image

But it ticks the boxes:

  • WAN failover (WAN load balancing not supported… but we can handle a few seconds of downtime)
  • Remote management interface
  • …That’s it?

It does way more than this, but my goals are not lofty.

Once it arrived, it only took a few minutes for initial setup:

  • Plugged primary modem into WAN port 2, the 10 Gb SFP+ port (with an RJ-45 adapter)
  • Plugged Starlink router into WAN port 1 (with a Starlink ethernet adapter)
  • Turned on the UDM Pro, paired via Bluetooth and finished guided setup through the iOS app

I then changed the port configuration to make WAN 2 primary and WAN 1 secondary. In my head this felt like a step toward >1 Gb home Internet (AT&T offers 5 Gb today 😱). In practice, this led to a lot of packet loss and continual failovers to the backup link; it’s probably an issue with the adapter or cable somewhere in the chain, but I didn’t feel like figuring it out.

I then discovered that you can configure port 8 as the secondary WAN link, and shuffled connections around so both WANs were cabled without the need for an adapter. This fixed the failover flapping and packet loss.

Somewhere along the way I also changed the echo server from the default Ubiquiti server to Google DNS (8.8.8.8). The UDM Pro pings the echo server to decide whether or not to fail over. I’m not sure this contributed to solving my issues, but it has for some people.
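To make the echo-server idea concrete, here's a minimal sketch of a failover decision driven by consecutive probe results. This is not UniFi's actual algorithm: the `Monitor` type and both thresholds are hypothetical, and a real probe would be an ICMP ping to something like 8.8.8.8 rather than the canned results below.

```go
package main

import "fmt"

// Monitor tracks consecutive probe results for the primary link.
// The thresholds are hypothetical, not values taken from UniFi.
type Monitor struct {
	FailAfter    int // consecutive failed probes before failing over
	RecoverAfter int // consecutive good probes before failing back
	onBackup     bool
	streak       int
}

// Observe records one probe of the primary link and reports whether
// traffic should currently use the backup link.
func (m *Monitor) Observe(primaryOK bool) bool {
	// Only results that contradict the current state count toward a switch.
	contradicts := primaryOK == m.onBackup
	if !contradicts {
		m.streak = 0
		return m.onBackup
	}
	m.streak++
	limit := m.FailAfter
	if m.onBackup {
		limit = m.RecoverAfter
	}
	if m.streak >= limit {
		m.onBackup = !m.onBackup
		m.streak = 0
	}
	return m.onBackup
}

func main() {
	m := &Monitor{FailAfter: 3, RecoverAfter: 2}
	// Simulated probe results: three failures, then the primary comes back.
	for i, ok := range []bool{true, false, false, false, true, true} {
		fmt.Printf("probe %d ok=%v onBackup=%v\n", i, ok, m.Observe(ok))
	}
}
```

Requiring a streak in both directions is one way to avoid the kind of flapping described above, where a lossy link bounces between primary and backup.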

Success!

image

And kind of a sick name…

I didn’t have to simulate failover because I broke things plenty of times during the setup process:

image

In practice, I saw 5-10 seconds of packet loss before Starlink took over. And automated recovery when AT&T started pinging again! More than adequate for users of a DDR BPM calculator.

What’s left?

Ok, redundant power + Internet. But how do users get to a server deployed in my house when the IP changes on failover? Dynamic DNS? Do I go full Brad Fitzpatrick and shell out $12k+ on a /24 of IPv4 space, and create my own ASN + anycast from my house? (probably not)
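If Dynamic DNS turns out to be the answer, the core of it is small: compare the address the record points at with the address the active WAN link has, and update only on change. Here's a rough sketch under those assumptions. `currentIP` and `updateRecord` are hypothetical hooks (in practice a what's-my-IP lookup and a DNS provider API call), and the addresses are documentation-range placeholders.

```go
package main

import (
	"fmt"
	"net/netip"
)

// syncRecord returns the address the DNS record should now hold.
// currentIP and updateRecord are hypothetical hooks supplied by the caller.
func syncRecord(recorded string, currentIP func() (string, error), updateRecord func(string) error) (string, error) {
	raw, err := currentIP()
	if err != nil {
		return recorded, err
	}
	// Parse to reject garbage and normalize the textual form.
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return recorded, fmt.Errorf("bad public IP %q: %w", raw, err)
	}
	if addr.String() == recorded {
		return recorded, nil // nothing changed, skip the API call
	}
	if err := updateRecord(addr.String()); err != nil {
		return recorded, err
	}
	return addr.String(), nil
}

func main() {
	recorded := "203.0.113.10" // pretend the record points at the primary link
	lookup := func() (string, error) { return "198.51.100.20", nil } // pretend we failed over
	update := func(ip string) error {
		fmt.Println("updating record to", ip)
		return nil
	}
	recorded, err := syncRecord(recorded, lookup, update)
	fmt.Println(recorded, err)
}
```

Clients only notice the change once their cached answer expires, so with this approach the record's TTL puts a floor on how quickly users follow the failover.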

Am I going to deploy CRITICAL DANCE GAME SERVICES on a SINGLE DESKTOP MACHINE!? (obviously not)

I don’t know. Hopefully we will find out together in a subsequent post.

sre distributed systems networking hacker news sysadmin

gRPC stalls? check your http/2 window sizes!

the empty page is leering at me so i feel compelled to share a useful tip in lieu of time (for now) to write a full post!

if you see stalls at high throughput in a golang gRPC service, it could be because the default http/2 window sizes are insufficient. you can increase them substantially (i use 32 MiB); just keep in mind that memory usage can grow to window_size_bytes * num_connections.
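to put a number on that memory tradeoff, here's a tiny sketch of the arithmetic. the function name and the 200-connection figure are made up for illustration; only the 32 MiB window comes from the tip above.

```go
package main

import "fmt"

const MiB = 1 << 20

// windowMemoryBound returns window size * connection count, the rough
// upper bound on buffered bytes described in the tip above.
func windowMemoryBound(windowBytes, numConns int64) int64 {
	return windowBytes * numConns
}

func main() {
	// hypothetical load: 200 connections, each with a 32 MiB window
	bound := windowMemoryBound(32*MiB, 200)
	fmt.Printf("up to %d MiB buffered\n", bound/MiB)
}
```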

make sure to configure this for both the client and server.
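a minimal sketch of what that configuration might look like with grpc-go, assuming the `google.golang.org/grpc` module is available. the window options are grpc-go's own; the helper names and the plaintext credentials are assumptions for illustration, not a recommendation.

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// 32 MiB, the value mentioned above; it fits in the int32 these options take.
const windowSize = 32 << 20

// newServer builds a server with larger per-stream and per-connection windows.
func newServer() *grpc.Server {
	return grpc.NewServer(
		grpc.InitialWindowSize(windowSize),     // per-stream flow-control window
		grpc.InitialConnWindowSize(windowSize), // per-connection flow-control window
	)
}

// dialOptions returns the matching client-side settings.
func dialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		// assumption: plaintext transport, only to keep the sketch short
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithInitialWindowSize(windowSize),
		grpc.WithInitialConnWindowSize(windowSize),
	}
}

func main() {
	srv := newServer()
	defer srv.Stop()
	_ = dialOptions() // pass these when creating the client connection
}
```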

stay tuned for a deeper analysis of this problem, and tips for debugging a gRPC server in a stalled state.