
That One-Line OpenSSH Bug Just Ruined My Weekend

I spent my entire Saturday morning staring at a wall of failing SSH connections. Our three-node staging cluster had stopped accepting new sessions. Existing connections were fine. New ones? Dropped instantly.

I dug into the syslog and found the SSH daemon spawning child processes that died milliseconds later, flooding the logs with segfaults. Someone was scanning our IP block, and they happened to trigger the recently disclosed vulnerability in the GSSAPI key exchange code that distro builds of OpenSSH carry.

And the cause? A single line of code. An autocomplete error, basically.

The Autocomplete Error That Took Down Servers

To understand why this broke so spectacularly, you have to look at the C code inside kexgsss.c. This file handles the server-side GSSAPI key exchange.

When the server receives a malformed or unexpected packet during this exchange, it’s supposed to drop the connection and kill the child process handling that specific user session. The developer intended to call a function named ssh_packet_disconnect().

Instead, they called sshpkt_disconnect().

Notice the missing underscore and the abbreviated “pkt”? That tiny difference is everything.

The sshpkt_disconnect() function is non-terminating. All it does is queue a disconnect message and return to its caller. It doesn’t flush that message to the wire or close the connection, and more importantly, it doesn’t terminate the process.

/* The vulnerable code looked roughly like this */
if (error_condition) {
    /* WRONG: Just builds a packet, doesn't exit! */
    sshpkt_disconnect(ssh, "Invalid GSSAPI token");
    /* Execution continues into undefined behavior... */
}

/* The fix */
if (error_condition) {
    /* CORRECT: Builds packet, sends it, and terminates process */
    ssh_packet_disconnect(ssh, "Invalid GSSAPI token");
}

When I tested this locally on an Ubuntu 22.04 VM running OpenSSH 9.6p1, the results were entirely predictable. Send one garbage packet during the gss-group1-sha1-* key exchange, and the pre-auth child process dies.
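
To make the control-flow difference concrete, here is a toy model in plain C. It is only a sketch: queue_disconnect(), disconnect_and_exit(), handle_token(), and the exit codes are hypothetical stand-ins I made up for illustration, not OpenSSH functions. The handler runs in a forked child so the parent can see how each variant ends.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary exit codes for this toy model (not OpenSSH's). */
enum { CHILD_OK = 0, CHILD_BAD_STATE = 42, CHILD_DISCONNECTED = 255 };

/* Stand-in for the non-terminating helper: records the message, then returns. */
static void queue_disconnect(const char *msg)
{
    fprintf(stderr, "queued disconnect: %s\n", msg);
}

/* Stand-in for the terminating helper: reports the message and never returns. */
static void disconnect_and_exit(const char *msg)
{
    fprintf(stderr, "disconnect: %s\n", msg);
    _exit(CHILD_DISCONNECTED);
}

/* Hypothetical token handler. Everything after the error check assumes
 * the token was valid. */
static void handle_token(int token_ok, int use_terminating_helper)
{
    if (!token_ok) {
        if (use_terminating_helper)
            disconnect_and_exit("Invalid token");
        else
            queue_disconnect("Invalid token"); /* falls through */
    }
    /* With the non-terminating helper, a bad token still reaches this line. */
    _exit(token_ok ? CHILD_OK : CHILD_BAD_STATE);
}

/* Runs the handler in a child process and returns its exit code,
 * or -1 if fork/waitpid fails or the child did not exit normally. */
int run_handler_in_child(int token_ok, int use_terminating_helper)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        handle_token(token_ok, use_terminating_helper); /* never returns */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_handler_in_child(0, 0) reports CHILD_BAD_STATE, because the child kept running past the error check. run_handler_in_child(0, 1) reports CHILD_DISCONNECTED, because the child stopped exactly where it should have.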

The Privilege Separation Nightmare

Crashing a process is annoying. A Denial of Service (DoS) attack is bad. But the security community is sweating over this for a different reason.

OpenSSH uses a strict privilege separation model. When you connect, the main root-owned sshd daemon forks an unprivileged child process to handle the dangerous stuff—parsing network data, negotiating cryptography, and handling the key exchange.

If an attacker can manipulate the state machine of that child process by forcing it to continue execution when it should have died, they might be able to do more than just crash it. They are operating in an unintended state. Memory corruption vulnerabilities love unintended states.
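
The fork-per-connection shape also explains the symptom I saw: every new session died, yet the listener and the existing sessions carried on. Below is a minimal sketch of that behavior, with some loud assumptions. handle_connection() and serve_connections() are hypothetical names, the "crash" is just a raised SIGSEGV, the parent waits for each child in turn (a real daemon does not), and nothing here drops privileges, since that would need root.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection worker: crashes on demand, otherwise exits cleanly. */
static void handle_connection(int should_crash)
{
    if (should_crash) {
        signal(SIGSEGV, SIG_DFL); /* make sure the default action applies */
        raise(SIGSEGV);           /* simulate the segfault in the child */
    }
    _exit(0);
}

/* Toy listener loop: forks one child per "connection" and reaps it.
 * Returns the number of children that exited cleanly and stores the
 * number killed by a signal in *crashed. Returns -1 on fork/wait failure. */
int serve_connections(const int *crash_flags, int n, int *crashed)
{
    int ok = 0;
    *crashed = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            handle_connection(crash_flags[i]); /* never returns */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status))
            (*crashed)++; /* the child died, the parent keeps going */
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

A crashing child never takes the parent with it, which is the point of the design. It is also why a pre-auth crash is only a denial of service for as long as it stays a crash.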

The Configuration Gotcha I Fell For

When I first read about the bug, I checked our Ansible playbooks. We enforce GSSAPIAuthentication no across all our environments. I leaned back in my chair, thinking we were safe. We don’t use Kerberos. We don’t use GSSAPI auth.

I was completely wrong.

The vulnerability happens during the Key Exchange (KEX) phase. This is pre-authentication. The server and client are just trying to agree on how to encrypt the session before the server even asks who you are or what your password is.

Setting GSSAPIAuthentication no only stops users from authenticating with GSSAPI. It does absolutely nothing to stop the server from advertising and accepting GSSAPI key exchange algorithms. In distro builds that carry the GSSAPI key exchange patch, that side is governed by a separate option, GSSAPIKeyExchange, so check that one as well. As mentioned in DoH is a Nightmare for Linux Server Security, sometimes server configurations can be tricky.

How to Actually Fix It

The obvious answer is to update your packages. The distros have been pushing patches aggressively. Run apt-get update followed by apt-get upgrade (or dnf update on the Red Hat side), then restart the SSH daemon.

But sometimes you can’t just restart SSH on a production box in the middle of the day. Or maybe you’re stuck on an older appliance where you don’t control the patch cycle.

In that case, you need to strip the GSSAPI algorithms out of your allowed KEX list, similar to the approach described in Surviving the AlmaLinux OpenSSL update disaster.

# Add this to /etc/ssh/sshd_config
# This explicitly defines allowed KEX algorithms, omitting all GSSAPI variants

KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256

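After adding that, run sshd -t to verify the syntax, and then reload the daemon. Then confirm from a client that the change took effect: ssh -vv prints the server's KEX proposal, and it should no longer contain any gss-* entries. Once those are gone, the server no longer advertises the vulnerable key exchange methods, and the attacker’s packets will be rejected before they ever reach the buggy C code.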
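
If you would rather script that check than eyeball it, the test is simple: split the comma-separated list and look for entries starting with gss-. Here is a small sketch in C. count_gss_kex() is a hypothetical helper of mine, and I am assuming you feed it a list copied from sshd -T output or from the KEX algorithms line in ssh -vv debug output.

```c
#include <stddef.h>
#include <string.h>

/* Counts entries in a comma-separated KEX list that start with "gss-".
 * Hypothetical helper for illustration. A NULL or empty list yields 0. */
size_t count_gss_kex(const char *kex_list)
{
    static const char prefix[] = "gss-";
    const size_t prefix_len = sizeof(prefix) - 1;
    size_t count = 0;
    const char *p = kex_list;

    while (p != NULL && *p != '\0') {
        p += strspn(p, ", \t");            /* skip separators and padding */
        size_t len = strcspn(p, ", \t");   /* length of the next entry */
        if (len >= prefix_len && strncmp(p, prefix, prefix_len) == 0)
            count++;
        p += len;
    }
    return count;
}
```

For the KexAlgorithms line above, this returns 0. Anything above zero on the list your server actually advertises means the GSSAPI key exchange methods are still on offer, and the patched builds may have their own GSSAPI-specific options worth checking in your distro's sshd_config man page.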

Why Does This Keep Happening?

This whole mess reignites a long-standing argument in the Linux community.

Upstream OpenSSH—the actual developers who maintain the project—have refused to merge the GSSAPI key exchange patch for years. They argue it’s too complex, adds unnecessary attack surface, and mostly benefits a small subset of enterprise users heavily invested in Active Directory and Kerberos.

But enterprise customers pay the bills for companies like Red Hat and Canonical. So the distribution maintainers carry this massive out-of-tree patch and apply it to their OpenSSH packages before shipping them to you.

I’m heavily considering compiling OpenSSH from source for our edge nodes moving forward, similar to the approach outlined in Building openSUSE Packages Just Got Less Annoying.

Go check your KexAlgorithms. Seriously. Do it before you log off today.
