Stop Overcomplicating Your Linux CI/CD Pipelines

My pipeline failed for the seventh time on a Tuesday night. I was staring at a wall of red text in my terminal, trying to figure out why a supposedly simple deployment kept timing out. We had Kubeflow operators, three different Ansible roles, and a massive Docker container all fighting for resources on a single runner.

It was a mess, and it is not just us. The industry has developed a severe YAML addiction over the last couple of years. We keep stacking abstraction on top of abstraction until nobody actually knows what’s happening when they push code to the main branch.

I wiped the configuration and started over.

The Hidden Cost of Heavy Tooling

There is a massive push right now to integrate heavy machine learning workflows directly into standard deployment pipelines. It seems everyone wants Kubeflow managing their model deployments right next to their standard web apps. I tried this approach on a t3.xlarge EC2 instance running GitLab Runner 16.8.

It choked. Hard.


The runner kept running out of memory during the container build phase. The overhead of spinning up the necessary execution environments just to run a few validation scripts was completely eating our resource limits. I watched our memory usage spike to 98% before the kernel OOM killer stepped in and slaughtered the process.
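One way to see this coming before the OOM killer does is to log memory pressure at the start of a job. The following is a minimal sketch, assuming a Linux runner that exposes /proc/meminfo. The helper name mem_used_percent and the 90% threshold are illustrative assumptions, not something from the original pipeline.

```shell
#!/bin/bash
# Print used memory as a whole-number percentage, read from a meminfo-style file.
# mem_used_percent is a hypothetical helper; the file defaults to /proc/meminfo.
mem_used_percent() {
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Warn (without failing the job) when usage crosses an assumed 90% threshold.
used=$(mem_used_percent)
if [ "${used:-0}" -ge 90 ]; then
    echo "WARNING: runner memory at ${used}% before build" >&2
fi
```

Running this as the first step of a build job gives you a number to compare against when a later step dies without a useful error.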

You don’t always need a distributed machine learning platform to move files around. Sometimes you just need basic Linux utilities.

Falling Back to CLI Primitives

I dumped the heavy container approach. Instead of pulling down a 400MB custom Python image just to parse our deployment manifests and trigger downstream jobs, I switched to a lightweight Alpine image and used jq, a command-line JSON processor that extracts the fields I need without the overhead of a full Python runtime.

People forget how fast standard command-line tools actually are.

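#!/bin/bash
# Extract image tags and trigger pulls without a heavy runtime.
# Needs bash and jq; on Alpine: apk add --no-cache bash jq
# The loop reads from process substitution rather than a pipe so it runs in
# this shell; in a piped subshell, `wait` would not see the background pulls.
while IFS= read -r image; do
    echo "Pre-fetching $image..."
    docker pull --quiet "$image" &
done < <(jq -r '.deployments[].image' manifest.json)
wait
echo "All images cached."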

That single change dropped our memory usage by 62%. More importantly, it cut our pipeline execution time from 14m 20s down to 3m 45s. No custom operators. No massive dependencies. Just basic shell piping.
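For reference, the script assumes a manifest shaped roughly like the one below. Only the .deployments[].image path comes from the script; the names, registry host, and tags are made up for illustration. The unique_images helper is also an assumption on my part: it deduplicates tags with jq's unique so the same image is not pulled twice in parallel.

```shell
#!/bin/bash
# Hypothetical manifest; only the .deployments[].image path is taken from the script above.
sample_manifest() {
    cat <<'EOF'
{
  "deployments": [
    {"name": "web",    "image": "registry.example.com/web:1.4.2"},
    {"name": "worker", "image": "registry.example.com/worker:1.4.2"},
    {"name": "cron",   "image": "registry.example.com/worker:1.4.2"}
  ]
}
EOF
}

# Print each distinct image once (jq's `unique` sorts and deduplicates).
unique_images() {
    jq -r '[.deployments[].image] | unique | .[]' "$1"
}
```

To try it, run sample_manifest > manifest.json and then unique_images manifest.json, which should list the web and worker images once each.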

The Ansible CI Socket Gotcha


Ansible is still my default for actual configuration management, but running it inside an ephemeral CI container introduces weird edge cases. I hit a brutal one last week while testing against some new Ubuntu 24.04 LTS nodes.

My playbooks ran perfectly from my laptop. Inside the pipeline, they failed with cryptic SSH connection timeouts. I added -vvvv to the Ansible command and noticed that the SSH multiplexing socket could not be created.

Linux limits Unix domain socket paths to 108 bytes, including the terminating null byte. My CI runner was generating a workspace directory path that was 112 characters long. When Ansible Core 2.16.3 tried to create the ControlPath socket for SSH multiplexing under that directory, the full path could not fit, so the socket was never created and the connection failed.

If you run Ansible in CI, you need to override the default SSH path to something short and absolute.


# ansible.cfg
[ssh_connection]
# Force a short path for CI runners to avoid the 108-char socket limit
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
pipelining = True

I pushed that config change and the pipeline immediately went green.
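If you want the pipeline to fail loudly instead of timing out, a pre-flight check along these lines can run before ansible-playbook. This is a sketch, not part of Ansible: check_control_path is a hypothetical helper, it takes the template in the single-percent form that ssh itself sees, and it only expands the %h, %p, and %r tokens used in the config above.

```shell
#!/bin/bash
# Return 0 if an expanded ControlPath fits in a Unix socket address, 1 otherwise.
# sun_path is 108 bytes on Linux including the trailing NUL, so 107 are usable.
# check_control_path is a hypothetical helper, not an Ansible or OpenSSH command.
check_control_path() {
    local template=$1 host=$2 port=$3 user=$4
    local max=107
    local path=${template//\%h/$host}   # %h -> remote host
    path=${path//\%p/$port}             # %p -> remote port
    path=${path//\%r/$user}             # %r -> remote user
    if [ "${#path}" -gt "$max" ]; then
        echo "ControlPath too long (${#path} > $max characters): $path" >&2
        return 1
    fi
    echo "ControlPath OK (${#path} characters): $path"
}
```

Call it with the longest hostname in your inventory, for example check_control_path '/tmp/ansible-ssh-%h-%p-%r' "$longest_host" 22 deploy, where the host variable and user name are placeholders for your own values.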

What Actually Works

I am actively ripping complex orchestration out of our standard deployment paths. If a task can be accomplished with awk, sed, or a basic bash script, that is what I write. I save the heavy platforms for the actual infrastructure they were designed to manage, not the pipeline itself.
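As an example of the kind of task I mean, here is a sketch of stamping a version and environment into a config template with nothing but sed. The render_template name and the @KEY@ placeholder convention are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Replace @VERSION@ and @ENV@ placeholders in a template read from stdin.
# render_template and the @KEY@ convention are hypothetical.
# Values must not contain '|' or '&', which are special in this sed expression.
render_template() {
    local version=$1 env=$2
    sed -e "s|@VERSION@|${version}|g" -e "s|@ENV@|${env}|g"
}
```

Usage would look like render_template 1.4.2 staging < app.conf.tpl > app.conf, with the file names standing in for whatever your deployment actually ships.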

By late 2027, I expect we will see a massive backlash against overly complex CI/CD platforms. Teams are getting tired of debugging 500-line YAML files just to copy a binary to a server. We are going to see a hard pivot back to glorified shell scripts and simple, single-binary runners.

Stop blindly copying massive pipeline templates from documentation sites. Start with a bare shell, add tools only when it physically hurts not to have them, and keep your paths short.

FAQ

Why does Ansible fail with SSH timeouts inside CI runners?

Ansible fails because Linux limits Unix domain socket paths to 108 bytes, including the terminating null byte. CI runner workspace directories often push the path past this, so when Ansible tries to create the ControlPath socket for SSH multiplexing, the socket cannot be created and the connection fails. Fix it by setting control_path to a short absolute path like /tmp/ansible-ssh-%%h-%%p-%%r in ansible.cfg (the percent signs are doubled in that file).

How much faster is jq than a Python container for parsing deployment manifests?

Swapping a 400MB custom Python image for a lightweight Alpine image using jq cut pipeline execution time from 14 minutes 20 seconds down to 3 minutes 45 seconds. Memory usage also dropped by 62%. jq is a flexible command-line JSON processor that extracts fields like image tags from manifest.json without the overhead of spinning up a full Python runtime inside the CI container.

Why does Kubeflow choke on a t3.xlarge GitLab Runner?

Running Kubeflow operators alongside Ansible roles and a heavy Docker container on a single t3.xlarge EC2 instance with GitLab Runner 16.8 exhausted available memory. The overhead of spinning up execution environments just to run validation scripts pushed memory usage to 98%, triggering the kernel OOM killer. Distributed machine learning platforms are overkill when the pipeline mostly needs to move files around.

What should I use instead of complex YAML pipeline templates?

Start with a bare shell and add tools only when it physically hurts not to have them. If a task can be done with awk, sed, jq, or a basic bash script, write that instead of pulling in heavy orchestration. Reserve platforms like Kubeflow for the infrastructure they were designed for, not the pipeline itself, and keep workspace paths short to avoid socket limits.

Linux Digest | Systems, DevOps & Open Source


Radhika Menon
