
You're Not Doing GitOps (You're Doing CI/CD With Extra Steps)


The Uncomfortable Truth

Here’s a test: when your deployment fails in production, what happens to your main branch?

If the answer is “the broken code is already merged” — congratulations, you’re doing CI/CD with a Git trigger. That’s not GitOps. It’s a pipeline that happens to watch a branch.

I’ve spent years building platform engineering systems at enterprise scale — identity management frameworks, infrastructure-as-code pipelines, AI agent platforms that manage operational code. And I keep seeing the same mistake: teams adopt “GitOps” by adding a deployment step after merge, then wonder why they get drift.

True GitOps has one non-negotiable rule: main always equals production. If a deployment fails, main doesn’t change. Period. This isn’t just my opinion — it’s the logical extension of OpenGitOps principles: declarative desired state, versioned in Git, automatically reconciled. The enforcement mechanism I’m describing is how you make those principles real rather than aspirational.

The Anti-Pattern Everyone Runs

The most common “GitOps” setup I see in enterprise teams looks like this:

  1. Developer opens PR
  2. CI runs tests
  3. Reviewer approves
  4. PR merges to main
  5. Deployment triggers from main
  6. ❌ Deployment fails
  7. main now contains code that isn’t in production

This is merge-then-deploy. It’s standard CI/CD with extra steps. The moment you merge before confirming a successful deployment, you’ve broken the core GitOps contract: Git as the single source of truth for what’s actually running.

The result? Drift. Stale state in main. A branch that lies about what’s deployed. Every subsequent PR is now based on a broken foundation.
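To make the drift concrete, here is a toy simulation of merge-then-deploy. It is a sketch under stated assumptions, not a real pipeline: `Repo`, `merge_then_deploy`, and the change IDs are hypothetical names, and the `deploy` callable stands in for whatever your deployment step actually does.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Repo:
    """Toy model: `main` and `production` are lists of change IDs."""
    main: List[str] = field(default_factory=list)
    production: List[str] = field(default_factory=list)


def merge_then_deploy(repo: Repo, change: str, deploy: Callable[[str], bool]) -> None:
    # Anti-pattern: the merge happens first, whatever the deployment does next.
    repo.main.append(change)
    if deploy(change):
        repo.production.append(change)


repo = Repo()
merge_then_deploy(repo, "change-1", deploy=lambda c: True)
merge_then_deploy(repo, "change-2", deploy=lambda c: False)  # deployment fails

# Everything in main that never reached production is drift.
drift = [c for c in repo.main if c not in repo.production]
print(drift)  # ['change-2']
```

After one failed deployment, main claims a change that production never received.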

The Enforcement Pattern: Deploy Before Merge

The fix isn’t philosophical — it’s mechanical. GitHub’s Merge Queue gives you exactly the right primitive:

  1. Developer opens PR
  2. CI runs tests (standard checks)
  3. Reviewer approves → PR enters the merge queue
  4. Merge queue trigger (the merge_group event in GitHub Actions) runs a dry-run deployment against the target environment
  5. If the dry run passes → queue trigger runs the live deployment
  6. If the live deployment succeeds → PR merges to main
  7. If the deployment fails → PR is removed from the queue. main stays clean.

This is the critical difference. The merge is the receipt, not the trigger. By the time code lands in main, it’s already proven it can deploy successfully. main never lies.
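The ordering in the steps above can be sketched in a few lines. This is a minimal illustration, not GitHub’s implementation: `process_queue_entry` and its callables are hypothetical, and the sketch ignores rollback of a partially applied deployment.

```python
from typing import Callable, List


def process_queue_entry(
    change: str,
    main: List[str],
    dry_run: Callable[[str], bool],
    deploy: Callable[[str], bool],
) -> str:
    """Merge `change` into `main` only after a dry run and a live deployment succeed."""
    if not dry_run(change):
        return "rejected: dry run failed"
    if not deploy(change):
        return "rejected: deployment failed"
    main.append(change)  # the merge is the receipt, recorded last
    return "merged"


main: List[str] = []
first = process_queue_entry("change-1", main, dry_run=lambda c: True, deploy=lambda c: True)
second = process_queue_entry("change-2", main, dry_run=lambda c: True, deploy=lambda c: False)
print(first)   # merged
print(second)  # rejected: deployment failed
print(main)    # ['change-1']
```

The only way into `main` is the last line of the function, which is reachable only after both checks pass.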

GitHub ships hundreds of changes per day using exactly this pattern — batch PRs into merge groups, test and deploy the group, merge only on success.

Environment Parity: The Force Multiplier

The merge queue pattern only works if you’ve solved the second GitOps requirement: environment parity.

Every environment — dev, staging, production — should deploy using the exact same scripts. The only difference is configuration parameters. If your prod deployment uses a different process than dev, you’ve introduced a variable that the merge queue can’t validate.

Here’s the mental model: environments aren’t stages in a pipeline. They’re instances of the same declaration with different inputs. Your Terraform modules, your Helm charts, your infrastructure definitions — same code, different .tfvars or values.yaml.

This is where I see the most breakage. Teams invest in merge queues but maintain hand-rolled production deployment scripts that diverge from their staging process. In my experience, the #1 thing that breaks production is environmental differences — not bad code, not missing tests, but a deployment process that works differently in prod than it did in staging. HashiCorp’s Well-Architected Framework emphasizes this same principle: operational artifacts in Git should be the single declaration that drives all environments.
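One way to picture “same code, different inputs” is a single function that builds the deployment command for every environment. This is a sketch, assuming Terraform and a hypothetical `environments/<name>.tfvars` layout; `deploy_command` is an illustrative name. The `-var-file` and `-auto-approve` flags are standard Terraform CLI options.

```python
from typing import List


def deploy_command(environment: str, dry_run: bool = False) -> List[str]:
    """Build the Terraform command for one environment.

    The var-file path is the only thing that varies between environments.
    """
    action = "plan" if dry_run else "apply"
    # Assumed layout: one .tfvars file per environment under environments/.
    command = ["terraform", action, f"-var-file=environments/{environment}.tfvars"]
    if not dry_run:
        command.append("-auto-approve")
    return command


staging = deploy_command("staging")
production = deploy_command("production")

# Collect the positions where the two commands differ.
differing = [(s, p) for s, p in zip(staging, production) if s != p]
print(differing)
```

If `differing` ever contains more than the var-file argument, production has grown a deployment path that staging never exercised.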

Where to Start: The High-Stakes Workflow

If you’re onboarding a platform engineer into a GitOps-first team, don’t start with app deployments. Start with networking-as-code or firewall-as-code — systems where a failed deployment can be company-destroying.

Why? Because it forces the right engineering instincts:

These aren’t theoretical — they’re survival questions when you’re managing production firewalls through code. The rigor you develop there carries into every other GitOps workflow.

Infrastructure-as-code for identity management is another excellent starting point. I’ve built systems where Entra ID applications with RBAC definitions are entirely managed through code: every role assignment, every app registration, every permission scope. With the merge queue pattern, a role change reaches production only after a dry run proves it resolves correctly, so a misconfigured role is rejected before it lands.

AI Agents Make GitOps More Critical, Not Less

Here’s where the conversation gets forward-looking. AI agents — GitHub Copilot coding agent, autonomous infrastructure bots, custom platform agents — are increasingly the primary authors of operational code. The traditional distinction between GitOps and CI/CD matters more than ever when machines are the ones making commits.

This doesn’t make GitOps obsolete. It makes it non-negotiable. I’ve written about why governed agent systems need exactly this kind of enforcement — and the GitOps substrate is how you get there.

Consider: if an AI agent can codify a process — user onboarding, access provisioning, network configuration — and you have a deterministic sync process validating that code, you can safely let agents manage entire operational domains. The GitOps pattern becomes the guardrail that makes autonomous agents viable.

I run 50+ AI agents managing operational code daily. They don’t hit APIs directly; they modify code, which flows through the same merge queue validation as human-authored changes. Policy violations surface as deployment failures. The agent’s code either passes or it doesn’t. No special paths, no elevated privileges, no drift.
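The “no special paths” idea can be illustrated with a gate that records who authored a change but never consults it. This is a hedged sketch, not my production setup: `Change`, `violates_policy`, `gate`, and the “owner” role rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Change:
    author: str             # e.g. "alice" or "onboarding-agent"
    author_kind: str        # "human" or "agent"; recorded, never consulted by the gate
    config: Dict[str, str]


def violates_policy(config: Dict[str, str]) -> bool:
    # Hypothetical policy: nobody may grant the "owner" role through code.
    return config.get("role") == "owner"


def gate(change: Change, main: List[Change]) -> bool:
    """Same path for every author: a policy violation is a failed deployment."""
    deployed = not violates_policy(change.config)
    if deployed:
        main.append(change)
    return deployed


main: List[Change] = []
by_human = Change("alice", "human", {"role": "owner"})
by_agent = Change("onboarding-agent", "agent", {"role": "owner"})
valid = Change("onboarding-agent", "agent", {"role": "reader"})

print(gate(by_human, main), gate(by_agent, main), gate(valid, main))  # False False True
```

The same violation fails identically for the human and the agent, and only the valid change lands in `main`.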

The enforcement pattern:

This is where the industry is heading. Harness calls it “agentic AI in DevOps” — autonomous agents that observe, reason, and act on infrastructure. I’ve explored this convergence in agent-proof architecture for agentic DevOps. But without GitOps as the substrate, autonomous agents become autonomous drift generators.

The Litmus Test

Before you call your workflow “GitOps,” answer these three questions:

  1. If a deployment fails, does main still change? If yes — that’s CI/CD.
  2. Can you reconstruct every environment from Git alone? If no — you have drift.
  3. Are agents and humans subject to the same merge rules? If no — you have a governance gap.

If all three pass, you’re doing GitOps. If not, you’re doing CI/CD with a Git trigger — and that’s fine, but call it what it is.
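Question 2 can be checked mechanically by comparing what Git declares against what an environment actually reports. A minimal sketch, assuming both sides can be flattened into dictionaries; `find_drift` and the sample keys are hypothetical, and a real check would pull the live state from your platform’s API.

```python
from typing import Any, Dict, Tuple


def find_drift(declared: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (declared_value, live_value)} for every mismatch.

    A key missing on one side is reported as None on that side, so this
    sketch cannot tell "absent" apart from an explicit None value.
    """
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(declared.keys() | live.keys()):
        want, have = declared.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


declared = {"replicas": 3, "image": "api:1.4.2"}
live = {"replicas": 5, "image": "api:1.4.2", "debug": True}

print(find_drift(declared, live))  # {'debug': (None, True), 'replicas': (3, 5)}
```

An empty result means the environment matches its declaration. Anything else is state that Git alone could not reconstruct.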

The Bottom Line

GitOps isn’t a tooling choice; it’s an enforcement philosophy. The core contract is brutally simple: main equals production, always. The merge queue pattern is how you mechanically enforce that contract. Environment parity is how you make it trustworthy. And as AI agents become your primary infrastructure operators, that enforcement isn’t a nice-to-have. It’s the only thing standing between autonomous agents and uncontrolled drift.

Stop deploying after merge. Start merging after deployment. That’s GitOps.
