---
title: "When GitHub Copilot Extensions Go Wrong — Part 1"
description: "One unclosed async handler took down all 43 of my Copilot agents. Here's what I discovered about extension failure modes and an idea I've been cooking up: the hollow extension pattern."
date: 2026-06-11
tags: ["GitHub Copilot", "Copilot CLI", "Agentic Development", "Software Architecture"]
canonical: https://htek.dev/articles/stop-building-fat-copilot-extensions
---
It took me 40 minutes to figure out why all 43 of my Copilot CLI agents were frozen. No errors. No crashes. Just silence — every agent, every cron job, every background task completely unresponsive. I had shipped a new Copilot CLI extension that afternoon. It had one unclosed `async` operation in a GitHub API polling loop, no timeout guard, no `catch` block. That was enough to stall the entire Node.js event loop in the extension host process. Every tool handler across every registered extension — dead.

I fixed the immediate issue in about 10 minutes once I found it. Then I spent the next three weeks trying to understand *why this happened at all*, and whether there was an architecture that could have prevented it.

This is Part 1 of what I learned.

## What Makes an Extension "Fat"

A fat Copilot CLI extension is one that bundles business logic directly inside its handler functions — inline HTTP calls, LLM chains, stateful caches, database writes, async operations with no timeout guards. The extension registers tools, hooks, and MCP connections, but then *also implements everything they do* in the same file, sometimes the same function.

Here's what that looks like in practice:

```js
// fat-extension.mjs: what NOT to do
// Imports omitted here: joinSession, plus the openai and db clients the handler assumes.

// Fat pattern: business logic inlined directly inside handlers — no isolation
await joinSession({
  tools: [
    {
      name: "analyze_pr",
      description: "Analyze a GitHub pull request",
      parameters: {
        type: "object",
        properties: {
          repo: { type: "string", description: "owner/repo" },
          pr:   { type: "number", description: "PR number" },
        },
        required: ["repo", "pr"],
      },
      handler: async ({ repo, pr }) => {
        // Inline GitHub API call — no timeout guard
        const res = await fetch(`https://api.github.com/repos/${repo}/pulls/${pr}`);
        const data = await res.json();

        // Inline LLM call — can hang indefinitely
        const analysis = await openai.chat.completions.create({
          model: "gpt-4o",
          messages: [{ role: "user", content: `Analyze: ${JSON.stringify(data)}` }],
        });

        // Inline DB write — no error boundary
        await db.insert("pr_analysis", { pr, result: analysis.choices[0].message.content });
        return analysis.choices[0].message.content;
      },
    },
    {
      name: "run_ci_check",
      description: "Run CI check on a branch",
      parameters: {
        type: "object",
        properties: {
          branch: { type: "string", description: "Branch name" },
        },
        required: ["branch"],
      },
      handler: async ({ branch }) => {
        // 80 more lines of inline logic...
      },
    },
  ],
  hooks: {
    onPreToolUse: async (input) => {
      // 120 more lines of inline validation...
    },
  },
});
```

The problem isn't the code quality; it's the *architecture*. Every handler is an async operation running directly inside the extension host process. [GitHub Copilot CLI extensions](https://docs.github.com/en/copilot/building-copilot-extensions/about-building-copilot-extensions) share that process. If `analyze_pr` hangs on an API call that never times out, the handler never returns, and everything queued behind it waits with it. Tools from *other* extensions stop responding. Your agents sit there waiting for tools that will never answer.

I built this pattern three times before I understood why it kept breaking. The first iteration had no timeouts. The second had timeouts but inline state. The third had everything right *except* the unhandled rejection in the GitHub polling loop that eventually took down the fleet.

The real fix wasn't a better `try/catch`. It was a different architecture entirely.

![Side-by-side architecture comparison of fat extension anti-pattern vs hollow extension pattern. Left: fat extension with inline HTTP calls, LLM chains, and no timeout guards causing event loop stall. Right: hollow extension delegating all logic to an injectable Factory SDK.](/images/articles/stop-building-fat-copilot-extensions/fat-vs-hollow-diagram.webp)
*Fat Extension vs Hollow Extension — how embedding logic inside the extension host leads to fleet-wide failure, and how the hollow pattern prevents it*

## The Node.js Event Loop Is Not a Safety Net

The extension host runs your tool and hook handlers in series within each invocation context. An awaited operation that never resolves — a hung API call, a Promise that's never settled, an infinite polling loop — keeps the handler alive indefinitely. [Node.js fires an `unhandledRejection` event](https://nodejs.org/api/process.html#event-unhandledrejection) when a rejected Promise has no handler, but the more dangerous failure mode is a Promise that never rejects — it just hangs. Any subsequent call that needs a response from that handler waits forever.

In my experience running 40+ Copilot CLI agents against the same extension host, one stalled handler propagates outward fast. Tools from other extensions stop responding as the dispatch queue fills with unanswered requests. Under [Node.js event loop semantics](https://nodejs.org/en/learn/asynchronous-work/event-loop-timers-and-nexttick), a pending Promise doesn't block the loop or stop other I/O. But every caller awaiting that Promise either times out or freezes instead of getting a response.
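To make that distinction concrete, here is a minimal sketch in plain Node.js. It is not taken from the extension host: `hung`, `deadline`, and `demo` are illustrative names I'm assuming for the example. It shows that timers keep firing while a Promise is pending, and that the awaiting caller only gets control back because a deadline is raced against the hung call.

```js
// A Promise that never settles: stands in for a hung API call.
const hung = new Promise(() => {});

// Rejects after `ms` milliseconds so a caller can stop waiting.
function deadline(ms) {
  return new Promise((_, reject) => {
    setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
}

async function demo() {
  // The event loop is still alive while `hung` is pending: this timer keeps ticking.
  let ticks = 0;
  const timer = setInterval(() => {
    ticks += 1;
  }, 10);

  try {
    // Without the race, this await would never return.
    await Promise.race([hung, deadline(50)]);
    return { ticks, message: "resolved" };
  } catch (err) {
    return { ticks, message: err.message };
  } finally {
    clearInterval(timer);
  }
}

demo().then((result) => console.log(result));
```

The loop itself is fine; what freezes is the caller. That is why the guard has to sit around the awaited call rather than somewhere else in the process.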

## The Hollow Extension Pattern — An Idea in Progress

After the fleet went down, I started sketching. What if a Copilot CLI extension *never* contained any business logic at all? What if the entire extension was just a registration surface — calling methods on an injectable factory, wiring the results into the harness, and that was it?

The hollow extension pattern treats a Copilot CLI extension as a *registration surface only*. The extension's entire job is to wire an injectable factory into the harness — nothing more.

```js
// hollow-extension.mjs: the pattern that works
// Imports omitted here: joinSession and PRAnalyzerFactory (see factory.mjs below).

// Configure the factory — zero business logic in the extension itself
const factory = new PRAnalyzerFactory({
  timeoutMs: 8000,
  retries: 2,
  onError: (err, tool) => console.error(`[${tool}] failed:`, err.message),
});

// Extension is pure registration — no inline handlers
await joinSession({
  tools: factory.getTools(), // returns Tool[] array
  hooks: factory.getHooks(), // returns { onSessionStart, onPreToolUse }
});
```

That's the complete extension. Fewer than twenty lines. No inline business logic. No async footguns. No state.

I wasn't confident this would work. On paper it felt too simple, too thin to actually prevent a fleet-wide outage. But I tested it. The tools responded. The agents answered. The fleet came back online. I realized: sometimes you don't fix a reliability problem by adding controls. You fix it by removing surfaces where things can break.

The extension doesn't know what `factory.getTools()` returns internally. It doesn't know how the `analyze_pr` tool handles its GitHub API call, how it manages timeouts, or whether it batches requests. It just registers whatever the factory provides when it joins the session.

This is the [dependency injection principle](https://en.wikipedia.org/wiki/Dependency_injection) applied to extension architecture — and it's the same pattern I described in [the three architectural layers every AI agent is missing](/articles/three-layers-your-ai-agent-is-missing). The extension is the registration layer. The factory is the logic layer. They're separate, and the separation is the safety mechanism.

The pattern is also a direct application of the [factory method](https://refactoring.guru/design-patterns/factory-method) design pattern — a 30-year-old idea that turns out to be exactly what modern extension architectures need.

## Factory Implementer SDKs

Once the hollow extension pattern was clear — register the contract, implement nothing — one question followed immediately: *what fulfills the contract?* That’s the moment it clicked. *“Oh my God, I just thought of something — we can just CREATE what I just said.”* The extension is describing a factory interface. So build the factory. That’s the entire factory SDK idea in one sentence.

The factory SDK is where all the real work happens — but it happens in isolation, behind a well-defined interface.

```js
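// factory.mjs: logic lives here, not in the extension
// Assumption: withTimeout and withRetry are small guard helpers in a local,
// hypothetical module; the original does not show where they are defined.
import { withTimeout, withRetry } from "./guards.mjs";

export class PRAnalyzerFactory {
  constructor(config, deps = {}) {
    this.config = config;
    // Injected dependencies: real clients in production, mocks in tests
    this.github = deps.github;
    this.analyzer = deps.analyzer;
    this.ci = deps.ci;
    this.validator = deps.validator;
  }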

  getTools() {
    // Returns the Tool[] array that joinSession expects
    return [
      {
        name: "analyze_pr",
        description: "Analyze a GitHub pull request",
        parameters: {
          type: "object",
          properties: {
            repo: { type: "string" },
            pr:   { type: "number" },
          },
          required: ["repo", "pr"],
        },
        handler: withTimeout(
          withRetry(async ({ repo, pr }) => {
            const data = await this.github.getPR(repo, pr);
            return await this.analyzer.analyze(data);
          }, this.config.retries),
          this.config.timeoutMs
        ),
      },
      {
        name: "run_ci_check",
        description: "Run a CI check on a branch",
        parameters: {
          type: "object",
          properties: {
            branch: { type: "string" },
          },
          required: ["branch"],
        },
        handler: withTimeout(
          async ({ branch }) => this.ci.check(branch),
          this.config.timeoutMs
        ),
      },
    ];
  }

  getHooks() {
    // Returns the hooks object that joinSession expects
    return {
      onSessionStart: async () => ({
        additionalContext: "[pr-analyzer] Factory extension active.",
      }),
      onPreToolUse: this.validator.preToolUseHook(),
    };
  }
}
```

![Factory SDK dependency injection flow diagram. Shows HarnessFactory implementing ToolProvider, HookProvider, and MCPProvider interfaces. Injected dependencies (github, analyzer, ci, validator) flow into the factory, which wraps every handler with withTimeout and withRetry guards before returning clean tool/hook/MCP contracts to the hollow extension.](/images/articles/stop-building-fat-copilot-extensions/factory-sdk-di-flow.webp)
*Factory SDK Dependency Injection Flow — injected deps in, guarded contracts out. All logic owned by the factory, all registration owned by the extension.*

Every tool is wrapped in `withTimeout` and optionally `withRetry`. The `this.github`, `this.analyzer`, `this.ci`, and `this.validator` dependencies are injected at factory construction — swappable, mockable, testable.
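The two wrappers aren't shown in this article, so here is a minimal sketch of what they could look like. This is my assumption of their shape, inferred only from how the factory calls them (`withTimeout(handler, timeoutMs)` and `withRetry(handler, retries)`); treat it as an illustration rather than the actual SDK code.

```js
// Wraps a handler so it rejects if it has not settled within timeoutMs.
// Note: this stops the caller from waiting; it does not cancel the underlying work.
function withTimeout(handler, timeoutMs) {
  return async (...args) => {
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`handler timed out after ${timeoutMs}ms`)),
        timeoutMs
      );
    });
    try {
      return await Promise.race([handler(...args), deadline]);
    } finally {
      // Always clear the timer so it cannot keep the process alive.
      clearTimeout(timer);
    }
  };
}

// Wraps a handler so a failed call is retried up to `retries` more times.
function withRetry(handler, retries) {
  return async (...args) => {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt += 1) {
      try {
        return await handler(...args);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}
```

Putting `withTimeout` on the outside, as the factory does, means the deadline covers all retry attempts together. Swapping the order would give each attempt its own deadline instead; which one you want depends on how long a caller can afford to wait.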

The factory approach also unlocks something I hadn't anticipated: I can now unit test all my tool logic *without a running Copilot CLI session*. I instantiate `PRAnalyzerFactory` with mock dependencies and test the handlers directly. The extension is just the deployment wrapper; the factory is the software.
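Here is roughly what such a test can look like. To keep the sketch self-contained I'm using a trimmed stand-in, `MiniPRFactory`, with a single tool and dependencies passed as a second constructor argument. The stand-in, the mock objects, and the `octo/demo` repo name are all assumptions made for illustration.

```js
// Trimmed stand-in for the factory: one tool, dependencies passed in explicitly.
class MiniPRFactory {
  constructor(config, deps) {
    this.config = config;
    this.github = deps.github;
    this.analyzer = deps.analyzer;
  }

  getTools() {
    return [
      {
        name: "analyze_pr",
        handler: async ({ repo, pr }) => {
          const data = await this.github.getPR(repo, pr);
          return this.analyzer.analyze(data);
        },
      },
    ];
  }
}

// Mock dependencies: record calls and return canned data, no network involved.
const githubCalls = [];
const mockGithub = {
  getPR: async (repo, pr) => {
    githubCalls.push({ repo, pr });
    return { title: "Fix typo" };
  },
};
const mockAnalyzer = {
  analyze: async (data) => `analysis of: ${data.title}`,
};

// Exercise the handler directly, with no Copilot CLI session.
async function runHandlerTest() {
  const factory = new MiniPRFactory(
    { timeoutMs: 8000, retries: 2 },
    { github: mockGithub, analyzer: mockAnalyzer }
  );
  const tool = factory.getTools().find((t) => t.name === "analyze_pr");
  return tool.handler({ repo: "octo/demo", pr: 7 });
}
```

Because the handler only touches `this.github` and `this.analyzer`, the test needs neither a token nor a session. The same mocks can be made to hang or throw to check how the guards behave.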

This mirrors what I wrote about in [What Is Harness as Code](/articles/what-is-harness-as-code): declarative, injectable, reproducible. The fat extension anti-pattern is the same mistake as the [god prompt monolith](/articles/your-god-prompt-is-the-new-monolith) — everything bundled in one place because it was faster to write that way, slower to maintain.

## What This Unlocks for the Extension Ecosystem

Here's what got me excited beyond the immediate reliability win: because the registration layer never changes when the implementation does, this pattern is the right foundation for a Copilot extension marketplace.

Right now, if you want to adopt someone else's Copilot CLI extension, you're installing their full implementation: their API key handling, their error handling assumptions, their retry logic, their specific GitHub API version. You're accepting the whole fat extension as-is. The [gh extension install](https://cli.github.com/manual/gh_extension_install) command for GitHub CLI extensions is a blunt instrument in the same way: you get the whole package, hardcoded decisions and all.

With the hollow extension model, extensions become *interface specifications*, not implementations. The extension publishes what tools and hooks it registers, and what interfaces the factory implementer must satisfy. Teams can build their own factory SDKs against those interfaces — using their own auth patterns, their own retry strategies, their own MCP connections. The [TypeScript interface system](https://www.typescriptlang.org/docs/handbook/2/objects.html) is the natural contract layer here: publish the interface, version it separately from the implementation.
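TypeScript interfaces only exist at compile time, so a plain `.mjs` extension may also want a runtime check before registering whatever a third-party factory hands it. The sketch below is one way to do that. `assertFactoryContract` is a hypothetical helper, and the contract it checks (a `getTools()` array of `{ name, handler }` plus a `getHooks()` object of functions) is inferred from the examples in this article, not from a published spec.

```js
// Verifies that a factory satisfies the contract the hollow extension relies on.
// Throws with a list of problems, or returns the factory unchanged.
function assertFactoryContract(factory) {
  const problems = [];
  if (typeof factory.getTools !== "function") problems.push("getTools() is missing");
  if (typeof factory.getHooks !== "function") problems.push("getHooks() is missing");

  if (problems.length === 0) {
    const tools = factory.getTools();
    if (!Array.isArray(tools)) {
      problems.push("getTools() must return an array");
    } else {
      tools.forEach((tool, i) => {
        if (typeof tool.name !== "string") problems.push(`tools[${i}].name must be a string`);
        if (typeof tool.handler !== "function") {
          problems.push(`tools[${i}].handler must be a function`);
        }
      });
    }

    const hooks = factory.getHooks();
    if (hooks === null || typeof hooks !== "object") {
      problems.push("getHooks() must return an object");
    } else {
      for (const [name, hook] of Object.entries(hooks)) {
        if (typeof hook !== "function") problems.push(`hooks.${name} must be a function`);
      }
    }
  }

  if (problems.length > 0) {
    throw new Error(`factory contract violated: ${problems.join("; ")}`);
  }
  return factory;
}
```

An extension could call this once at startup, before registration, so a broken factory fails loudly at load time instead of silently at the first tool call.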

The Copilot extension platform already has the extensibility primitives to support this. Tools, hooks, and MCP connections are already first-class. The hollow extension + factory SDK separation is a pattern any extension builder can adopt today — no platform changes required.

I've written about the [agentic development maturity curve](/articles/agentic-development-maturity-curve) before: at expert level, complexity collapses back to simple, explicit primitives. Fat extensions are the middle of that curve — impressive-looking, fragile. Hollow extensions are what you build when you've learned what actually goes wrong at 3 AM.

## What Comes Next

The hollow extension pattern solved the fleet stability crisis. But it raised a new question: if the extension is just a registration surface, what about the factory SDK itself? How do you scale that? How do you compose multiple factory implementations? What happens when you have *too many* factories, too many injectable dependencies, too many layers?

I've been experimenting with an answer — a framework I've been calling "Harness as Code." It's the next iteration of the hollow pattern idea, and it changes how you think about building modular Copilot ecosystems.

That's Part 2.

## The Pattern in Three Sentences

Register thin. Inject logic. Guard every async.

The line that crystallized it: *"Not the files, the factory. Not the context, the mechanism."* Every time I was chasing an extension bug, I was looking in the wrong layer. The extension is a file — inert, structural, just registration. The factory is the mechanism — where reliability lives, where tests run, where logic can be replaced without touching the extension surface. Fix the mechanism. Don't touch the file.

An extension's job is to tell the Copilot CLI harness what's available, not to *be* what's available. The business logic belongs in a factory SDK that owns its own timeout boundaries, error surfaces, and dependency graph. One bad extension shouldn't be able to take down your fleet. With the hollow pattern and a guard on every handler, a hung call fails fast instead of freezing everything waiting on it.

If you're building for the [GitHub Copilot CLI](https://docs.github.com/en/copilot/github-copilot-in-the-cli/using-github-copilot-in-the-cli) ecosystem, this is the pattern I've landed on. Whether it stays this way, or whether Harness as Code evolves it further, I'm still learning. But the principle holds: don't embed logic in extensions. Separate registration from implementation. Guard every async boundary. That's the foundation.

---

*Related: [I Taught My AI Agent to Restart Itself](/articles/copilot-cli-self-restart-extension) — another extension architecture lesson learned the hard way.*
