A Survey of AI Coding Abilities

Adapted from a Yale SOM Faculty Seminar on November 19, 2025 by Kyle Jensen. Kyle is a magician and I deserve almost no credit for this post.

Kyle wants you to know that much of this material is based on a post by HumanLayer.

The Six‑Fold Path to AI Coding “Enlightenment”

Think of AI coding as a set of levels. You don’t have to reach Level 5 to be productive, but it helps to know what’s possible.

[Figure: The sixfold path to enlightenment]

  • Level 0 – You live in the browser, copy code from ChatGPT / Claude, and paste it into your editor.
  • Level 1 – Your IDE (VS Code, Cursor, Zed, etc.) offers inline completions and inline chat.
  • Level 2 – You enable "agent mode" inside the IDE: the model can read files, run tests, refactor, etc.
  • Level 3 – You use dedicated coding agents (Claude Code, OpenAI Codex CLI, Gemini CLI, Copilot CLI, …).
  • Level 4 – You attach MCP tools (Model Context Protocol) so agents can search the web, scrape, browse, orchestrate.
  • Level 5 – You orchestrate teams of agents, often in containers, with CI-like pipelines and automated workflows.

One Rule to Rule Them All

A rough mental model for AI coding performance:

\[\text{Performance} \approx \frac{(\text{correctness}^2 \times \text{completeness})}{\text{size}}\]

As context size grows (more files, longer prompts, giant logs), performance tends to worsen. The job is to structure work so the model can stay correct and complete without being drowned in text.
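
To make the trade-off concrete, here is a minimal sketch of the rule as arithmetic. The numbers are hypothetical and chosen only for illustration; the rule is a mental model, not a measured law.

```python
def performance(correctness: float, completeness: float, size: float) -> float:
    """Heuristic score: correctness squared times completeness, divided by context size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (correctness ** 2) * completeness / size

# Hypothetical values on an arbitrary scale, for illustration only.
baseline = performance(correctness=0.9, completeness=0.8, size=1.0)
bloated = performance(correctness=0.9, completeness=0.8, size=2.0)   # same content, twice the context
sloppy = performance(correctness=0.45, completeness=0.8, size=1.0)   # half as correct

print(f"twice the size  -> {bloated / baseline:.2f} of baseline")
print(f"half as correct -> {sloppy / baseline:.2f} of baseline")
```

Because correctness is squared, halving it costs more than doubling the context size does, which is one reading of why wrong information in context hurts more than extra information.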


Level 0: Top Tips for Copy/Paste Coding

If you’re coding from the web UI (ChatGPT, Claude, etc.), a few small tricks go a long way.

Anatomy of a Modern LLM Chat

Under the hood, a “chat” is a structured pipeline:

[Figure: The anatomy of a modern LLM chat]

You usually see only:

  1. Your prompt
  2. The LLM’s response

…but in between, it may:

Good AI coding tools expose more of this flow and let you customize it.

Why You Don’t Want to Write Serious Code in the Web Interface

The web UI is great for experiments and small snippets, but it’s a bad home for a real project:

For serious work, move into an AI-enabled IDE or terminal tool.


Level 1: Integrated Development Environments (IDEs)

A quick comparison of the current ecosystem:

Cursor

“The most AI‑forward IDE.”

VS Code + Copilot

Zed

Example: VS Code + Copilot

VS Code with Copilot is a great way to start, since students and educators can get a subscription for free. See Paul’s post here for further discussion.

  1. Code completion – Predicts the next few lines as you type.
  2. Inline AI chat – Ask questions right in the file:

    • “Explain this function.”
    • “Add a docstring and type hints.”
  3. Sidebar chat / plans – Have a longer conversation about a task and then apply edits.
  4. Agent mode – The model can:

    • Open files
    • Run tests
    • Apply multi‑file refactors

Which raises the question: what is an agent?


Level 2: Agent-based programming

An agent is an LLM running in a loop with tools.

The agent repeatedly:

  1. Reads your instructions and its current context.
  2. Decides which tool(s) to use.
  3. Calls tools, processes results.
  4. Replies or loops back for more tool use.
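
The four steps above can be sketched in a few lines. This is a toy under stated assumptions: `fake_model` is a hard-coded stand-in for a real LLM call, and the tools and the in-memory "repo" are hypothetical. Real agents differ in many details, but the loop has the same shape.

```python
from typing import Callable

# Hypothetical tools: plain functions the "model" is allowed to call.
def read_file(path: str) -> str:
    files = {"app.py": "def add(a, b):\n    return a - b\n"}  # fake in-memory repo
    return files.get(path, "<not found>")

def run_tests(_: str) -> str:
    return "1 failed: add(2, 2) returned 0, expected 4"

TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file, "run_tests": run_tests}

def fake_model(context: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next action from what is already in context."""
    tool_calls = sum(1 for entry in context if entry.startswith("tool:"))
    if tool_calls == 0:
        return {"action": "tool", "name": "run_tests", "arg": ""}
    if tool_calls == 1:
        return {"action": "tool", "name": "read_file", "arg": "app.py"}
    return {"action": "reply", "text": "add() subtracts instead of adding; change '-' to '+'."}

def run_agent(instructions: str, max_steps: int = 10) -> tuple[str, list[str]]:
    context = [f"user: {instructions}"]                      # step 1: instructions + context
    for _ in range(max_steps):
        decision = fake_model(context)                       # step 2: decide what to do next
        if decision["action"] == "reply":                    # step 4: reply ...
            return decision["text"], context
        result = TOOLS[decision["name"]](decision["arg"])    # step 3: call the tool
        context.append(f"tool:{decision['name']} -> {result}")  # ... or loop back with the result
    return "Stopped: step limit reached.", context

answer, transcript = run_agent("Figure out why the tests fail.")
print(answer)
```

Note that every tool result is appended to the context, so a long-running loop grows the context on every step. The `max_steps` cap is a simple guardrail against a loop that never decides to reply.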

Cursor’s Agent‑Forward Approach

[Figure: Cursor is a very agent-friendly IDE.]

Cursor is built around this agent loop.

  1. Agent‑forward workflow

    • The agent is the primary interface: you ask it to:

      • Understand unfamiliar code
      • Implement features
      • Fix bugs
      • Keep a long‑running plan in mind
  2. Automatic context indexing

    • Cursor continuously indexes your repo so the agent can:

      • Find relevant files
      • Trace call graphs
      • Surface tests, configs, docs
  3. MCP support (Model Context Protocol)

    • Lets you plug in external tools:

      • Custom APIs
      • Document stores
      • Browsers, scrapers, etc.
    • The agent gains new abilities without retraining.
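
To see why no retraining is needed, it helps to look at what a tool is from the model's point of view: a name, a description, and an argument schema placed in its context. The sketch below builds the kind of JSON-RPC 2.0 message MCP uses to invoke a tool. The `search_docs` tool and its arguments are hypothetical, and the sketch only constructs the message; it does not talk to a real server.

```python
import json

# Hypothetical tool, described the way an MCP server advertises one:
# a name, a human-readable description, and a JSON Schema for the arguments.
search_docs_tool = {
    "name": "search_docs",
    "description": "Search the team's internal documentation.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 request a client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

request = make_tool_call(1, search_docs_tool["name"], {"query": "deploy checklist"})
print(request)
```

The model only ever sees the description and schema as text, which is also why each attached tool costs some context.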


Level 3: Terminal‑Based AI Tools

You don’t have to live in a GUI; the command line has great options:

Claude Code

OpenAI Codex CLI

Gemini CLI

Copilot CLI

Claude’s Agents & Skills

Claude’s skills system is an example of hierarchical agents.

[Figure: Claude's skills system enables hierarchical agents.]

Sweet features:


Level 4: Extending Agents with MCP Tools

Using the Model Context Protocol (MCP), you can bolt new “superpowers” onto agents like Claude (and others).

Some favorites:

Context Engineering Is the New Prompt Engineering

Prompt engineering used to be mainly about phrasing (“act as an expert…”). Now it’s about managing the entire context window.

Think of a coding session as a stack of context:

As you get closer to the context window limit, the model will:

Remember our rule:

Performance ≈ (correctness² × completeness) ÷ size

Managing size is now a core skill.
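
One way to practice this is to tally what is filling the window. The sketch below is a rough illustration only: the four-characters-per-token estimate is a common rule of thumb rather than a real tokenizer, and the layer names, contents, and window size are all hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)

def context_report(layers: dict[str, str], window: int) -> dict[str, float]:
    """Return each layer's share of the context window, plus the sum under 'TOTAL'."""
    shares = {name: estimate_tokens(text) / window for name, text in layers.items()}
    shares["TOTAL"] = sum(shares.values())
    return shares

# Hypothetical session contents and window size, for illustration only.
layers = {
    "system prompt": "You are a coding assistant." * 20,
    "files read": "def handler(event):\n    ...\n" * 400,
    "test logs": "FAILED test_handler - AssertionError\n" * 1500,
    "chat history": "user: try again\nassistant: done\n" * 300,
}
report = context_report(layers, window=20_000)
for name, share in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{name:>14}: {share:6.1%}")
```

In this made-up session the repeated test logs take up far more of the window than the files the agent actually needs, which is the kind of imbalance worth looking for.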

Auto‑Compaction: What Happens at the Limit

If you just keep chatting in a long session, eventually the LLM will auto‑compact:

This is rarely ideal for serious coding, where exact requirements and edge cases matter.

Intentional Compaction: A Better Pattern

Instead of letting the model compact for you, you can compact intentionally.

Pattern:

  1. Do exploratory research

    • “Summarize library options for X.”
    • “Audit these modules and note tech debt.”
  2. Write artifacts to disk

    • research.txt, plan.md, design.md, etc.
    • Ask the model explicitly:

      • “Write a concise research summary to research.txt.”
      • “Write an implementation plan to plan.md.”
  3. Restart with a smaller, cleaner context

    • New session:

      • “Read research.txt and plan.md. Then implement step 1.”

You’re now working with short, precise docs instead of a giant, messy chat log.
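
The pattern above can be sketched as plain file handling. In practice the model writes the artifacts when you ask it to; here the contents are hypothetical placeholders, and `write_artifact` and `fresh_prompt` are illustrative helper names, not part of any tool.

```python
from pathlib import Path
import tempfile

def write_artifact(workdir: Path, name: str, content: str) -> Path:
    """Persist a compact artifact (e.g. research.txt, plan.md) from the long session."""
    path = workdir / name
    path.write_text(content.strip() + "\n", encoding="utf-8")
    return path

def fresh_prompt(workdir: Path, names: list[str], task: str) -> str:
    """Build the opening prompt for a new session from the artifacts only, not the old chat."""
    parts = []
    for name in names:
        body = (workdir / name).read_text(encoding="utf-8")
        parts.append(f"--- {name} ---\n{body}")
    parts.append(f"Task: {task}")
    return "\n".join(parts)

# Hypothetical artifact contents, written to a temporary directory.
workdir = Path(tempfile.mkdtemp())
write_artifact(workdir, "research.txt", "Option A is simplest; option B scales better.")
write_artifact(workdir, "plan.md", "1. Add parser\n2. Add tests\n3. Wire up CLI")

prompt = fresh_prompt(workdir, ["research.txt", "plan.md"], "Implement step 1.")
print(prompt)
```

The new session starts from a few hundred characters of deliberate summary, and the artifacts stay on disk where you can read and correct them before the next step.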

Delegation to Sub‑Agents: Best of All

Combine intentional compaction with sub‑agents:
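
As a rough sketch of the idea, assume each sub-agent reads a lot of material in its own private context and hands back only a short result. `run_sub_agent` below is a hypothetical stand-in that does no real analysis; it only shows how little text has to reach the parent.

```python
def run_sub_agent(task: str, material: list[str], summary_limit: int = 200) -> str:
    """Stand-in for a sub-agent: reads a lot in its own context, returns a short result."""
    private_context = "\n".join([task, *material])     # stays inside the sub-agent
    finding = f"{task}: reviewed {len(material)} item(s), {len(private_context)} chars"
    return finding[:summary_limit]                      # only this reaches the parent

def orchestrate(assignments: dict[str, list[str]]) -> list[str]:
    """Parent agent: delegates each task and keeps only the compact results."""
    parent_context = []
    for task, material in assignments.items():
        parent_context.append(run_sub_agent(task, material))
    return parent_context

# Hypothetical workloads, padded to simulate large inputs.
assignments = {
    "Audit auth module": ["long source file " * 500, "long test log " * 800],
    "Survey parsing libraries": ["long README " * 1000],
}
parent_context = orchestrate(assignments)
raw_size = sum(len(doc) for docs in assignments.values() for doc in docs)
parent_size = sum(len(line) for line in parent_context)
print(parent_context)
print(raw_size, "chars read by sub-agents;", parent_size, "chars kept by the parent")
```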

This is where things start to feel like real engineering management:

The New Role of Humans in Coding

So… what are humans for now? Mental alignment.

[Figure: What is the role of humans in programming now?]

Imagine a triangle:

The best marginal use of your time is:
  • Creating mental alignment with the machine
  • Creating the conditions for it to excel:
    • Right tools
    • Good guardrails
    • Clear guidance, specs, and tests
You're less of a typist and more of a systems designer, reviewer, and product thinker.

Level 5: YOLO Mode and Safety

“YOLO mode” (let the agent do everything) is fun—and dangerous.

Key risk factors:

[Figure: The lethal trifecta of secrets, access, and ability]

The Economist memorably called these the “lethal trifecta” of AI risk for coders: tools that can read secrets, talk to the world, and execute code.
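
A simple pre-flight check can make the trifecta explicit before you turn an agent loose. The sketch below is illustrative only: `AgentCapabilities` and its field names are hypothetical, and a real setup would derive these flags from its actual permissions and sandbox configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what an agent setup is allowed to do."""
    reads_secrets: bool      # e.g. can see API keys, .env files, credentials
    network_access: bool     # can talk to the outside world
    executes_code: bool      # can run shell commands or scripts

def risk_flags(caps: AgentCapabilities) -> list[str]:
    """List the risk factors that are switched on."""
    names = ["reads_secrets", "network_access", "executes_code"]
    return [name for name in names if getattr(caps, name)]

def is_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risk factors are present at once."""
    return len(risk_flags(caps)) == 3

yolo = AgentCapabilities(reads_secrets=True, network_access=True, executes_code=True)
sandboxed = AgentCapabilities(reads_secrets=False, network_access=False, executes_code=True)
print(is_lethal_trifecta(yolo), is_lethal_trifecta(sandboxed))
```

Removing any one of the three, for example by running the agent in a container with no credentials mounted, takes the setup out of the trifecta, though the remaining capabilities still deserve review.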

Treat powerful agents like bridge engineers treat load‑bearing structures:

If you’re in an institutional environment, pair advanced AI setups with a safety review.