AI Dev Essentials #38: GPT 5.5

John
Instructor

Hey Everyone 👋,

John Lindquist here with the 38th issue of AI Dev Essentials!

Long time, no see 🫣. In the months since our last issue, the AI dev tools landscape has gone through a series of significant changes. It's now universally accepted that Agentic development is the new normal and everyone is scrambling to figure out exactly what that means.

There is also a small but important change on my side: AI Dev Essentials is moving under the dev.build brand. All of my future materials, workshops, and cohorts will be hosted there, and future issues of "AI Dev Essentials" will soon arrive from john@dev.build.

GPT 5.5 convinced me to switch over to Codex as my primary Agentic dev tool, so expect to see almost all of the dev.build materials focused on "Codex Power Users" for the near future. That shift is exactly what the new workshop is built around: turning Codex from "another CLI" into a real, repeatable development system you can trust on real projects.

I've spent the last few years helping hundreds of developers level up in Cursor and Claude Code through dozens of sold-out workshops. The first Codex Power User Workshop brings that same battle-tested approach to Codex: context packaging, terminal workflows for parallel specialized agents, tool profiles, reusable skills, hooks, durable memory, and the Codex SDK. You'll leave with an operating system you can keep running after the session, not a notes doc full of tips.

More on the full curriculum below. First, let's catch up on what changed while issue 38 was sitting in drafts.

⚑ Codex Power User Workshop - 📅 Fri May 29, 9am PST. Reserve your spot →


📚 dev.build Lessons

Optimize Codex Configuration for Blazing Fast CLI Automation

Tune your Codex CLI for non-interactive, automation-friendly runs. This lesson walks through the config knobs that strip the latency and friction out of headless Codex sessions so you can wire it into scripts, hooks, and pipelines without fighting the defaults.

(dev.build)

Use Codex to Build a Fully Agentic Workspace

Turn Codex into the coordinator of a real agentic workspace: parallel terminal sessions, focused agents for research, planning, building, and validation, and a workflow you can actually run on a real project instead of a demo repo.

(dev.build)

How to Block npm in Codex with Shims and Exec Policies

A practical guardrail lesson: lock npm out of Codex sessions using shims and exec policies so agents can't reach for it when you've decided to standardize on something else. Useful any time you want hard boundaries around what an agent can run.

(dev.build)


🚀 Major Announcements Since Issue 37

Model Update OpenAI Releases GPT-5.5 and GPT-5.5 Pro

OpenAI released GPT-5.5 on April 23, 2026, rolling out to ChatGPT Plus, Pro, Business, and Enterprise, with GPT-5.5 Pro rolling out to Pro, Business, and Enterprise.

Codex gets GPT-5.5 on Plus, Pro, Business, Enterprise, Edu, and Go plans with a 400K context window.

Coding Benchmarks:

  • Terminal-Bench 2.0: 82.7%
  • SWE-Bench Pro: 58.6%
  • Expert-SWE (internal): 73.1%

API:

  • gpt-5.5: $5 per 1M input tokens, $30 per 1M output, 1M context window
  • gpt-5.5-pro: $30 per 1M input, $180 per 1M output
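
To get a feel for what those API rates mean per request, here is a minimal sketch that multiplies token counts by the listed prices. The function name and the example token counts are made up for illustration, and it assumes flat per-token pricing with no caching or batch discounts, which the list above does not cover.

```python
# USD per 1M tokens, copied from the price list above.
RATES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one request (hypothetical helper, flat pricing assumed)."""
    rate = RATES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return round(total / 1_000_000, 4)


# Example: a 200K-token prompt that produces 20K tokens of output.
print(estimate_cost("gpt-5.5", 200_000, 20_000))      # 1.6
print(estimate_cost("gpt-5.5-pro", 200_000, 20_000))  # 9.6
```

At these rates the same request costs six times more on gpt-5.5-pro, so it is worth checking whether a task really needs the Pro model before routing it there.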

(OpenAI - Introducing GPT-5.5)

💬 GPT 5.5 is widely accepted as the best coding model available today (this is my own personal opinion as well). If you haven't tried it yet, I strongly recommend giving it a few days through Codex or any of the other AI coding tools available.


Postmortem Anthropic Postmortem Details Claude Code Quality Regression

Anthropic published a postmortem on April 23, 2026, tracing recent Claude Code quality complaints to three separate changes. The API and inference layer were not impacted.

Root Causes:

  • Default reasoning effort changed from high to medium
  • Stale session thinking was cleared repeatedly after idle resumes
  • Verbosity system prompt reduced coding quality. A single prompt ablation showed a 3% drop for both Opus 4.6 and Opus 4.7

Resolution:

  • All three issues were fixed by April 20 in Claude Code v2.1.116
  • Usage limits were reset for all subscribers on April 23
  • Future prompt changes will get broader per-model evals, soak periods, and gradual rollouts

(Anthropic Engineering Postmortem)

💬 This was one of the largest controversies of the year so far. Developers had been complaining for months that quality was degrading, but Anthropic always pushed back. Finally, Anthropic admitted to their mistakes and took responsibility for the issue, but this was the start of many developers losing trust in Anthropic. Anthropic is still worth a trillion dollars, so I don't think they're too sad about it, but the dev community likes to feel important 😬


Model Update Anthropic Releases Claude Opus 4.7

Anthropic released Claude Opus 4.7 on April 16, 2026, positioned as an upgrade for advanced software engineering, long-horizon tasks, vision, and professional work.

What's New:

  • /ultrareview command in Claude Code
  • xhigh effort level added between high and max
  • Task budgets in public beta on the Claude Platform
  • High-resolution image support up to 2,576 pixels on the long edge (~3.75 megapixels)
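
If you pre-resize screenshots or diagrams yourself before sending them, the long-edge limit turns into simple arithmetic. The sketch below is one way to do it. The helper name is hypothetical, and it assumes you want to downscale client-side and keep the aspect ratio; the announcement summary above does not say how oversized images are handled.

```python
MAX_LONG_EDGE = 2576  # pixels on the long edge, from the list above


def fit_long_edge(width: int, height: int, max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_edge."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height  # already within the limit, never upscale
    # Integer math with round-to-nearest keeps the aspect ratio close to the original.
    new_width = max(1, (width * max_edge + long_edge // 2) // long_edge)
    new_height = max(1, (height * max_edge + long_edge // 2) // long_edge)
    return new_width, new_height


print(fit_long_edge(4000, 3000))  # (2576, 1932)
print(fit_long_edge(1920, 1080))  # (1920, 1080)
```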

Migration Notes:

  • Tokenizer changes mean the same input can map to roughly 1.0 to 1.35 times more tokens
  • Higher effort settings can produce more output tokens, so budget accordingly
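
To make the tokenizer note concrete, this small sketch turns a token count measured on the old tokenizer into a best-case and worst-case range using the 1.0x to 1.35x figure above. The function names and the 128,000-token budget in the example are hypothetical, not anything Anthropic publishes.

```python
def opus_47_token_range(tokens_before: int) -> tuple[int, int]:
    """Return (best_case, worst_case) input tokens after the tokenizer change."""
    # 1.00x and 1.35x expressed as integer percentages to avoid float rounding.
    best_case = tokens_before
    worst_case = (tokens_before * 135 + 99) // 100  # round up
    return best_case, worst_case


def still_fits(tokens_before: int, budget: int) -> bool:
    """True if even the worst case stays inside your own token budget."""
    return opus_47_token_range(tokens_before)[1] <= budget


print(opus_47_token_range(100_000))  # (100000, 135000)
print(still_fits(100_000, 128_000))  # False
print(still_fits(90_000, 128_000))   # True
```

Planning against the worst case is the conservative choice here: a prompt that fit comfortably before the change can overshoot a fixed budget after it.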

(Anthropic - Opus 4.7 Announcement, Claude Code Docs - ultrareview)

💬 Most people I know think 4.7 has been a regression from 4.6. People are losing trust in benchmarks and are suspicious of model updates becoming cost-saving measures rather than actual improvements. I personally think we all built up a lot of patterns and practices for 4.6 that didn't transfer well to 4.7 and nobody can be bothered to re-learn a new model. It's honestly one of the reasons why GPT 5.5 is shining so much: no one feels like they have to "learn" it. Here are 6 Tips for Getting the Most Out of 4.7 if you want to see how far you can push it.


Model Update DeepSeek V4 Preview Brings 1M Context to Open Weights

DeepSeek released V4 Preview on April 24, 2026 with two open weight MoE models: V4 Pro at 1.6T total parameters with 49B active, and V4 Flash at 284B total parameters with 13B active. Both support a 1M token context window, with API access already live.

DeepSeek positions V4 Pro as competitive with top closed-source models on reasoning and coding, while V4 Flash targets fast, efficient agent workloads at significantly lower cost per token.

(DeepSeek - V4 Preview Announcement)

💬 DeepSeek is staying extremely close to the state-of-the-art (us nerds call it "SOTA") and the price point is incredibly competitive compared to OpenAI and Anthropic. Unfortunately, it's just not quite good enough to run as the primary model for production use or "claws" out there unless you put in serious effort to fine-tune it for your specific use case.


Model Update Moonshot Open-Sources Kimi K2.6 for Long-Horizon Coding

Moonshot released Kimi K2.6 on April 20, 2026, focused on coding, long-horizon execution, and agent swarm capabilities.

Specs:

  • Context: 262,144 tokens
  • Terminal-Bench 2.0: 66.7%
  • SWE-Bench Pro: 58.6%
  • SWE-Bench Verified: 80.2%

Long-Run Examples (reported by Moonshot):

  • 4,000+ tool calls over 12 hours optimizing Zig inference
  • 1,000+ tool calls over 13 hours modifying 4,000+ lines in exchange-core

Available through the Vercel AI Gateway as moonshotai/kimi-k2.6.

(Kimi Technical Blog, Vercel AI Gateway - Kimi K2.6)

💬 I've heard of more people having luck with Kimi than with DeepSeek. If you're looking for the cheapest model for your use case, definitely give Kimi a try.


Ecosystem Google Cloud Next '26 Centers on Gemini Enterprise Agent Platform

Google announced the Gemini Enterprise Agent Platform on April 22, 2026, positioning it as the evolution of Vertex AI for building, scaling, governing, and optimizing agents.

Google says nearly 75% of Google Cloud customers use its AI products, and 330 customers processed more than 1 trillion tokens each in the past 12 months.

Platform Components:

  • Model access via Model Garden, including third-party Claude models
  • Agent Studio and upgraded ADK
  • Agent Runtime, Identity, Registry, Gateway
  • Agent Simulation, Evaluation, Observability

Also announced: 8th generation TPUs, Agentic Data Cloud, Workspace Intelligence, and the Virgo Network.

(Google Cloud Next Announcements, Gemini Enterprise Agent Platform)

💬 Google I/O is in 4 days. Google builds their own TPUs. I'm really interested in what they'll do with their glasses. Don't sleep on Google, they have the foundations that the "smaller" labs are lacking.


Security OpenAI Launches Daybreak for Cyber Defense

OpenAI launched Daybreak in May 2026, a cyber defense initiative covering secure code review, threat modeling, patch validation, dependency risk analysis, detection, and remediation guidance. A related GPT-5.5 Cyber limited preview is aimed at verified defenders working on critical infrastructure and high-stakes systems.

(OpenAI - Daybreak, OpenAI - Trusted Access for Cyber)

💬 Security incidents are almost a daily occurrence now. Daybreak and Mythos from Anthropic (unreleased, but lots of chatter) can hopefully help us lock things down. But 2026 is feeling like the year where AI+security will unfortunately be the main storyline.


🛠️ Developer Tooling Updates

Tool Codex and Claude Code Add /goal for Completion-Driven Runs

Claude Code 2.1.139 added /goal, which lets you set a completion condition and have Claude keep working across turns until that condition is met. It works in interactive mode, -p, and Remote Control, with live elapsed time, turn count, and token tracking.
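
The idea behind a completion-driven run is easy to picture as a loop: keep taking turns until a condition holds, and report how long it took. The sketch below illustrates that concept only. It is not how Claude Code implements /goal, and every name in it (`run_until_goal`, `step`, `goal_met`, `max_turns`) is hypothetical.

```python
import time
from typing import Callable


def run_until_goal(
    step: Callable[[int], None],
    goal_met: Callable[[], bool],
    max_turns: int = 20,
) -> dict:
    """Call step() once per turn until goal_met() is true or max_turns is reached."""
    started = time.monotonic()
    turns = 0
    while turns < max_turns and not goal_met():
        turns += 1
        step(turns)  # one unit of work, e.g. "fix the next failing test"
    return {
        "turns": turns,
        "met": goal_met(),
        "elapsed_s": time.monotonic() - started,
    }


# Toy usage: the "goal" is three completed fixes.
fixed = []
result = run_until_goal(lambda turn: fixed.append(turn), lambda: len(fixed) >= 3)
print(result["turns"], result["met"])  # 3 True
```

The `max_turns` cap is the design choice worth copying into any loop like this: a completion condition that can never be satisfied should stop the run instead of burning tokens forever.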

(Claude Code - Changelog)

💬 /goal is my new favorite feature. I use it constantly, and it's something that I'll focus on in my workshops and future videos.


Safety OpenAI Publishes Auto-Review for Codex Sandbox Approvals

OpenAI published details on Auto-Review, a Codex safety mode where a separate reviewer agent evaluates sandbox boundary requests instead of routing every approval to a human. OpenAI reports internal Codex sessions stop for human approval roughly 200 times less often with Auto-Review than with manual approval, while keeping high recall on prompt injection and other risky requests in their evals.

(OpenAI Alignment - Auto-Review, Codex Docs - Auto-Review Sandboxing)

💬 Sooooo many review tools. It's like the low-hanging fruit of trying to cash in on AI dev money. They are all getting much, much better, but I'm so overwhelmed by options that I have no idea what to recommend. So just go with whatever you're already paying for.


Tool Codex Comes to the ChatGPT Mobile App

OpenAI is rolling out Codex inside the ChatGPT mobile app on iOS and Android, so you can monitor, steer, and approve coding tasks from your phone.

(OpenAI - Work with Codex from Anywhere)

💬 This requires the Codex app, not the terminal. But I'm definitely going to figure out how to make it work from the terminal.


SDK SDKs Everywhere!

Cursor introduced the Cursor SDK on April 29, 2026, giving developers access to the same runtime, harness, and models that power Cursor's own agents. The SDK can run agents locally or in Cursor's cloud, which makes it usable for CI workflows, internal automations, and embedding Cursor-style agents directly inside other products.

OpenAI shipped a substantial Agents SDK update around April 15, 2026, adding sandboxed agents, computer use, skills, memory, compaction, and a more open harness so teams can scale Codex-style agents in production without rebuilding the surrounding infrastructure.

(Cursor - TypeScript SDK, OpenAI - The Next Evolution of the Agents SDK, OpenAI API Changelog)

💬 The interesting pattern is that everyone is converging on the same pieces: harness, sandbox, memory, tools, review, and observability. The SDK is where those pieces stop being a demo and start becoming an app architecture.


Plugin Codex Adds an OpenAI Developers Plugin

OpenAI released an OpenAI Developers plugin for Codex that connects Codex to OpenAI platform context, guides setup, helps debug API usage, and streamlines project API key workflows.

(OpenAI Developers - Codex Plugin, OpenAI API Changelog)

💬 If Codex is going to build AI apps, it should understand the platform it is building against. The best agent tools remove the annoying setup and lookup work without pretending the hard engineering decisions went away.


Tool Vercel AI Gateway Adds Opus 4.7, Kimi K2.6, and GPT Image 2

Vercel added three major models to AI Gateway across the past week.

  • Claude Opus 4.7 (April 16)
  • Kimi K2.6 as moonshotai/kimi-k2.6 (April 20)
  • GPT Image 2 as openai/gpt-image-2 with support for up to 2K resolution and dense text rendering (April 21)

AI Gateway provides a unified API, usage and cost tracking, retries, failover, BYOK, observability, and provider routing.

(AI Gateway - Opus 4.7, AI Gateway - Kimi K2.6, AI Gateway - GPT Image 2)


Tool Cursor Ships Agent Canvases and CLI Debug Mode

Cursor rolled out a tight sequence of updates across April 13–15, 2026, starting with Cursor 3.1 and layering Agent Canvases and CLI improvements on top.

Agent Canvases (April 15):

  • Dashboards, custom interfaces, tables, diagrams, charts, diffs, and to-do lists
  • Persist as durable artifacts in the Agents Window side panel, alongside terminal, browser, and source control

CLI Updates (April 14):

  • /debug, /btw, /config, /statusline commands
  • Model picker improvements
  • Clipboard image paste
  • Better auto-run behavior

Cursor 3.1 (April 13) also shipped a new tiled layout and upgraded voice input.

(Cursor - Agent Canvases, Cursor - CLI Debug Mode, Cursor 3.1)

💬 Canvases are a smart direction. Agent output as inspectable, persistent artifacts rather than a stream of text in a chat window matches how I actually work. Still mostly in Claude Code myself, but this is the first Cursor update in a while that tempted me back.


Open Source Warp Goes Open Source

Warp open-sourced its client on April 28, 2026, giving developers a way to inspect and contribute to the terminal that has been steadily moving toward agent-first workflows. Warp says the project will use an agent-powered open source process.

(Warp - Warp is Now Open Source)

💬 I am always happy to see more developer tooling open up, especially terminal tooling. The test is whether outside contributors can actually shape the project, not just read the code. The governance and license details are where this story stops being a press release and starts being meaningful.


New IDE Zed 1.0 Ships After More Than a Thousand Releases

Zed reached 1.0 on April 29, 2026, after more than a thousand pre-1.0 releases, continuing its push around a fast Rust-based, GPU-rendered editor with multi-agent collaboration baked in.

(Zed - 1.0 Announcement)

💬 Yay Zed! It's totally my preferred editor.


API Gemini API Adds Event-Driven Webhooks

Google added webhooks to the Gemini API for long-running operations, including Batch API jobs and other async workflows. Instead of polling until a job finishes, developers can receive push notifications when work completes.

(Google - Event-Driven Webhooks, Gemini API Changelog)

💬 Polling is fine until every agent, batch job, and media task is doing it. Webhooks are boring in exactly the way production agent systems need.


CLI Notion Launches Developer Platform and ntn CLI

Notion launched a new developer platform on May 13, 2026, including an External Agent API, Notion Workers, and ntn, a CLI for working with the Notion API from the terminal. The CLI handles auth, API requests, worker deployment, and developer workflows that work for both humans and coding agents.

(Notion - Introducing the Developer Platform, ntn CLI Docs)

💬 This is really interesting from a "Let my Agent manage a knowledge base" perspective...


CLI GitHub Copilot CLI Gets Auto Model Selection

GitHub made auto model selection generally available in Copilot CLI on April 17, 2026, across all Copilot plans.

How It Works:

  • Routes between GPT-5.4, GPT-5.3-Codex, Sonnet 4.6, and Haiku 4.5 based on plan and admin policies
  • CLI shows which model was used per turn; users can switch to a specific model
  • Premium request usage is limited to models with 0x to 1x multipliers
  • Paid subscribers get a 10% discount on the selected model multiplier when using auto
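
The multiplier discount in the last bullet is easier to judge with numbers. This sketch applies the 10% discount to a model multiplier and totals it over a number of turns. It assumes each turn consumes one premium request times the multiplier, which is my simplification rather than anything the changelog spells out, and the function names are hypothetical.

```python
AUTO_DISCOUNT = 0.10  # 10% off the model multiplier for paid plans, per the list above


def effective_multiplier(model_multiplier: float, paid_plan: bool = True) -> float:
    """Multiplier actually charged when auto picks the model."""
    if not 0.0 <= model_multiplier <= 1.0:
        raise ValueError("auto only routes to models with 0x to 1x multipliers")
    discount = AUTO_DISCOUNT if paid_plan else 0.0
    return round(model_multiplier * (1.0 - discount), 4)


def premium_requests_used(turns: int, model_multiplier: float, paid_plan: bool = True) -> float:
    """Premium requests consumed, assuming one request per turn (an assumption)."""
    return round(turns * effective_multiplier(model_multiplier, paid_plan), 4)


print(effective_multiplier(1.0))       # 0.9
print(premium_requests_used(50, 1.0))  # 45.0
```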

(GitHub Changelog - Copilot CLI Auto)


Tool VS Code Copilot Adds Bring Your Own Model Key

GitHub added BYOK support to VS Code Copilot for Business and Enterprise users on April 22, 2026.

Supported Sources:

  • Anthropic, Gemini, OpenAI, OpenRouter, Azure, and local models via Ollama and Foundry Local
  • Works anywhere in VS Code Chat, including the built-in plan agent and custom agents
  • Does not apply to code completions
  • Usage billed directly by the chosen provider; does not count against Copilot request quotas

(GitHub Changelog - VS Code BYOK)

💬 If you're on Copilot Business or Enterprise and you've been watching your quota sweat, this is the release you've been waiting for. Keep the UX, move the spend and privacy decisions to your own provider.


🤖 AI Ecosystem Updates

Ecosystem GitHub Tightens Copilot Plan Access

Across April 20 to 22, 2026, GitHub paused new signups for several Copilot plans and changed model access.

Changes:

  • New signups paused for Copilot Student, Pro, and Pro+ individual plans (Copilot Free remains open)
  • Self-serve Copilot Business signups paused for organizations on GitHub Free and Team plans (existing Business customers unaffected)
  • Opus models removed from Copilot Pro; Opus 4.7 remains on Pro+
  • Copilot Pro+ now has more than 5x the limits of Pro

(GitHub Changelog - Copilot Plan Changes)
