AI Dev Essentials #33: Gemini 3 Pro, Antigravity IDE, & GPT-5.1 Codex Max

John Lindquist
Instructor

Hey Everyone 👋,

John Lindquist here with the 33rd issue of AI Dev Essentials!

Google launched Gemini 3 Pro alongside their new Antigravity IDE, both representing significant advances in multimodal understanding and agentic development. OpenAI released GPT-5.1-Codex-Max with 24-hour autonomous task capabilities and ChatGPT for Teachers, while making group chats available in pilot regions. Claude models became available in Microsoft Foundry with structured outputs now in public beta. Cloudflare acquired Replicate to strengthen their AI infrastructure.

I've been spending the week diving deep into Anthropic's engineering guide on Code Execution with MCP and exploring their frontend design plugin to prepare for the upcoming workshop. I've also been putting Gemini 3 through its paces to determine if it earns a spot in my daily workflow.

🚨 Happening TOMORROW! 🚨 Last chance to sign up for the Claude Code Power User Workshop - European Timezone Edition 🌍: https://egghead.io/workshop/claude-code


🚀 Major Announcements

Google Launches Gemini 3 Pro with Record Benchmark Scores [Breaking]

Google DeepMind announced Gemini 3 Pro on November 18, 2025, achieving the highest benchmark scores for a publicly available model with 1501 on LMArena and introducing new capabilities in multimodal understanding and generative UI.

Model capabilities:

  • Record Performance: First model to exceed 1500 on LMArena benchmark
  • PhD-Level Reasoning: 37.5% on Humanity's Last Exam (without tools), 91.9% on GPQA Diamond
  • Best Multimodal Understanding: Leading performance across text, images, audio, and video
  • Generative UI: LLMs can generate complete user experiences including web pages, games, tools, and applications
  • Availability: Gemini app on desktop, mobile app, and mobile web with "Thinking" mode selection
  • Developer Access: Available in AI Studio and via Gemini API

Gemini 3 Deep Think variant:

  • Enhanced Reasoning: Outperforms Gemini 3 Pro on major benchmarks
  • Humanity's Last Exam: 41.0% (without tools)
  • GPQA Diamond: 93.8%
  • ARC-AGI: 45.1% with code execution
  • Availability: Coming to Google AI Ultra subscribers in the coming weeks

(Google Official Blog, Google Developers Blog, TechCrunch, CNBC)

This is really big. Gemini 3 has been absolutely crushing design tasks and one-shotting incredible demos. I've had it build a 3D playable piano, recreate Wordle and other games, and produce near-perfect recreations of Slack and other UI-heavy apps. I've had much worse luck with longer programming tasks, and I've heard it requires a lot of "hand-holding" and specific instructions, but maybe that's because it's still considered a preview? Either way, it's definitely a tool I'll be using a lot for prototyping designs. Google is quickly integrating Gemini 3 into every product, even the main Google "AI Mode," which I highly recommend checking out. Things are really heating up.

Google Unveils Antigravity: Agent-First IDE Powered by Gemini 3 [New IDE]

Google launched Antigravity on November 18, 2025, a new agentic development platform featuring agent-first architecture with direct access to editor, terminal, and browser for autonomous execution.

Platform features:

  • Multi-Model Support: Gemini 3, Anthropic Claude Sonnet 4.5, and OpenAI GPT OSS models
  • Two Main Modes: Editor view for hands-on coding with agent sidebar; Manager view for mission control and async orchestration
  • Direct Tool Access: Agents can autonomously use editor, terminal, and browser without approval friction
  • Free Public Preview: Available for Mac, Windows, and Linux with generous Gemini 3 Pro rate limits
  • Download: antigravity.google

Strategic positioning:

  • Competes directly with Cursor, Windsurf, Claude Code, and other agentic IDEs
  • Built on VS Code fork similar to competitors but with Google's AI infrastructure
  • Focus on async agent orchestration for background task management

(TechCrunch, VentureBeat, The New Stack)

Apparently they forked Windsurf (they forgot to remove some references to "Cascade") and are attempting a Cursor clone of sorts. I've tried it a handful of times, but I keep running into model limits and errors, so I don't really have a read on it. I've had much better luck using Gemini 3 in the Gemini CLI and Cursor. I'll check back in on Antigravity once they sort out the launch hiccups.

OpenAI Releases GPT-5.1-Codex-Max with 24-Hour Task Capabilities [Model Update]

OpenAI announced GPT-5.1-Codex-Max on November 19, 2025, introducing the first model natively trained to operate across multiple context windows using compaction technology for autonomous work spanning 24+ hours.

Technical capabilities:

  • Compaction Technology: Natively trained to operate coherently over millions of tokens in a single task
  • Multi-Day Tasks: Internally tested completing tasks lasting 24+ hours including multi-step refactors, test-driven iteration, and autonomous debugging
  • Performance: 77.9% on SWE-Bench Verified at extra-high reasoning effort
  • Token Efficiency: Uses 30% fewer thinking tokens than GPT-5.1-Codex at medium reasoning effort
  • Windows Support: First OpenAI model trained to operate in Windows environments
  • Availability: Available in Codex (CLI, IDE extension, cloud, code review); API access coming soon
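OpenAI hasn't published how compaction works internally, but the general idea of working coherently past a single context window can be sketched as: when the transcript exceeds a token budget, fold the oldest turns into one summary entry and keep going. Everything below (the 4-characters-per-token estimate, the `summarize` stub, the function names) is a hypothetical illustration of that idea, not OpenAI's implementation:

```python
# Toy sketch of context "compaction": when the running transcript
# exceeds a token budget, fold the oldest turns into a single summary
# entry so an agent can keep working beyond one context window.
# The summarizer here is a stub; a real system would call a model.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption).
    return max(1, len(text) // 4)

def summarize(turns: list[str]) -> str:
    # Stand-in for a model-generated summary of the folded turns.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Fold the oldest turns into a summary if history exceeds the budget.

    Always keeps the most recent `keep_recent` turns verbatim, so the
    result never shrinks below keep_recent + 1 entries.
    """
    while (sum(estimate_tokens(t) for t in history) > budget
           and len(history) > keep_recent + 1):
        cut = len(history) - keep_recent
        history = [summarize(history[:cut])] + history[cut:]
    return history
```

A real implementation would recurse (summaries of summaries) and summarize with the model itself, but the shape is the same: trade old detail for headroom.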

(OpenAI Blog, VentureBeat, The Decoder)

This has been out less than 24 hours, but it's apparently the best coding model available. I haven't had enough time to properly evaluate it, but I've heard nothing but good things.

GPT-5.1 Pro Spotted in the Wild [Model Update]

Reports are flooding in that OpenAI has begun rolling out GPT-5.1 Pro to Pro and Team users, with early access users calling it "easily the most capable and impressive model" to date.

Rollout details:

  • Availability: Spotted appearing for Pro and Team users (Nov 19)
  • Performance: Early reviews describe it as a "monster" with significant capability jumps
  • Capabilities: Matt Shumer's early review highlights it as the "most capable" model he's used
  • Status: Silent rollout, appearing in model selectors without a dedicated blog post yet

(TestingCatalog News, Matt Shumer Review)

GPT-5.1 Pro showed up in my account a few minutes ago. I packed up a bunch of files from my Script Kit project, dumped them into the model, and came away pretty impressed. I imagine I'll be using this a lot for planning. I'm curious if/when it will hit the API so we can script our own workflows.
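For context, "packing up files" here just means concatenating sources into one prompt with path headers so the model knows which file is which. A rough sketch of that kind of helper (the function name and format are my own, not a Script Kit API):

```python
# Bundle a set of source files into a single prompt string with
# path headers and code fences, ready to paste into a chat model.
# The format and function name are illustrative, not an official API.

def pack_sources(files: dict[str, str]) -> str:
    """files maps a relative path to that file's contents."""
    parts = []
    for path, body in files.items():
        parts.append(f"### {path}\n```\n{body}\n```")
    return "\n\n".join(parts)
```

In practice you'd read the contents off disk and maybe filter by glob, but the payoff is the same: one paste-able blob with enough structure for the model to reason across files.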

Claude Models Available in Microsoft Foundry with $30B Azure Commitment [Model Update]

Anthropic announced on November 18, 2025, that Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 are now available in public preview in Microsoft Foundry, alongside a $30 billion Azure cloud capacity commitment and $15 billion combined investment from NVIDIA and Microsoft.

Partnership details:

  • Models Available: Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 in public preview
  • Azure Commitment: Anthropic agreed to spend $30 billion on Microsoft Azure cloud capacity
  • Strategic Investment: NVIDIA ($10 billion) and Microsoft ($5 billion) investing $15 billion combined in Anthropic
  • Full Capabilities: All models support code execution, web search/fetch, citations, vision, tool use, and prompt caching
  • MACC Eligible: Models eligible for Microsoft Azure Consumption Commitment agreements
  • Integration: Claude models can be used with Claude Code in Foundry environment

(Anthropic Newsroom, Microsoft Azure Blog, NVIDIA Blog)

It's good to see Anthropic making moves. Claude Code and Sonnet 4.5 are still my go-to models for writing code and just kicking off random tasks without having to give it a ton of direction. I'm really hoping to see an Opus 4.5 release soon with Google and OpenAI making big moves.

OpenAI Launches ChatGPT for Teachers Free Through June 2027 [Education]

OpenAI announced ChatGPT for Teachers on November 19, 2025, providing free access to GPT-5.1 Auto with unlimited messages for verified U.S. K-12 educators through June 2027.

Program details:

  • Free Access: Unlimited messages with GPT-5.1 Auto, search, file uploads, connectors, and image generation
  • Education Compliance: FERPA-compliant with education-grade privacy and security
  • Student Data Protection: Student data will not be used to train models
  • Immediate Availability: Available to all educators immediately upon verification
  • District Partnerships: Active with 150,000+ teachers and staff across Fairfax County Public Schools, Houston ISD, Fulton County Schools, Prince William County Public Schools
  • Purpose: Give teachers hands-on experience with AI tools to establish best practices

(OpenAI Blog, CNBC, Axios, Engadget)

Smart play by OpenAI. Education is such an excellent use case for AI, especially when it comes to customizing lessons for specific classrooms and students. I just hope teachers realize they still need to review the output for accuracy (especially anything math-related). I'm curious if they'll have some sort of teacher onboarding experience to help them get started.


🛠️ Developer Tooling Updates

Cloudflare Acquires Replicate to Strengthen AI Infrastructure [Acquisition]

Cloudflare announced on November 17, 2025, an agreement to acquire Replicate, an AI model deployment platform hosting 50,000+ production-ready models, with completion expected in two months.

Acquisition details:

  • Model Catalog: Replicate hosts 50,000+ production-ready AI models
  • Timeline: Acquisition expected to complete in two months
  • Integration: Replicate's models will integrate with Cloudflare Workers AI
  • Brand: Replicate will continue as distinct brand with better speed and resources
  • Developer Impact: Promises more seamless AI cloud experience for developers

(Cloudflare Press Release, Business Wire, Replicate Blog)

Seems like a nice pairing, although I think Cloudflare burned a lot of the goodwill from this acquisition with the major outage this week...

Gemini CLI 0.16.0 Ships Major UI Overhaul [CLI]

Google released Gemini CLI v0.16.0 on November 15-18, 2025, featuring a complete UI rendering overhaul eliminating visual noise and adding mouse support.

UI improvements:

  • Eliminated Flickering: Visual noise and bouncing screens completely removed
  • Mouse Support: Click and navigate within input prompt
  • Fixed Cursor Position: No longer lost in long output streams
  • Scrollbar Improvements: Click-and-drag functionality added
  • Gemini 3 Pro Access: Support for paid Gemini API keys enabling Gemini 3 Pro in terminal
  • Continued Updates: v0.17.0-preview.0 released November 18 with additional polish

(GitHub Releases, Google Developers Blog, Google Developers Blog: Gemini 3 in CLI)

I've heard rumors that the Gemini CLI is making big moves. I'm really excited to see what they're up to. I just feel like all the other CLI tools are so far behind Claude Code when it comes to customization and scriptability.

Claude Structured Outputs Launch in Public Beta [Beta]

Anthropic released Structured Outputs in public beta on November 14, 2025, for Claude Sonnet 4.5 and Opus 4.1, guaranteeing JSON schema compliance and eliminating parsing errors.

Technical features:

  • Schema Compliance: Guarantees JSON schema compliance in API responses
  • Zero Parsing Errors: Eliminates parsing errors requiring retry logic and validation workarounds
  • Python & TypeScript: Supports both Pydantic and Zod schema definitions
  • Beta Header: Requires structured-outputs-2025-11-13 header
  • Model Support: Sonnet 4.5 and Opus 4.1 (Haiku 4.5 support coming soon)
  • Production Focus: Addresses production LLM reliability issues

(Claude Developer Platform Blog, Anthropic Release Notes)

We've all had a lot of hacks and workarounds in place due to the lack of structured outputs, and we can finally retire them. Woo!
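As an illustration of the kind of defensive parsing that guaranteed schema compliance retires, here's a minimal stdlib sketch. The "schema" and response are made up; for the actual API surface (Pydantic/Zod helpers, the structured-outputs-2025-11-13 header), see Anthropic's beta docs:

```python
# Before structured outputs: model replies needed defensive parsing
# plus retry logic around it. A toy sketch of that validation dance.
# The ticket schema here is hypothetical, purely for illustration.
import json

SCHEMA_KEYS = {"title": str, "priority": int}  # toy "schema"

def parse_ticket(raw: str) -> dict:
    """Parse and validate a JSON reply; raise ValueError on schema drift."""
    data = json.loads(raw)
    for key, typ in SCHEMA_KEYS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"schema violation on {key!r}")
    return data
```

With the beta enabled, the response is guaranteed to satisfy your declared schema, so the `ValueError` branch (and any retry loop wrapped around it) simply goes away.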

NotebookLM Adds Image Support for Research Sources

Google announced on November 13, 2025, that NotebookLM now supports images as source material, including photos, screenshots, infographics, handwritten notes, and whiteboards.

New capabilities:

  • Image Sources: Photos, screenshots, infographics, handwritten notes, whiteboards
  • Rollout: Available over subsequent weeks after announcement
  • iOS Support: v1.17.0 (released November 18) added image file support
  • Additional Updates: Part of larger expansion including Deep Research feature and Google Sheets/Word doc support

(Google Blog, TechCrunch, Digital Trends)

My biggest use case is screenshots from my phone. Such an awesome experience to be able to capture anything and drop it into a long-term memory bank.


🤖 AI Ecosystem Updates

OpenAI Pilots Group Chats in ChatGPT Across Four Regions

OpenAI launched a pilot for group chats in ChatGPT on November 13, 2025, enabling up to 20 participants to collaborate with ChatGPT in shared conversations.

Pilot details:

  • Pilot Regions: Japan, New Zealand, South Korea, and Taiwan
  • Participant Limit: Up to 20 participants per chat
  • Platform: Mobile and web
  • Availability: Free, Plus, Go, and Pro users
  • Model: Powered by GPT-5.1 Auto (routing system selecting best model automatically)
  • Features: Search, image generation, file uploads, dictation, emoji reactions, personalized images
  • Privacy: Group chats separate from private conversations; personal ChatGPT memory never shared

(OpenAI Blog, TechCrunch, Engadget, Axios)

Anthropic Partners with Rwanda and ALX for Chidi AI Learning Companion

Anthropic announced on November 18, 2025, a partnership with the Government of Rwanda and ALX Africa to deploy Chidi, a Socratic learning companion built on Claude, reaching 200,000+ students across Africa.

Partnership details:

  • Launch Date: November 4, 2025 (rollout start)
  • Reach: 200,000+ students through ALX (African tech training provider)
  • Government Partners: Rwanda's ICT & Innovation and Education ministries
  • Early Results: 1,100+ conversations, nearly 4,000 learning sessions; 9 out of 10 users reporting positive experiences
  • Phase 1: Teacher training for 2,000 teachers and civil servants in Rwanda
  • Cost Structure: Anthropic covers LLM/API costs; ALX provides training/implementation; Rwanda provides policy support and school access

(Anthropic Newsroom, APO Group, EdTech Innovation Hub)

Google Releases WeatherNext 2 with 8x Faster Forecasting

Google DeepMind introduced WeatherNext 2 on November 17, 2025, as the most advanced and efficient weather forecasting model using a new Functional Generative Network architecture.

Model capabilities:

  • 8x Faster: 1-hour resolution forecasting
  • New Architecture: Functional Generative Network (FGN) approach
  • Superior Performance: Surpasses previous model on 99.9% of variables and lead times (0-15 days)
  • Scenario Generation: Hundreds of forecast scenarios in under 1 minute using single TPU
  • Integration: Now in Google Search, Gemini, Pixel Weather, Google Maps Platform

(Google Blog, 9to5Google, Android Authority, TechRadar)


Sponsored by Membrane

Why coding agents struggle with integrations — and how Membrane fixes it

AI coding agents are getting frighteningly good—but there’s still one place they consistently fall down: building real product integrations. Not toy examples. Not one-off scripts. Actual customer-facing integrations that need correct OAuth flows, rate limits, pagination, webhooks, retries, schema drift handling, tests, observability—the whole messy universe of APIs across thousands of SaaS apps.

That’s where Membrane comes in. It’s the only AI built specifically for product integrations, with a purpose-made engine and a deep knowledge system about how APIs behave in the real world. Describe the integration you need, and Membrane generates a production-ready package—complete with correct API calls, field mappings, data structures, tests, and a runtime built for reliability. Teams that used to spend 5–6 weeks building an integration now ship in 5 minutes.

The best part: Membrane connects directly into your AI coding workflow. Whether you’re using Claude Code, Cursor, or any MCP-compatible agent, Membrane gives them the missing integration-infrastructure superpowers. Your agent can generate, adjust, test, and deploy integrations without leaving your IDE. You stay in your repo, your CI, your environment—just with far more leverage.

If you’ve ever asked a coding agent to “build an integration with X,” and watched it hallucinate endpoints or ignore pagination rules, this is the fix. Membrane gives agents the context they’ve always lacked: how APIs actually work, how integrations should run, and how to keep them healthy over time.

In short: coding agents write code. Membrane ships integrations.

Check it out here: getmembrane.com/agent/claude-code


⚡ Quick Updates

GPT-5.1 Models Available in Cursor

  • Three Models: GPT-5.1 for everyday tasks, GPT-5.1 Codex for ambitious coding, GPT-5.1 Codex Mini for cost-efficient changes
  • Availability: All models accessible in Cursor as of November 13
  • Integration: Full Cursor feature support including Composer and Agent

(Cursor)

Claude Code Frontend Design Plugin Released

  • Release Date: November 12, 2025
  • Purpose: Addresses "distributional convergence" (LLM tendency toward generic designs)
  • Four-Dimension Framework: Typography, Color & Theme, Motion, Backgrounds
  • Availability: Official Claude Code GitHub repository

(Claude Blog, GitHub)

Google Releases 54-Page AI Agent Production Guide

  • Release: November 2025
  • Authors: Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, Vladimir Vuskovic
  • Content: Five-level agent taxonomy, operational cycle framework, production standards
  • Access: Free download from Google Cloud

(Google Cloud)


✨ Workshop Spotlight

Claude Code Power User Workshop

European Timezone Edition 🌍 (🚨 TOMORROW! 🚨)

Date: Tomorrow, Friday November 21, 2025
Time: 2:00 PM - 7:00 PM (CET)
Platform: Zoom

What You'll Learn:

Learn the essential skills to ship reliable AI-generated code with confidence. This hands-on workshop covers everything from foundational prompting to advanced automation using the Claude Code SDK and custom integrations.

Core Skills:

  • Context Engineering: Control what context Claude sees for reliable, consistent results
  • TypeScript SDK: Script Claude programmatically to build custom workflows
  • Custom Hooks: Automate repetitive tasks with Claude Code hooks
  • Model Context Protocol: Integrate APIs securely to extend Claude's capabilities
  • Claude Code Skills: Build custom skills for Claude Code
  • Live Q&A: Get your specific questions answered by John Lindquist

👉 Register: https://egghead.io/workshop/claude-code


Read this far? Share "AI Dev Essentials" with a friend! - https://egghead.io/newsletters/ai-dev-essentials

John Lindquist

https://egghead.io
