AI Dev Essentials #37: Gemini 3 Flash & GPT-5.2-Codex Arrive

Hey Everyone 👋,

John Lindquist here with the 37th issue of AI Dev Essentials!

Google released Gemini 3 Flash globally, delivering Pro-level performance at Flash speed. OpenAI launched GPT-5.2-Codex, their most advanced agentic coding model with a 400K context window. Anthropic made Agent Skills an open standard with Microsoft, Cursor, and others already adopting it, plus launched a Skills Directory with partners like Vercel, Figma, Notion, and Stripe.

On the tooling front, NVIDIA dropped Nemotron 3 Nano with a 1M context window, Cline moved to Vercel AI Gateway cutting errors by 24%, and OpenCode launched their Desktop beta. Google also introduced CC, an experimental AI productivity agent in Gmail, and integrated Opal (their vibe-coding tool) directly into Gemini.

On a personal note, I spoke at Microsoft AI Dev Days last week and shipped a nice little agent scripting tool called mdflow. Let's start this off with a sweet little deal...

🎁 egghead.io Lifetime Sale 👀

Since I'm not teaching a workshop this month, I wanted to do something a little different and open up a sweet little bundle:

egghead Lifetime. Pay once, keep it for life.

Plus, during this sale you get two full workshop recordings:

Cursor Power User workshop recording ($375 value)
Claude Code Power User workshop recording ($375 value)

That's $750 in bonus training along with everything else on egghead. If you want to end the year strong and start 2026 with momentum, this is your next step.

👉 Get egghead Lifetime here

🚀 mdflow Launch

I launched mdflow this week. It's a scripting language for AI agents built on executable markdown. You write instructions and code blocks in a .md file, and mdflow runs them in sequence, passing context between steps. Think of it as literate programming meets agent orchestration. It's perfect for creating repeatable workflows that AI agents can follow, or for making your documentation actually runnable. The syntax is just markdown, so there's no new language to learn.

🚀 Major Announcements

`Breaking` Google Releases Gemini 3 Flash with Pro-Level Performance

Google released Gemini 3 Flash on December 17, 2025, making it the default model in the Gemini app globally. The model delivers "frontier intelligence built for speed" with Pro-level performance at Flash-level latency.

Benchmark Performance:

GPQA Diamond: 90.4% (PhD-level reasoning)
Humanity's Last Exam: 33.7% without tools (3x the previous Flash model's 11%)
MMMU-Pro: 81.2% (state-of-the-art, comparable to Gemini 3 Pro)
Video-MMMU: 87.6% (multimodal video understanding)
SWE-bench Verified: 78% for agentic coding (outperforms Gemini 3 Pro's 72.8%)

Technical Specs:

Context Window: 1 million tokens input, 64K output
Speed: 3x faster than Gemini 2.5 Pro, 218 output tokens/second
Token Efficiency: Uses 30% fewer tokens than 2.5 Pro on average
Thinking Levels: Four modes (minimal, low, medium, high) for controlling reasoning depth

Pricing:

Input: $0.50 per 1M tokens
Output: $3.00 per 1M tokens
Context Caching: 90% cost reduction for repeated tokens
Batch API: 50% cost savings for async processing

Availability: Gemini app (default), Google AI Studio, Vertex AI, Gemini API, Gemini CLI, Google Antigravity, and AI Mode in Google Search.

(Google Blog, Google Developers Blog, TechCrunch, SiliconANGLE)

💬 Flash 3 is my normal life "daily driver". It's the default model on Android for all of my dictated actions. It's my default model on gemini for replacing "searching". I'm still deeply in love with Opus 4.5 for development, but I'm experimenting with Opus 4.5 being a "manager" and multiple Flash 3s being the "workers". Nothing to share yet, but I'll keep you posted on how it turns out.

OpenAI Releases GPT-5.2-Codex for Agentic Coding

OpenAI launched GPT-5.2-Codex on December 18, 2025, describing it as "the most advanced agentic coding model yet for complex, real-world software engineering."

Technical Capabilities:

Context Window: 400,000 tokens with 128,000 max output tokens
Native Compaction: Optimized for long-horizon work through context compaction
Windows Support: First OpenAI model trained to operate in Windows environments
Enhanced Security: Strongest cybersecurity capabilities of any OpenAI model

Benchmark Performance:

SWE-Bench Pro: 56.4% accuracy (highest of all coding models)
Terminal-Bench 2.0: 64% accuracy

Real-World Validation: Security engineer Andrew MacPherson used GPT-5.1-Codex-Max to discover three React Server Components vulnerabilities (CVE-2025-55183, CVE-2025-55184, CVE-2025-67779).

Availability: All Codex surfaces for paid ChatGPT users. API access coming in the following weeks.

(OpenAI Blog, OpenAI System Card, SiliconANGLE)

💬 The Opus 4.5 challenger arrives! I've heard great things from friends about it, but since it just dropped yesterday and I'm deep in Claude Code work, I haven't had a chance to spin up Codex, etc and give it a fair shake. Initial anectodal reports that "it's really smart in different ways", so I think deciding when to use which model might have to be learned trait based on your specific requirements. I'll have to post more about mdflow and how it can bring together multiple models/agents into a single prompt...

`Breaking` Anthropic Makes Agent Skills an Open Standard

Anthropic announced on December 18, 2025 that Agent Skills is now an open standard, with a new Skills Directory launching at claude.com/connectors.

Open Standard Adoption:

Microsoft: Adopted within VS Code and GitHub
Coding Agents: Cursor, Goose, Amp, and OpenCode already supporting skills
Directory Partners: Atlassian (Jira, Confluence), Canva, Figma, Notion, Cloudflare, Zapier, Stripe, Vercel

Skills Directory Features:

Browse and enable skills at claude.com/connectors
One-click installation for verified partner skills
Skills work across Claude.ai, Claude Desktop, and Claude Code

(Anthropic - Skills Open Standard, Claude Blog - Skills Directory, Claude Blog - Introducing Skills)

💬 We all win here! I've grown to love skills over the past couple of months. It's a great way of lazy-loading a ton of relevant context for specific tasks and they're completely decoupled from the tools that need them. So it's awesome to see the AI ecosystem adopt them and I'm sure we'll see an explosions of shared skills in 2026.

`Tool` Claude Code Chrome Extension Enables Browser Testing

Anthropic released Claude Code integration with Chrome on December 18, 2025, enabling a complete build-test-verify workflow directly from the terminal.

Capabilities:

Test code directly in the browser to validate work
Read client-side errors via console logs
Access DOM state, network requests, and page content
Interact with authenticated web apps

Workflow: Build with Claude Code in your terminal, test in the browser with the Chrome extension, debug using console logs. Launch with /chrome command.

(Claude Code Docs - Chrome, Claude Blog - Chrome Pilot)

💬 What distinguishes this from other solutions is how you get to use "Real Chrome" and not an isolated playwright browser. I've had a lot of luck using the Claude Chrome extension for interacting with real web apps. I can't wait to hook it up to Claude Code for dev projects.

🛠️ Developer Tooling Updates

`Tool` xAI Launches Grok Voice Agent APIs

xAI released the Grok Voice Agent API on December 17, 2025, enabling developers to build voice agents with sub-second response times.

Technical Specs:

Latency: Average time-to-first-audio under 1 second (5x faster than closest competitor)
Benchmark: #1 on Big Bench Audio
Languages: 100+ with automatic detection
Pricing: $0.05 per minute flat rate

Available Voices: Sal, Rex, Eve, Leo, Ara, Mika, Valentin. Developers can prompt auditory cues like [whisper], [sigh], and [laugh].

(xAI Documentation, LiveKit Partnership)

`Tool` NVIDIA Releases Nemotron 3 Nano with 1M Context

NVIDIA released Nemotron 3 Nano on December 15, 2025, a hybrid reasoning model with 1M token context window.

Architecture:

Parameters: 3.2B active (31.6B total) with hybrid Mamba-Transformer MoE
Context: 1M tokens (68.2% accuracy at full context on RULER)
Throughput: Up to 3.3x higher than comparable open-source models

Open Source: Weights, datasets, training recipes, 3T pre-training tokens. Runs on 24GB RAM via GGUF.

(NVIDIA Research, Hugging Face Blog)

`Tool` Cline Moves to Vercel AI Gateway

Cline announced their migration to Vercel AI Gateway on December 16, 2025, reporting significant improvements.

Performance: Error rates down 24.3%, P99 streaming latency improved 10-14%, automatic failover during provider outages.

Scale: 1M+ developers and 4M+ installations.

(Vercel Blog, Cline Blog)

`Tool` OpenCode Desktop Launches in Beta

OpenCode released their Desktop application in beta on December 15, 2025.

Features: Native cross-platform app (Tauri + SolidJS), multi-project workspace, persistent sessions, embedded terminal, diff viewers. Free models included or connect any provider.

Install: brew install --cask opencode-desktop on macOS.

(OpenCode Official, GitHub)

`Tool` Claude Code Ships Terminal Rendering Overhaul

Anthropic announced on December 17, 2025 that Claude Code's terminal rendering was rewritten, reducing flickering by ~85%.

Improvements: Fixed diff view on resize, 3x better memory usage for large conversations, fixed rendering during token streaming.

December 16 Updates: Syntax highlighting for diffs, prompt suggestions, first-party plugins marketplace, shareable guest passes.

(ClaudeLog Changelog)

💬 We'll see if this sticks. I just read that they're rolling it back temporarily due to some edge cases in some environments.

🤖 AI Ecosystem Updates

`Tool` Google Labs Launches CC AI Productivity Agent

Google Labs released CC on December 16, 2025, an experimental AI agent delivering "Your Day Ahead" briefings via Gmail, Calendar, and Drive.

Capabilities: Morning briefings, email drafting, calendar link creation, web search context, cross-document connections.

Limitations: Cannot send emails or reschedule meetings autonomously.

Availability: Early access for Google AI Ultra and paid Gemini subscribers (US/Canada).

(Google Blog, 9to5Google)

💬 This is actually pretty awesome. I definitely recommend giving it a shot to see a clean summary of the emails of your day that you might not otherwise read.

`Tool` Gemini Deep Research Adds Visual Reports

Google enhanced Deep Research on December 16, 2025 with AI-generated images, charts, and interactive simulations.

Capabilities: Custom visuals from research, interactive variable adjustment, reports in Canvas. Exclusive to AI Ultra ($250/month).

(Google Blog, 9to5Google)

💬 Some of the graphs and charts they produce are super clean. I hope they expose ways to have full control over the visualizations and allow users to customize them further.

`Tool` NotebookLM Integration Rolling Out in Gemini

Google began rolling out NotebookLM integration in Gemini on December 13-17, 2025.

How It Works: Attach notebooks as context via the plus menu. Gemini responds based on curated sources rather than web search.

(9to5Google, Android Central)

💬 This is the integration I've been waiting for! Being able to use NotebookLM as context in Gemini means I can leverage curated research directly.

`Ecosystem` OpenAI Launches ChatGPT App Store

OpenAI launched the ChatGPT App Store on December 18, 2025, opening to 800+ million users.

Partners: Booking.com, Spotify, Dropbox, DoorDash, Zillow. Built on MCP. Submit apps through OpenAI Developer Platform.

(TechCrunch, Engadget)

💬 This is a big opportunity for devs who want to jump in and try to claim one of those "first to the app store" spots.

`Research` Apple Releases SHARP for Instant 3D Photo Conversion

Apple published SHARP on December 11, 2025, converting single 2D photos to 3D in under one second.

Technical: Single feedforward pass, 3D Gaussian at 100+ fps, 25-34% LPIPS improvement, three orders of magnitude faster than previous methods. Open source under Apache 2.0.

(arXiv, Apple ML Research, GitHub)

⚡ Quick Updates

Gemini 2.5 Flash Native Audio Upgrade

Release: December 12, 2025
Function Calling: ComplexFuncBench Audio score of 71.5%
Instruction Following: 90% adherence rate (up from 84%)
Live Translation: Streaming speech-to-speech in 70+ languages

(Google Blog)

Standard JSON Schema Specification

Created by: Zod, Valibot, and ArkType creators
Purpose: Unified JSON Schema with preserved type inference
AI SDK 6 Beta: Native jsonSchema helper function

(Standard JSON Schema)

Google Agent Development Kit for TypeScript

Release: December 17, 2025
Version: ADK TypeScript v0.2.0
Features: Code-first multi-agent systems, type-safe development

(Google Developers Blog)

React Grab Agent Integration

Feature: Click any React element, copy exact source location
Agents: Works with Claude Code, Cursor, OpenCode, Codex, Gemini
Install: npx @react-grab/cli@latest

(GitHub)

📢 Workshop Update

Due to the raging success of this past year's AI Workshops, I'm moving them to a monthly cadence under dev.build to help with bookkeeping. There are no workshops left this year, but we're planning 1 per month next year with the first tentatively planned for Jan 30. We'll announce more soon.

Happy Holidays 🎅!

This is the last newsletter of the year (I really hope no one is releases anything interesting next week...). I'll be on vacation with my family and loved ones next week and I sincerely hope you and yours have a wonderful holiday season! See you in 2026!

Read this far? Share "AI Dev Essentials" with a friend! — egghead.io/newsletters/ai-dev-essentials

— John Lindquist

egghead.io

Did you enjoy this issue? Share it with a friend.

🎁 egghead.io Lifetime Sale 👀

🚀 mdflow Launch

🚀 Major Announcements

Breaking Google Releases Gemini 3 Flash with Pro-Level Performance

OpenAI Releases GPT-5.2-Codex for Agentic Coding

Breaking Anthropic Makes Agent Skills an Open Standard

Tool Claude Code Chrome Extension Enables Browser Testing

🛠️ Developer Tooling Updates

Tool xAI Launches Grok Voice Agent APIs

Tool NVIDIA Releases Nemotron 3 Nano with 1M Context

Tool Cline Moves to Vercel AI Gateway

Tool OpenCode Desktop Launches in Beta

Tool Claude Code Ships Terminal Rendering Overhaul