AI Dev Essentials #11: Gemini 2.5 Stable, MCP goes mainstream

John Lindquist
Instructor

Hey Everyone πŸ‘‹,

John Lindquist here with the eleventh issue of AI Dev Essentials! I've been digging more and more into using Claude Code to spin up projects. Here are a couple of fun ones I've built using only Claude Code:

  • dotagent (https://www.npmjs.com/package/dotagent) - A tool for syncing rules/instructions across Cursor/Claude/Windsurf/etc.
  • chromancer (https://www.npmjs.com/package/chromancer) - A CLI for interacting with Chrome. It wraps around Claude Code to generate and save automated browser workflows.
  • Script Kit (https://www.scriptkit.com/) - I added MCP support so Cursor/etc can invoke your scripts.

There's obviously no way I could've done this much work in the past week without using AI agents (mostly Claude Code).

I'm still determining my guidelines for Cursor vs. Claude Code. Cursor recently introduced a new Ultra plan at $200/month with 20x the usage of Pro, which should let you run o3 (an awesome model) full time. My quick, too-early take:

  • Cursor gives you more control and customizations. It's perfect for "focused" work where you want to be an active participant in the process.
  • Claude Code is great for "automated" work where you just want to leave the desk and come back to review a feature.

As always, the model matters more than the tool. Since Cursor can also use Claude Opus/Sonnet and has background agents + Bugbot, it's simply a more fully-featured offering that can only continue to improve. I love CLIs (I mean I really, really love CLIs), but there always comes a point where you need to open a GUI. Anyway, I'll keep using both for now.

I was also just accepted into the OpenAI Open Source fund which granted me $5000 in credits to use the Codex CLI (OpenAI's version of Claude Code), so I should have strong opinions on that very soon. I love OpenAI's Codex through the web interface where it can spin up your project and submit PRs, so I have high hopes for the CLI.

I've honestly had more fun programming over the past couple of months than I've had during my entire 20+ year career!

πŸŽ“ New egghead.io Lessons This Week

The Building Local AI Agents with Ollama and the AI SDK course is now live! I've released the first 5 lessons covering:

  • Setting up Ollama for local AI development
  • Building your first agent with the AI SDK
  • Implementing tool calling and function execution
  • Managing agent state and conversation flow
  • Deploying local agents with persistent memory

Check out the course at https://egghead.io/courses/scripting-local-language-models-with-ollama-and-the-vercel-ai-sdk~gmi9k
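
If you want a feel for where the course is headed, here's a minimal sketch of tool calling with the AI SDK against a local Ollama model. It assumes the community `ollama-ai-provider` package and a stubbed `getWeather` tool, so treat the specifics as placeholders rather than actual course code:

```ts
import { generateText, tool } from "ai";
import { ollama } from "ollama-ai-provider"; // community provider (assumption)
import { z } from "zod";

const { text } = await generateText({
  // any locally pulled model works, e.g. after `ollama pull llama3.1`
  model: ollama("llama3.1"),
  prompt: "What's the weather in Portland right now?",
  tools: {
    getWeather: tool({
      description: "Get the current weather for a city",
      parameters: z.object({ city: z.string() }),
      // stubbed out: a real agent would call a weather API here
      execute: async ({ city }) => ({ city, tempF: 72, conditions: "cloudy" }),
    }),
  },
  // allow one tool call plus a follow-up generation
  maxSteps: 2,
});

console.log(text);
```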


πŸ€– Model & Platform Updates

Gemini 2.5 Family Goes Stable

Google has restructured their Gemini 2.5 lineup with significant improvements:

Gemini 2.5 Pro (Stable, no changes from 06-05)

  • Now the stable long-term support model
  • Maintains existing pricing structure
  • "gemini-2.5-pro" model name for production use

Gemini 2.5 Flash (Stable, updated pricing)

  • Simplified pricing based on developer feedback
  • Performance improvements from 05-20 variant
  • Now the go-to model for cost-conscious applications

NEW: Gemini 2.5 Flash-Lite (Preview)

  • Small reasoning model at just $0.10/1M input, $0.40/1M output
  • Pushes the cost/intelligence frontier
  • Ideal for high-volume applications requiring reasoning capabilities

(Google Blog, Google Cloud Blog)

The introduction of Flash-Lite is particularly exciting. We're seeing Google compete aggressively on the cost front while maintaining reasoning capabilities. This could be a game-changer for applications that need to process massive amounts of data with some level of intelligence. Honestly, I'm kinda glad they've shipped 2.5 so they can focus on whatever 3.0 is going to be πŸ˜…
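
For reference, calling Flash-Lite from the `@google/genai` SDK looks roughly like this. The preview model ID below is my best guess at the time of writing, so check the Gemini API model list for the current one:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  // preview model ID may have changed; see the Gemini API docs
  model: "gemini-2.5-flash-lite-preview-06-17",
  contents: "Classify this support ticket as billing, bug, or feature request: ...",
});

console.log(response.text);
```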

OpenAI's Open Source Model Progress Update

At Y Combinator's AI startup school, Sam Altman provided an update on OpenAI's open-source model initiative, first announced in March 2025. The model, which will be OpenAI's first open-weight release since GPT-2 in 2019, is being developed with reasoning capabilities similar to o3-mini and is aimed for release in early summer.

Key details:

  • Text-in, text-out model designed for high-end consumer hardware
  • Highly permissive license with few usage restrictions
  • Toggle-able reasoning capabilities
  • Led by VP of Research Aidan Clark

(TechCrunch, Reuters, VentureBeat)

OpenAI has been gathering community feedback since March, and Altman himself admitted they've been "on the wrong side of history" regarding open source. This shift is clearly influenced by the success of Meta's Llama models (over 1 billion downloads) and pressure from competitors like DeepSeek. I'm looking forward to seeing how this balances OpenAI's commercial interests with community needs.

Revolutionary AI Performance in Medical Research

A combination of GPT-4.1, o3-mini-high, and Gemini 2.0 Flash achieved superhuman performance in systematic medical reviews (medRxiv):

  • Completed 12 work-years of Cochrane reviews in just 2 days
  • 96.7% sensitivity vs 81.7% for human reviewers
  • Found 54 additional eligible studies missed by original authors
  • Generated new statistically significant findings in 2 reviews

The system correctly identified all originally included studies while discovering critical findings humans missed, like preoperative immune-enhancing supplementation reducing hospital stays by one day.

This is the kind of AI application that genuinely excites me. It's not replacing researchers but augmenting them to find insights that would otherwise be missed. The fact that blinded reviewers sided with the AI 69.3% of the time when there were disagreements speaks volumes.


πŸ”§ Developer Tooling & MCP Updates

MCP Goes Mainstream

The Model Context Protocol is having a moment:

Cloudflare Open Sources use-mcp

  • React library connecting to any MCP server in 3 lines of code
  • Supports SSE and Streamable HTTP remote servers
  • Handles OAuth flow automatically
  • Part of the official MCP organization for reference implementation

(Cloudflare Blog, GitHub)
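
Here's roughly what those "3 lines" look like in practice, sketched from the announcement. The server URL is a placeholder and field names may differ slightly from the current README:

```tsx
import { useMcp } from "use-mcp/react";

function McpToolList() {
  // connects to a remote MCP server and handles the OAuth flow for you
  const { state, tools, callTool } = useMcp({
    url: "https://your-mcp-server.example.com/sse", // placeholder URL
  });

  if (state !== "ready") return <p>Connecting… ({state})</p>;

  return (
    <ul>
      {tools.map((t) => (
        <li key={t.name}>
          <button onClick={() => callTool(t.name, {})}>{t.name}</button>
        </li>
      ))}
    </ul>
  );
}
```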

OpenAI Adds MCP Support to ChatGPT

  • Limited initially to search and fetch tools for deep research
  • Remote MCP with OAuth support
  • Documentation available at platform.openai.com/docs/mcp

Google Cloud Run MCP Tutorial

  • Deploy remote MCP servers in under 10 minutes
  • Comprehensive guide for production deployments

(Google Cloud Blog)

Block's MCP Best Practices

  • Lessons from building 60+ MCP servers
  • Design patterns and architecture recommendations

(Block Engineering Blog)

MCP is becoming the standard for AI tool integration. The fact that major players are all adopting it suggests we're moving toward a more interoperable AI ecosystem. I'll definitely have more educational materials around MCPs soon!

OpenAI's Prompt Management Revolution

OpenAI introduced Prompts as an API primitive:

  • Centrally manage, version, and optimize prompts
  • Use across Playground, API, Evals, and Stored Completions
  • Preconfigured with tools, models, and messages
  • New "Optimize" button in Playground for API optimization
  • Replaces the old presets system

(OpenAI Community, Documentation)

Finally! Prompt management has been a pain point for production AI applications. This brings prompts to the same level as other API resources, making it easier to maintain consistency across teams and applications.
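
As a rough sketch, referencing a dashboard-managed prompt from the Responses API looks something like this. The prompt ID and variables are placeholders for whatever you configure in the Playground:

```ts
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  prompt: {
    id: "pmpt_YOUR_PROMPT_ID", // placeholder: created and versioned in the dashboard
    variables: { customer_name: "Ada" }, // fills template slots defined on the prompt
  },
});

console.log(response.output_text);
```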

Developer Resources

models.dev - Open Source AI Model Database

  • Comprehensive pricing and limits information
  • Powers opencode under the hood
  • Community-contributed with API access
  • Star and contribute at models.dev

(Official Site, GitHub)
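
If you want to pull the data programmatically, my understanding is that the whole database is exposed as a single JSON blob. The endpoint path is my assumption from the project docs, so double-check it before relying on it:

```ts
// fetch the full model database and inspect its top-level keys
const res = await fetch("https://models.dev/api.json");
const db = await res.json();

// the rough shape is provider -> model -> metadata (pricing, limits, etc.);
// log and explore rather than trusting my field-name guesses
console.log(Object.keys(db));
```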

Container-use Now Works with OpenAI Codex CLI

  • Solomon Hykes confirmed integration
  • Enables containerized development workflows

(GitHub)

Giving AI Agents enough control to rm -rf is why the community is so bullish on containers for everything!


πŸ’» Cursor Corner

Cursor Ultra Plan Launches at $200/month

Cursor introduced a new Ultra tier with 20x more usage than Pro:

  • Addresses heavy user demands for unlimited-style usage
  • Positioned between Pro and enterprise offerings
  • Reflects the growing appetite for AI-assisted development at scale

(Cursor Blog, TechCrunch)

This is exactly what I've been hoping for! The mental overhead of tracking usage on Pro was real. While $200/month isn't cheap, for professional developers who live in Cursor, it's a no-brainer. I'm curious to see if this pressures other tools to offer similar "all you can eat" plans. There's a bit of buzz around how ambiguous the messaging/pricing is, and it's obviously frustrating for anyone trying to budget out AI costs.

πŸš€ Quick Updates

  • OpenAI Codex Best-of-N: Generate multiple solutions simultaneously, rolling out to Pro, Enterprise, Team, Edu, and Plus users (Changelog)
  • Claude Code "ultrathink" Trigger: Use thinking triggers for better results, with "ultrathink" being the most powerful (Anthropic Docs)
  • Google Labs Flow Updates: Complete tutorial thread on ingredients, frames, camera controls, and scene builder for Veo 3 (Google Blog, FAQ)
  • ElevenLabs v3: Revolutionary emotion control with audio tags like [excited], [whispers], [sighs], [laughs] (ElevenLabs v3, Blog)

πŸ”’ Security & Best Practices

The Lethal Trifecta Warning

Simon Willison echoed many of my thoughts around MCP + security. Definitely worth a read (Blog Post):

The Lethal Trifecta = Private Data + Untrusted Content + External Communication

Any AI system with all three components is vulnerable to data exfiltration attacks. This is especially critical for MCP users who might inadvertently combine:

  • File system access (private data)
  • Web browsing (untrusted content)
  • API calls or messaging (external communication)

Mitigation Strategies:

  • Never combine all three capabilities in one agent
  • Implement strict sandboxing between components
  • Use human-in-the-loop verification for sensitive operations
  • Audit your MCP tool combinations carefully

This is the kind of security thinking we need more of. As we give AI agents more capabilities, we need to think adversarially about how they could be exploited. The MCP ecosystem makes it trivially easy to create vulnerable combinations.
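
One lightweight way to enforce this in your own agent setup is a guard that refuses to start when a toolset spans all three categories. This is purely a hypothetical sketch (the tool names and category mapping are mine, not from Simon's post):

```ts
type Capability = "private-data" | "untrusted-content" | "external-comms";

// hypothetical mapping of your MCP tools to trifecta categories
const toolCapabilities: Record<string, Capability> = {
  read_file: "private-data",
  fetch_url: "untrusted-content",
  send_email: "external-comms",
};

function assertNotLethalTrifecta(toolNames: string[]): void {
  const caps = new Set(
    toolNames.map((name) => toolCapabilities[name]).filter(Boolean)
  );
  if (caps.size === 3) {
    throw new Error(
      "Refusing to start agent: tools combine private data, untrusted content, and external communication."
    );
  }
}

assertNotLethalTrifecta(["read_file", "fetch_url"]); // fine
assertNotLethalTrifecta(["read_file", "fetch_url", "send_email"]); // throws
```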


⚑ AI Ecosystem Observations

Google Cloud IAM Outage Cascade

A major lesson in infrastructure dependencies:

  • Google Cloud IAM failure cascaded to Cloudflare, Anthropic, Spotify, Discord, and Replit
  • Highlighted single points of failure in the AI infrastructure stack
  • Key takeaway: Implement graceful fallbacks and avoid single-provider dependencies

(Full Analysis)

Did you feel this? I sure did. That was a strange couple of hours where I felt lost without my AI agents... I'm just glad it was a simple failure and not a more serious issue.
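
A minimal version of "graceful fallback" with the AI SDK might look like this: try your primary provider, and if it errors (say, during an upstream outage), retry the same prompt against a second one. The model IDs here are just examples; swap in whatever you actually run:

```ts
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";

async function generateWithFallback(prompt: string) {
  try {
    // primary provider
    return await generateText({ model: anthropic("claude-sonnet-4-20250514"), prompt });
  } catch (err) {
    // fall back rather than failing the whole request
    console.warn("Primary provider failed, falling back:", err);
    return await generateText({ model: openai("gpt-4.1-mini"), prompt });
  }
}
```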

AI Agents Playing Diplomacy

Fascinating behavioral patterns emerged:

  • Claude: Couldn't lie, was exploited by other players
  • Gemini 2.5 Pro: Nearly conquered Europe with tactical brilliance
  • o3: Orchestrated secret coalitions, backstabbed allies, won

These experiments reveal fundamental differences in how models approach strategy and deception. When choosing models for your applications, consider not just capabilities but behavioral tendencies. I'm sure we'll see more about this in the future.

AGI as Product Experience

Logan Kilpatrick's perspective on AGI resonated strongly this week:

  • AGI won't be a single model breakthrough
  • Product experience integrating memory, reasoning, and context matters more
  • Small model improvements + great design = "AGI moment" for users
  • The narrative is shifting from model-centric to product-centric

(Source)


🚨 URGENT: Cursor Workshop Early Bird Ends TOMORROW! 🚨

Turn Failures into Fuel with Cursor - Save $49 NOW!

⏰ LAST CHANCE: Early bird pricing expires in less than 24 hours!

I'm genuinely excited to share this workshop with you. After months of pushing the boundaries with AI coding tools, I've distilled everything into a 6-hour intensive that will transform how you work with Cursor (and any AI coding assistant).

Why This Workshop is Different:

  • Real workflows, not tutorials: Learn the exact processes I use daily to build production apps
  • Failure-focused approach: Turn every AI mishap into a learning opportunity
  • Tool-agnostic principles: Master concepts that work across Cursor, Windsurf, Copilot, and beyond
  • Live debugging sessions: Watch me handle real errors and stuck agents in real-time

What You'll Master:

  βœ… Project generation that actually works (with proper structure and best practices)
  βœ… Context curation for massive codebases without token overload
  βœ… Agent behavior patterns - know when they're stuck BEFORE wasting credits
  βœ… Composer pipelines that automate your entire workflow
  βœ… Git + AI strategies for safe experimentation
  βœ… Multi-file refactoring without the chaos

Proven Results:

"Take John's cursor workshop it is πŸ”₯ HIGHLY Recommend" β€”David Wells

"I used the skills you taught me to effectively one-shot OAuth issuer support in the Epic Stack. So cool!" β€”Kent C. Dodds

"John is the cursor god" β€”Sunil Pai

Workshop Details:

  • When: Friday, June 27, 2025, 9:00 AM - 2:00 PM (PDT)
  • Format: Live Zoom workshop with Q&A throughout
  • Includes: Recording access, workshop materials, and my personal Cursor rules

πŸ’° Pricing:

  • ~~Regular Price: $249~~
  • 🎯 EARLY BIRD: $200 (Save $49!)
  • ⚠️ Price increases TOMORROW at midnight!

Perfect for:

  • Developers frustrated with AI tools that "almost" work
  • Teams wanting to standardize their AI workflows
  • Anyone who's felt the pain of agents modifying the wrong files
  • Developers ready to 10x their productivity with AI

➑️ See Full Workshop Details

➑️ SECURE YOUR EARLY BIRD SPOT NOW β†’

P.S. - Team training packages available. Get your whole team up to speed with group rates!

P.P.S. - Seriously, don't wait. The $49 savings ends tomorrow and I'd hate for you to miss out. This workshop will pay for itself in saved time within the first week.


That's all for this week! Feel free to reply directly to this email with any questions or feedback.

Keep on shipping! πŸš€

John Lindquist

egghead.io

Share with a coworker