Hey Everyone đź‘‹,
John Lindquist here with the 25th issue of AI Dev Essentials!
As always, it's been another packed news week in the AI dev space. The new Meta model is intriguing because of how they trained it. The Google Chrome MCP is the most powerful MCP release to date. OpenAI keeps expanding infrastructure, investing ludicrous amounts of money. And research shows that 90% of developers are using some form of AI to write code, which is a pretty dang high number.
Personally, I've been experimenting with the Chrome DevTools MCP, and I'm looking forward to adding it to my Claude Code workshop and replacing a lot of my Playwright MCP usage. I've also been diving deeper into the Claude Code SDK, which had a minor release that added a bunch of types that had been missing, making it much easier to work with. If you're interested in my next workshop, you can sign up at the link below.
Sign up for the next workshop here: https://egghead.io/workshop/claude-code
🎓 New egghead.io Lessons This Week
Claude Code Essentials Course Updates - View Course
New lessons on Claude Code hooks:
- Type-Safe Hooks with Bun - Parse JSON payloads with TypeScript types
- Prompt Rewriting - Create custom shortcuts like `:plan` to transform prompts
- Live Data Injection - Build `load()` functions that fetch API data on demand
- Template-Driven Solutions - Generate N variations with `v(5)` commands
- Hook Guardrails - Block dangerous prompts with exit codes and stderr feedback
These lessons show how to extend Claude Code with custom automation, from simple prompt shortcuts to sophisticated guardrails. Perfect for teams looking to standardize their AI workflows.
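If you're curious what the guardrail pattern looks like in practice, here's a minimal sketch of a Bun hook along the lines of those lessons: it parses the JSON payload from stdin with a TypeScript type and blocks prompts that match a deny-list by exiting with code 2 and writing feedback to stderr. The payload shape and deny patterns are simplified assumptions for illustration, not the exact code from the course.

```typescript
#!/usr/bin/env bun
// Minimal guardrail hook sketch for a UserPromptSubmit-style hook.
// Assumes the hook receives a JSON payload on stdin with a `prompt` field.

type HookPayload = {
  session_id?: string;
  prompt?: string;
};

// Hypothetical deny-list of patterns we never want forwarded to the model.
const DENY_PATTERNS = [/rm\s+-rf/i, /DROP\s+TABLE/i];

const raw = await Bun.stdin.text(); // read the full JSON payload
const payload = JSON.parse(raw) as HookPayload;
const prompt = payload.prompt ?? "";

const hit = DENY_PATTERNS.find((re) => re.test(prompt));
if (hit) {
  // Exit code 2 blocks the prompt; stderr is surfaced back as feedback.
  console.error(`Blocked by guardrail hook: matched ${hit}`);
  process.exit(2);
}

// Exit 0 lets the prompt through unchanged.
process.exit(0);
```

Register a script like this as a prompt-submission hook in your settings and anything matching the deny-list never reaches the model.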
🚀 Major Announcements
Meta Unveils Code World Model for Advanced Code Generation
Meta FAIR released Code World Model (CWM) on September 24, 2025, a 32-billion-parameter research model that introduces "world modeling" to understand how code execution affects program state, not just syntax.
Key capabilities:
- World Modeling Innovation: Trained on Python execution traces and Docker environment interactions, recording how each line affects local variable states
- Impressive Benchmarks: 68.6% on LiveCodeBench v5, 76% on AIME 2024, 65.8% on SWE-bench Verified, 96.6% on Math-500
- Three Model Variants: Pretrained (`cwm-pretrain`), supervised fine-tuned (`cwm-sft`), and instruction-tuned (`cwm`) checkpoints
- Neural Debugger: Functions like a neural `pdb` that can be set to any initial frame state for reasoning queries
- Hardware Requirements: Needs 160GB combined GPU VRAM for inference
- Research License: Released under FAIR Non-Commercial Research License for academic use
(Meta AI Research, GitHub Repository, MarkTechPost)
I actually really love this idea because I often feed build output, logs, and traces back into the AI to help it understand the code it wrote. Baking that approach into training seems like it could save me that step and make the model smarter about what the final output will actually be. The fact that this is only version one, and it's already this performant at such a small size, is extremely impressive. It's definitely not something to rely on right away, since it doesn't match the latest state-of-the-art models, but we'll see if it catches up.
Claude Now Available in Microsoft 365 Copilot
Microsoft announced on September 24, 2025, that Claude Sonnet 4 and Claude Opus 4.1 are now available in Microsoft 365 Copilot, expanding AI model choices for enterprise users beyond OpenAI.
Integration details:
- Model Options: Claude Opus 4.1 powers Researcher agent for complex reasoning, Claude Sonnet 4 available in Copilot Studio
- Enterprise Focus: Building agents with deep reasoning and workflow automation capabilities
- Rollout Strategy: Available through Microsoft's Frontier Program to opted-in Copilot-licensed customers
- Data Considerations: Claude models hosted outside Microsoft environments, subject to Anthropic's Terms of Service
- Strategic Shift: Microsoft embracing multi-model orchestration rather than exclusive OpenAI reliance
(Microsoft 365 Blog, Anthropic Official, TechCrunch, CNBC)
I'm not going to pretend to understand the inner workings of the business contracts between all these powerhouses, but last I checked, Microsoft and OpenAI were pretty tight bedfellows. So this announcement of Anthropic stepping in as an option really catches me off guard. Honestly, I prefer Anthropic's models for things like writing documents and authoring code, and being flexible enough to support all the different providers is probably the best play for Microsoft in the long run. But at a certain point, wouldn't you expect Microsoft to just start providing its own models?
OpenAI Expands Stargate With Five New AI Data Centers
OpenAI, Oracle, and SoftBank announced five new Stargate AI data center sites on September 23, 2025, accelerating their $500 billion infrastructure commitment ahead of schedule.
Expansion details:
- New Locations: Shackelford County TX, Doña Ana County NM, Lordstown OH, Milam County TX, plus one Midwest site
- Investment Scale: $400+ billion over three years, approaching full $500 billion commitment
- Capacity Growth: Nearly 7 gigawatts planned capacity, progressing toward 10-gigawatt goal
- Partnership Structure: Oracle leading three sites, SoftBank managing two through SB Energy
- Compute Offerings: New compute-intensive features for ChatGPT Pro subscribers with additional fees
- Timeline: Ahead of schedule to meet full commitment by end of 2025
(OpenAI Official, TechCrunch, Bloomberg, CNBC)
Scale, scale, scale. It feels like all the major providers are having infrastructure measuring contests at this point. I know they all think that more equals smarter. Given the global impact of these investments in infrastructure and energy costs, the entire world better hope that's true.
🛠️ Developer Tooling Updates
Cursor Ships Custom Commands and GPT-5-Codex Support
Cursor version 1.6 introduced custom slash commands on September 12, 2025, allowing developers to create reusable prompts in `.cursor/commands/[command].md` files, plus integration with OpenAI's GPT-5-Codex model.
New features:
- Custom Commands: Define reusable prompts in the `.cursor/commands/` directory, accessible via `/` in Agent input
- GPT-5-Codex Integration: Access to OpenAI's latest coding-optimized model (announced September 23)
- Command Examples: Light code reviews, test runners with auto-fix, PR comment resolution
- Website Evolution: Positioning as "the new way to build software" beyond just an IDE
- Developer Experience: Commands work with CLI integration for automated workflows
(Cursor Changelog, Cursor Documentation, Cursor Blog)
When I first used Cursor, I would create a `/workflows` directory and then @-include a workflow file in the agent chat to essentially trigger that workflow. If you've been to any of my Cursor workshops, you saw how I set that up. Commands are the evolution of that: a workflow prompt you can keep in your project, share with the team, and trigger to take action. While I love that commands are coming to Cursor (they're already available in tools like Claude Code), my favorite part is that everyone is standardizing on the slash as the way to reference prompts that should take action. It's a great developer experience, and I'm glad everyone is on board with that now.
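To make that concrete, here's roughly what a command file could look like. The filename and prompt content below are hypothetical examples of my own, not something from the Cursor docs; the pattern is just a markdown file under `.cursor/commands/` whose name becomes the slash command.

```markdown
<!-- .cursor/commands/light-review.md (hypothetical example) -->

Do a light review of the current changes.

- Flag obvious bugs, unused imports, and missing error handling
- Keep feedback short: one bullet per issue, with file and line
- Only suggest a rewrite when the fix is a one-liner
```

With that file in the repo, typing `/light-review` in the Agent input runs the prompt, and the whole team shares the same workflow.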
Chrome DevTools MCP Brings Browser Automation to AI Agents
Google launched Chrome DevTools MCP on September 23, 2025, enabling AI coding agents to control Chrome's debugging and automation capabilities through the Model Context Protocol.
MCP capabilities:
- Automation Control: Programmatic handling of clicks, form fills, dialogs, and navigation
- Performance Analysis: Record traces and extract optimization insights from web apps
- Advanced Debugging: Analyze network requests, console messages, screenshots, script evaluation
- Browser Emulation: Test with CPU slowdown, network throttling, various screen sizes
- Modern App Support: Specifically designed for complex single-page applications
- AI Integration: Works with Claude, Cursor, and any MCP-compatible tools
(Chrome Developers Blog, GitHub Repository)
Chrome exposing an MCP server is essentially AI web debugging. I know this is only a public preview, but as a sign of things to come, allowing any of the AI agents to check in with Chrome to see what's happening in browser windows is going to allow agents to do a lot more work and check their work—everything from debugging console logs and catching errors to inspecting layouts or traces to find performance issues. This is a huge first step towards allowing agents to better automate web development and I'm all here for it.
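If you want to try it with Claude Code or Cursor, the setup looks like any other MCP server. The snippet below is a sketch using the standard `mcpServers` config format and the `chrome-devtools-mcp` npm package named in the repo; check the repository README for the exact, current instructions.

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```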
VS Code Introduces Auto Mode for Intelligent Model Selection
Microsoft released Auto Mode for VS Code on September 24, 2025, automatically selecting the optimal AI model based on task complexity and context.
Auto Mode features:
- Smart Model Routing: Automatically chooses between different AI models based on query complexity
- Context-Aware Selection: Considers file type, project context, and task requirements
- Cost Optimization: Uses lighter models for simple tasks, powerful models for complex ones
- Seamless Experience: No manual model switching required during development
- Performance Balance: Optimizes for both speed and accuracy based on needs
(VS Code Blog, Microsoft Developer Blog)
I know VS Code started out a little behind, but they sure are catching up. If you haven't had a chance to try VS Code instead of the other AI editors out there, I strongly recommend installing it and checking out their plans to see if it's a good fit for you. They're doing an amazing job of responding to community feedback, and you can see the passion they're putting into all the new updates.
Vercel Releases Open Source Coding Agent Template
Vercel launched an open-source coding agent template on September 23, 2025, powered by AI SDK, AI Gateway, and Sandbox, providing infrastructure for building custom coding agents.
Template features:
- Multi-Agent Support: Works with Claude, Codex, Cursor, and opencode models
- Isolated Sandboxes: Each task runs in its own secure environment
- OpenAI-Compatible Gateway: Unified API for different model providers
- Parallel Execution: Run multiple agent tasks simultaneously
- GitHub Integration: Ready for CI/CD workflows and automation
- Cloud Execution: Keep local machines safe from unwanted changes
(GitHub Repository, Vercel Blog)
It's fascinating to watch the wave of containerized agents that you can run from the browser or trigger via webhooks. OpenAI's Codex has done it for a long time, and there are many others out there. Typically people compare these to pull request reviews that just look through code changes for bugs or syntax issues. But these containerized approaches are much more powerful because they allow for code execution and inspection that just looking at the code can't achieve.
Seeing Vercel execute on this is particularly interesting because they own the stack their apps and services are hosted on. They can containerize on a platform where they're already building the applications and respond to errors that happen in the container. This really takes it to the next step of AI agents having fixes ready for you when things go wrong. Because Vercel already has the logs, build errors, history, and deployments in place, they can reference so many aspects of your project and leverage an incredible amount of context. I'm going to keep a close eye on this one because I think Vercel is in an awesome spot to really execute on coding agents as part of their infrastructure, not just as an add-on that does a simple code review on GitHub.
🤖 AI Ecosystem Updates
Google's 2025 DORA Report: 90% of Developers Now Using AI
Google released the 2025 DORA Report in August 2025, revealing that 90% of software professionals use AI tools, with complex impacts on productivity and delivery stability.
Key findings:
- Adoption Rate: 90% of developers use AI (up from 76% in 2024), with 65% reporting "heavy" usage
- Productivity Impact: 80%+ report increased productivity, 59% report improved code quality
- Trust Gap: 30% have little to no trust in AI-generated code despite widespread adoption
- Delivery Metrics: AI improves throughput (reversing 2024's negative trend) but increases instability
- Time Investment: Developers spend median of 2 hours daily working with AI tools
- Success Factors: Platform engineering and strong foundations critical for AI value realization
(Google Cloud Blog, Google Developers Blog, The Register)
The 30% trust gap is the most interesting finding here. We're all using AI, but a third of us don't trust what it produces. I'm happy to see devs are still using their brains instead of blindly trusting the output.
DeepSeek Releases V3.1-Terminus With Agent Improvements
DeepSeek launched V3.1-Terminus on September 22, 2025, addressing user feedback with improved language consistency and stronger agent performance.
Model improvements:
- Language Consistency: Fixed Chinese/English mix-ups and random character generation
- Code Agent Upgrades: Enhanced performance for coding tasks and tool use
- Search Agent Enhancement: Better web search integration and result synthesis
- Stability Improvements: More reliable outputs for production use
- API Compatibility: Maintains compatibility with existing DeepSeek integrations
(DeepSeek Official, DeepSeek API)
Their code agent improvements are noticeable, though still not quite at Claude or GPT-5 level for complex tasks. Definitely worth keeping an eye on, though.
OpenAI Study Reveals ChatGPT Usage Patterns
OpenAI released a large-scale study on September 22, 2025, analyzing how people use ChatGPT across personal and professional contexts.
Study highlights:
- Broad Adoption: Consumer adoption expanded beyond early tech adopters
- Value Creation: Significant economic value through both personal and professional use
- Usage Patterns: Diverse applications from coding to creative writing to research
- Professional Integration: Increasing use in workplace workflows and decision-making
- Learning Applications: Heavy use for education and skill development
AI tools are becoming essential in every field of work. The shift from "early adopter toy" to "professional necessity" happened faster than I would have predicted. I still haven't quite got my parents to fully embrace AI, but I'm working on it!
⚡ Quick Updates
Chan Zuckerberg Initiative Launches Educational AI Tools
- Knowledge Graph: AI-powered educational content organization
- Claude Connector: Direct integration with Claude for educational workflows
- Evaluators: Tools for assessing AI-generated educational content
- Trust Foundation: Building reliable AI tools teachers can depend on
(CZI Blog)
Vercel Ship AI 2025 Conference - October 23
- Expert Speakers: Teams from Anthropic, Slack, and Google DeepMind
- Topics: AI SDK, AI Gateway, and AI Cloud infrastructure
- Format: Virtual and in-person options available
- Early Bird: Tickets available with discount pricing
Sam Altman Announces Compute-Intensive Pro Features
- New Offerings: Compute-heavy features initially Pro-subscriber only
- Additional Fees: Some features will cost extra beyond Pro subscription
- Long-term Vision: Drive intelligence costs down over time
- Experimentation: Testing limits of current model capabilities
✨ Workshop Spotlight (Early Bird Pricing Ends Tonight 🚨)
Claude Code Power User Workshop - Friday, October 3rd
Following four sold-out sessions, the next Claude Code Power User workshop is scheduled for next Friday, October 3rd, from 9am to 2pm PDT.
What we'll cover:
- Building AI pipelines with multiple Claude instances
- MCP workflows (now with the Google Chrome MCP!)
- Context engineering for complex projects
- Real-world automation workflows
- Live Q&A and troubleshooting
Hope to see you there!
Register here: https://egghead.io/workshop/claude-code
Read this far? Share "AI Dev Essentials" with a friend! - https://egghead.io/newsletters/ai-dev-essentials
- John Lindquist