AI Dev Essentials #9: Playwright MCP is amazing, Hot Model Updates & Essential Dev Tools

John Lindquist
Instructor

Hey Everyone 👋,

John Lindquist here with the ninth issue of AI Dev Essentials! This past week I've been exploring the Playwright MCP and its ability to inspect pages, console logs, and other browser information right from Cursor, and it's incredible how many new workflows this unlocks. The Playwright team has done an awesome job making the MCP feel predictable and consistent, which is incredibly difficult to pull off in the MCP world. If you're looking to build an MCP of your own, their GitHub repo is a great starting point.
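If you want to try it yourself, registering the Playwright MCP server in Cursor is just a small config entry. This is a sketch based on the published @playwright/mcp package; the exact location of your mcp.json depends on your Cursor setup:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

Once registered, the agent can navigate pages, read console output, and click around on your behalf.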

Other than that, I've been experimenting with a lot of Cursor rules and workflows, and I've been a bit more lenient with my budget, letting agents make more of their own choices. But I keep coming back to the same lesson: if you start a task without a plan, you're ultimately screwed. The first step to getting anything done in Cursor, or with any AI agent, is to create a plan. With that said, let's dive into this week's essentials!

New egghead.io Lessons This Week

Automated Web Form Testing & Bug Fixing with Playwright MCP in Cursor(egghead.io)
Discover a powerful workflow using Cursor's AI agent and the Playwright MCP to instrument your web form with exhaustive logging, delegate automated testing to the AI to uncover bugs and edge cases by analyzing console logs, and then use the AI-generated report to fix the underlying code.

Autofix Browser Errors with the Playwright MCP in Cursor(egghead.io)
Learn how to create a powerful, automated debugging loop using Playwright MCP integrated into Cursor IDE via custom Rules. The AI interacts with your live application, identifies errors, and iteratively fixes them by launching Playwright, navigating the browser, and applying code changes until all errors are resolved.

Local AI Code Reviews with the CodeRabbit Extension in Cursor(egghead.io)
Learn how to integrate the CodeRabbit extension into your Cursor workflow for an extra set of AI eyes on your code changes. See how to initiate reviews, analyze suggestions, apply fixes directly, or use AI to further refine CodeRabbit's feedback with more context.

🚀 Model & Platform Updates

🔥 The Big Three Race

The Watchlist Heats Up: o3 Pro, Grok 3.5, Gemini 2.5 Pro (Full) anticipated soon

The hype machines have been running for a while, as these are all significant releases for each of the major companies. Sometimes I wonder if they're just waiting to see who's going to release first.

I'm still anxiously anticipating o3 Pro. o3 is still my go-to model for the most difficult questions when I really need reasoning. It has proven to me time and again that it can find solutions to edge cases other models can't. A pro version of that model is extremely enticing.

🧠 Research & Breakthroughs

  • Meta Study: Shorter Reasoning Can Boost AI Accuracy by 34%

    An interesting study emerging from Meta suggests that "less is more" when it comes to AI reasoning. Their findings indicate that shorter, more concise reasoning pathways can lead to a notable improvement in AI accuracy, potentially up to 34%. This could influence future prompt engineering strategies. (Read on VentureBeat(venturebeat.com), Reddit Discussion(reddit.com))

    It's interesting how much this aligns with humans going with their gut instinct or first idea, and how often that turns out to be the best approach. Sometimes the longer a human thinks about a problem, the deeper a hole they dig themselves into. As models evolve and get smarter, figuring out when longer reasoning helps and when it hurts is going to be fascinating.

  • The Darwin Godel Machine: AI That Rewrites Itself

    A fascinating research paper introduces the "Darwin Godel Machine," an AI system designed with the capability to improve itself by rewriting its own code. This points towards a future of self-evolving AI systems. (Read the Paper: arXiv:2505.22954(arxiv.org), Reddit Discussion(reddit.com))

    This is the true sci-fi stuff: AIs improving themselves and flirting with the idea of a singularity. Whether or not these claims pan out, it tickles the sci-fi itch I get whenever anyone talks about AI. I consider myself much more practical and grounded in finding utility in AI today, but I'm still a sci-fi nerd at heart.

🔧 API & Feature Updates

  • Gemini API Offers Deeper Insights with "Thinking Summaries"

    Google is making its Gemini API more transparent. It now supports "thought summaries," allowing developers to get a clearer picture of the model's reasoning process. (Details from Logan Kilpatrick on X(x.com), Patrick Loeber's Tip on X(x.com), Official Docs(ai.google.dev))

    I really love the two-phase approach of thinking and then executing; it aligns so well with how I always create a plan before I act. Being able to capture the model's thinking and see why it arrived at an answer is invaluable when you evaluate the final answer against your initial query, because you can actually see what was going on under the hood. I hope all reasoning models expose ways of capturing their thinking.
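Here's a rough sketch of capturing those thought summaries with the Python google-genai SDK. The model name is an assumption based on the current docs, and the split_thoughts helper is my own:

```python
# Sketch: separating "thought" parts from answer parts in a Gemini response.
# The real call requires `pip install google-genai` and a GEMINI_API_KEY;
# split_thoughts itself only needs objects with `thought` and `text` attributes.

def split_thoughts(parts):
    """Partition response parts into (thought_summaries, answer_text)."""
    thoughts, answers = [], []
    for part in parts:
        if getattr(part, "thought", False):
            thoughts.append(part.text)
        else:
            answers.append(part.text)
    return thoughts, "".join(answers)

def ask_with_thoughts(prompt: str):
    # Assumed model name -- see ai.google.dev for current options.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(include_thoughts=True)
        ),
    )
    return split_thoughts(response.candidates[0].content.parts)
```

The nice part is that the thought summaries come back as ordinary response parts flagged with `thought=True`, so collecting them is trivial.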

  • Perplexity Explores "Deep Research" with Opus 4 & Memory

    Perplexity is reportedly developing a "Deep Research" mode. This new feature is rumored to incorporate Claude Opus 4 and enhanced memory capabilities, aiming to provide more comprehensive and context-aware answers. (Full Scoop via TestingCatalog(testingcatalog.com))

    Perplexity has always been a bit perplexing to me (pun intended): it tries to outplay the bigger providers at their own game, with their own models, when those providers can simply copy every feature Perplexity creates. That said, Perplexity has always been the best bang for your buck, and it offers a great experience. It's usually what I recommend to people who are brand new to AI, because they don't have to think about whether to turn on search; it just automatically uses whichever models it thinks are best for the job. From a professional, developer's perspective, I lean on tools that give me much more control. But my family uses Perplexity, and I keep an eye on the workflows they take advantage of, and the ones they don't even know about.

🌟 DeepSeek-R1-0528 Release: Enhanced Capabilities

The DeepSeek team has announced the release of DeepSeek-R1-0528, bringing several key improvements to their reasoning model.

The latest version boasts:

  • Improved Benchmark Performance: Demonstrating enhanced results across various industry benchmarks.
  • Enhanced Front-End Capabilities: Offering better performance and features specifically for front-end development tasks.
  • Reduced Hallucinations: Focused efforts have been made to increase the factual reliability of the model's outputs.
  • JSON Output & Function Calling: Now with native support for structured JSON output and more robust function calling.

You can try out DeepSeek-R1-0528 at chat.deepseek.com(chat.deepseek.com). For developers, the API usage remains consistent (refer to the API documentation(api-docs.deepseek.com)), and the open-source weights are accessible on Hugging Face(huggingface.co). The official announcement also features a benchmark performance image and an example GIF showcasing the model's new capabilities. (via DeepSeek API Docs News(api-docs.deepseek.com))
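Since DeepSeek's API is OpenAI-compatible, the new JSON output mode looks roughly like this. The build_json_request helper and the prompt are mine, and the model name follows the DeepSeek docs:

```python
# Sketch: requesting structured JSON from DeepSeek's OpenAI-compatible API.
# The real call requires `pip install openai` and a DEEPSEEK_API_KEY.

def build_json_request(prompt: str, model: str = "deepseek-reasoner") -> dict:
    """Build a chat-completion payload asking for JSON output.

    DeepSeek's JSON mode expects the word "json" to appear in the prompt.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Reply with a json object only."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }

def ask_deepseek(prompt: str) -> str:
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    response = client.chat.completions.create(**build_json_request(prompt))
    return response.choices[0].message.content
```

Reusing the OpenAI client shape means swapping DeepSeek into an existing pipeline is mostly a base_url change.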

Many people were disappointed by this news because they were expecting an R2 model, which would be a major leap forward, whereas this seems like a small step forward. But any step forward in the open source space where people can run models locally is extremely welcome. It's so easy to forget that all the tools we have and providers we rely on today will eventually become freely available on our local machines or much cheaper to run. And so the pressure that any model can put on the bigger providers is worth watching and evaluating.

🛠️ Developer Tooling & Ecosystem

🎨 Multimedia & Generative AI

  • Chat-Based Image Editing with Replicate's Kontext Chat

    Replicate has launched Kontext Chat, a tool that allows users to edit images using natural language commands. Built with Hono and running on Cloudflare Workers, this open-source project showcases the power of conversational interfaces for creative tasks. (Try Kontext Chat(kontext-chat.replicate.dev), GitHub Repo(github.com), Official Blog Post(replicate.com), Announcement on X(x.com))

    After trying this out for a while, I honestly think it's the best prompt-based image editing tool for passing in your own images and tweaking them a little. Definitely super fun to try out with family and friends.

  • ElevenLabs Elevates Conversational AI to Version 2.0

    ElevenLabs has rolled out a significant update to its Conversational AI. Version 2.0 features real-time turn-taking for more natural dialogue, automatic language detection, built-in Retrieval-Augmented Generation (RAG) for accessing live information, multimodal capabilities, scalable batch calling for enterprise use, and robust privacy features. (Official Announcement: ElevenLabs Blog(elevenlabs.io))

    ElevenLabs has been the leader in conversational AI, in terms of emotion and tone, for a long time, so version 2.0 is a huge step forward. I've had nothing but pleasant experiences working with them. If you're looking for a place to launch an audio-first AI product, it's a great place to start.

  • Chatterbox by ResembleAI: An Open-Source Voice AI Contender

    ResembleAI has open-sourced Chatterbox, a compelling alternative for AI audio generation and voice cloning. It boasts zero-shot voice cloning from just 5 seconds of audio and unique emotion intensity control. (Official Page: Resemble AI(resemble.ai), Hugging Face Space(huggingface.co), GitHub Repo(github.com))

    While I really enjoy ElevenLabs, I like open source even more. It's great to see open-source models matching and even exceeding the capabilities of proprietary ones. I've only had a chance to play with this model briefly, but it's been nothing short of impressive.

💻 Local & On-Device AI

  • Ollama v0.8 Introduces Streaming Tool Calling

    Ollama continues to enhance its local LLM capabilities. The latest version, 0.8, now supports streaming responses with tool calling. They've demonstrated this with an example of Ollama performing a web search. (Official Blog Post(ollama.com), GitHub Release(github.com), Announcement on X(x.com))

    Ollama is my absolute favorite tool for running AI locally on my Mac. I really need to set some time aside to create some background automations with tool calling.
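As a rough sketch of what streaming tool calling looks like with the ollama Python library: the model name is an assumption (any tool-capable local model works), and run_tool_calls is my own dispatcher, not part of the library:

```python
# Sketch: dispatching tool calls that arrive in a streamed Ollama response.
# The real call requires `pip install ollama` and a local Ollama >= 0.8.

def add_numbers(a: int, b: int) -> int:
    """Toy tool the model can call."""
    return a + b

TOOLS = {"add_numbers": add_numbers}

def run_tool_calls(tool_calls, registry=TOOLS):
    """Execute each (name, arguments) tool call against a registry."""
    return [registry[name](**arguments) for name, arguments in tool_calls]

def stream_chat(prompt: str):
    import ollama  # needs a running Ollama server

    for chunk in ollama.chat(
        model="qwen3",  # assumed: substitute any tool-capable model you have pulled
        messages=[{"role": "user", "content": prompt}],
        tools=[add_numbers],  # Ollama derives the tool schema from the signature
        stream=True,
    ):
        if chunk.message.tool_calls:
            for call in chunk.message.tool_calls:
                yield run_tool_calls(
                    [(call.function.name, call.function.arguments)]
                )
        elif chunk.message.content:
            yield chunk.message.content
```

The key change in 0.8 is that tool calls can now show up mid-stream instead of forcing you to wait for the full response.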

  • Google AI Edge Gallery: Explore On-Device Generative AI

    Google has launched the AI Edge Gallery, an experimental app for Android (iOS coming soon) that allows users to explore and evaluate generative AI models running entirely on-device. Features include model selection, image-based Q&A, a prompt lab, AI chat, performance insights, and the ability to test your own LiteRT .task models. (GitHub Repo & APK Download(github.com), Project Wiki(github.com))

    Google is obviously all-in on AI. I'm super curious what their next flagship phone will be capable of and I think there are some good hints in this repo.

🔨 Developer Tools & IDEs

  • Chrome DevTools + Gemini: Modify & Save CSS Directly

    An exciting update for web developers: Chrome DevTools now integrates with Gemini to allow you to modify CSS and save those changes directly back to your source files when using a connected workspace. (Official Announcement: Chrome Developers Blog(developer.chrome.com))

    The Chrome team is crushing it with AI features in DevTools. We've all tweaked a CSS style in the browser and then hit the friction of having to go update it in the project. I absolutely love the effort they're making here to let you tweak changes in Chrome and have them automatically written back to your local codebase.

  • Biome VS Code Extension Reaches v3

    The Biome linter/formatter's VS Code extension has hit version 3. This update brings support for multi-root workspaces, a single-file mode, handling of unsaved files, and automatic reloading after Biome updates or configuration changes. (Read More on Biome Blog(biomejs.dev), Announcement on X(x.com))

    Biome is by far my preferred linter and formatter; it keeps an entire codebase consistent and clean at lightning speed. I hate burning AI cycles on fixing silly linter errors when Biome can just apply massive autofixes and clean things up itself. Definitely check out Biome as a replacement for whatever linting or formatting tools you're currently using.
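For the curious, a minimal biome.json looks something like this (the rule and formatter choices are just an example; see the Biome docs for the full schema), and `npx @biomejs/biome check --write .` applies the autofixes:

```json
{
  "linter": {
    "enabled": true,
    "rules": { "recommended": true }
  },
  "formatter": {
    "enabled": true,
    "indentStyle": "space"
  }
}
```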

  • OpenAI Codex Rewrite in Rust

    OpenAI is rewriting its Codex CLI tool in Rust. (Read more on GitHub Discussions(github.com), Announcement on X(x.com))

    This is noteworthy to me because a lot of people I know are curious about writing CLI tools in Rust. It will be a good project to follow: it's open source, so you can see what patterns and designs they use. I'll definitely be borrowing some of them myself and using this project as a reference.

🤖 AI Agents & Assistants

📝 Agent Tips & Best Practices

  • Tips for Working with the Jules AI Agent

    The team behind the Jules AI agent has shared a series of practical tips to help users achieve cleaner and more effective results when delegating tasks. (See the tips on X(x.com))

    Jules has had an interesting launch. Sometimes tasks take so long that, even though it's free, it hasn't been worth coming back to, and sometimes the service has simply been down. I know there's a lot of hype around it, I appreciate the team's efforts, and I hope it gets better and more reliable; it just hasn't been there for me yet. That said, there's a lot in threads like this that applies to every other agent, so it's definitely worth checking out.

🌐 Web Agents: The Quest for the All-in-One Multipurpose Agent

The AI landscape is witnessing a fascinating race to build the ultimate all-in-one multipurpose agent—systems that promise to control everything from maps and calendars to social media, travel planning, and beyond. While the vision is compelling, the execution remains a mixed bag. Here are a few notable players in this space that are worth keeping an eye on:

  • Genspark Super Agent: The Comprehensive AI Platform

    Genspark has emerged as a serious contender in the general AI agent space with their "Super Agent" system. Built on a "Mixture-of-Agents" architecture using multiple LLMs and over 80 specialized tools, it tackles complex tasks like comprehensive travel planning (including making actual phone calls for reservations), content creation, deep research, and website generation. The platform reportedly scores 87.8% on the GAIA benchmark and offers transparent reasoning visualization so users can see how the agent thinks through problems. (Try Genspark(genspark.ai), Learn more about Super Agent(medium.com))

  • Flowith: The Infinite AI Creation Workspace

    Flowith positions itself as "The World's First Infinite AI Agent" with their Agent Neo, featuring infinite steps, infinite context, and infinite tools. Their approach centers around a multithread canvas interface that allows for non-linear AI interactions, autonomous planning without prompt engineering, and unlimited tool selection. The platform emphasizes visual organization and real-time collaboration, transforming traditional chat-based AI interactions into a more dynamic, canvas-based workflow. (Try Flowith(try.flowith.io))

  • Fairies AI: A General-Purpose AI Agent

    Robert Yang introduced Fairies, an AI agent designed for a wide range of tasks. It claims capabilities across thousands of actions in various applications, full file access, code generation, and deep research, with a multi-agent architecture. (Try Fairies.ai(fairies.ai), Announcement on X by Creator(x.com))

    The story here is essentially everyone trying to build the all-in-one multipurpose agent that can control everything. While I appreciate the ambition, this hasn't been a workflow I've been super interested in since I'm much more someone who builds my own targeted scripts and workflows. I'm just not sure how much I'd trust these things to fully execute a vision or automation that I'd have. But it's worth keeping an eye on these developments because who knows—maybe one of them will take off and conquer the AI landscape. The race is definitely heating up, and the capabilities are becoming increasingly impressive.

🖥️ Desktop Agents: Capturing and Understanding Your Digital Footprint

The concept of desktop AI agents that capture and understand user activity was significantly popularized by tools like Rewind AI, which focused on recording everything on a user's screen to create a searchable history of their work. This sparked a new category of tools aimed at enhancing productivity and knowledge management by observing and assisting users directly within their desktop environment. Here's a brief overview of some tools in this evolving space:

  • Rewind AI:

    Pioneered the idea of continuous screen recording and indexing for personal recall, though its cloud-based approach and lack of developer-specific features have led users to seek alternatives.

  • Pieces for Developers:

    Focuses on capturing context-rich code snippets and associated knowledge locally across developer tools like IDEs, browsers, and terminals, emphasizing privacy and code-awareness. (Pieces Website(pieces.app))

  • Microsoft Recall:

    Offers a Windows-native AI memory system that continuously records screen activity and indexes it, strong in enterprise settings but lacks deep developer context and Linux support. (Microsoft Recall Docs(learn.microsoft.com))

  • Screenpipe:

    Provides a privacy-first, open-source visual memory tool that records screen activity locally, akin to a more ethical Rewind AI, but without code-parsing capabilities. (Screenpipe Website(screenpi.pe))

  • Fabric Internet OS:

    Takes an OS-style approach to connect notes, documents, and snippets, focusing on user-curated inputs rather than passive screen recording to map knowledge. (Fabric Website(fabric.so))

  • CodeStory:

    Tracks the evolution of code snippets with Git integration, allowing developers to attach reasoning and context to revisions, focusing on intentional saves rather than screen recording. (CodeStory Website(codestory.ai))

🎯 Framework & Component Updates

🚀 Remix Wakes Up!

The Remix team has announced a significant new direction for the framework with Remix v3, signaling a shift towards an AI-first approach. After merging Remix v2's capabilities into React Router v7, the team is now free to reimagine Remix as a modular toolkit prioritizing simplicity, clarity, and performance. A core principle for this new version is "Model-First Development," meaning the framework's source code, documentation, tooling, and abstractions will be optimized for Large Language Models (LLMs). Furthermore, Remix v3 aims to provide abstractions for applications to integrate AI models directly into their products. This new iteration will also focus on owning the full stack by minimizing dependencies—not even relying on React and instead starting with a fork of Preact—and building extensively on Web APIs. The goal is to create a lighter, faster development experience that's more aligned with the web's fundamental workings. (Read on Remix Blog(remix.run), Announcement on X(x.com))

I'm so excited for this. I love that the Remix team is taking risks and willing to push the envelope when so many frameworks have settled for the current way of doing things. I know it's mostly in the idea phase, but if you know me, you know I'm fully aligned with the idea that all software, tooling, and features should be designed with an AI-first future in mind.

🧩 UI Components

  • Smoother LLM Streaming with llm-ui.com's Markdown Component

    Rafal Wilinski has found a Markdown component at llm-ui.com(llm-ui.com) designed to address the "jankiness" often seen with streaming Large Language Model responses, aiming for a smoother user experience. (via Rafal Wilinski on X(x.com))

    This is one of those little pain points: if you've ever built streaming AI text components, you know how difficult it is to properly render markdown while it's still streaming in. If you've experienced that, check this out. It's awesome.
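The core trick (decoupling render pace from bursty network chunks) is simple enough to sketch. This toy buffer is not llm-ui's actual API, just an illustration of the smoothing idea:

```python
# Sketch: smooth out bursty LLM streaming by revealing a few characters
# per animation frame instead of dumping each network chunk at once.

class SmoothBuffer:
    def __init__(self):
        self._buffer = ""
        self._shown = 0

    def push(self, chunk: str) -> None:
        """Append a (possibly large) network chunk to the hidden buffer."""
        self._buffer += chunk

    def frame(self, chars_per_frame: int = 3) -> str:
        """Advance one render frame and return the text to display so far."""
        self._shown = min(len(self._buffer), self._shown + chars_per_frame)
        return self._buffer[: self._shown]
```

A real implementation like llm-ui's also holds back incomplete markdown tokens (an unmatched ** or backtick) so the renderer never sees broken markup.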

💡 Cursor Corner

📈 Growth & Adoption

  • Cursor's Staggering Growth: $1M to $300M ARR in 25 Months

    A post on Reddit highlighted Cursor's incredible financial trajectory, reportedly growing its Annual Recurring Revenue from $1 million to $300 million in just over two years. This underscores the massive demand for AI-native coding tools. (See the image on Reddit(i.redd.it), Reddit Post(reddit.com))

    Cursor just keeps on capturing market share. It's mind-boggling to watch those numbers go up.

🛠️ Tips & Workflows

  • Automate Cursor Ruleset Generation with Claude Code Prompt

    A user on the r/cursor subreddit shared a detailed and powerful prompt for Claude Code designed to automatically generate a complete Cursor ruleset for any given project. It analyzes the project's stack, conventions, and even business domains to create structured .mdc rule files. (Read the prompt on Reddit(reddit.com))

    A nice little prompt for generating a "project" rule. Claude Code is the one agent CLI I have yet to try, but I'm definitely going to run this very soon.
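For context, the generated .mdc files follow Cursor's rule format: a frontmatter block plus markdown instructions. A hand-written example (the contents here are purely illustrative):

```markdown
---
description: Conventions for React components in this project
globs: src/components/**/*.tsx
alwaysApply: false
---

- Use function components with typed props.
- Co-locate tests next to the component as `*.test.tsx`.
- Prefer composition over prop drilling.
```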

  • Mastering Context with Cline: /newtask & /smol Deep Dive

    Cline shared a guide to their slash commands, particularly /newtask and /smol. These commands are designed to help developers manage their context window efficiently, package work for handoff, or compress conversations to save on token costs and maintain focus. (Learn more in Cline Docs(docs.cline.bot), Tweet Thread on X(x.com))

    Cline is doing a fantastic job of establishing patterns/practices around AI workflows in editors. I'm super curious to see if Cursor will begin to adopt standardized workflows as well.

✨ Workshop Spotlight: Conquer the Complexity of Cursor ✨

🌍 Europe Friendly Timezone!

Ready to master practical AI development workflows in Cursor? Join me for this hands-on workshop! I've been teaching these sessions for months, refining the content, and I'm excited to share my latest insights on Agents, Ask mode, Custom Modes, multi-file analysis, effective prompting, Cursor rules, and handling AI failures. Let's conquer the complexity together!

When: Thursday, June 05, 2025, 5:00 AM - 10:00 AM (PDT) / 1:00 PM - 6:00 PM (UTC+1)

Where: Zoom (Live Q&A included)

Investment: $249

Read More(egghead.io) | Register Now ($249)(buy.stripe.com)

(Team training also available)

What are you excited about? I'd love to hear any news that you've come across. If you have any feedback or questions, hit reply. I'm happy to chat about the latest in AI Dev Tools.

John Lindquist
egghead.io(egghead.io)

Share with a coworker