AI-Driven Design Workflow: Playwright MCP Screenshots, Visual Diffs, and Cursor Rules

Establish an automated, AI-driven workflow for implementing UI designs by comparing live development versions against target screenshots. This lesson integrates the Playwright MCP server for browser automation, a custom script using Pixelmatch for visual diffing, and Cursor Rules for persistent context. See how to instruct the Cursor Agent to iteratively implement design changes based on visual diff feedback, creating a robust loop where the AI refines the UI towards the target design.

Workflow demonstrated in this lesson:

  • Configure the Playwright MCP server within Cursor for browser control.
  • Use the Agent and Playwright MCP tools (browser_navigate, browser_resize, browser_take_screenshot) to capture baseline and current state PNG screenshots.
  • Install pixelmatch and pngjs dependencies.
  • Guide the Agent to create and run a TypeScript script (compare-images.ts) to generate visual diff images between screenshots.
  • Establish Cursor Rules (visual-testing.mdc, project-setup.mdc, etc.) to provide the Agent with necessary context about directories, scripts, dependencies, and the overall workflow.
  • Initiate an AI-driven design implementation loop:
    • Agent captures the current state via Playwright.
    • Agent runs the comparison script to get a visual diff.
    • Agent analyzes the diff and implements code changes in the project (e.g., app/page.tsx, components, globals.css).
    • Repeat the capture-compare-implement cycle, using the latest diff as feedback.
  • Use context files and prompts like "Please continue" to manage the multi-step process and guide the AI through potential interruptions.

Key benefits:

  • Creates an iterative, AI-driven workflow for implementing designs based on visual comparisons.
  • Combines browser automation (Playwright) and pixel-level diffing (Pixelmatch) within the Cursor environment.
  • Utilizes Cursor Rules to effectively manage context and guide the AI through complex, multi-step tasks.
  • Demonstrates how visual diffs can serve as direct feedback for AI-driven UI code generation and refinement.

Transcript

[00:00] Install the Playwright MCP globally by copying and pasting the config into Cursor's MCP settings: add a new global server and paste it right there. Then make sure in your settings that Playwright turns green. If for any reason that didn't work, make sure to install Playwright globally first, then click refresh, and that should sort it out. To test that this is working, we can come into our Agent and say: use Playwright to take a screenshot of egghead.io.
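The Playwright MCP config pasted into Cursor looks roughly like this (a sketch; the exact package name and version pin may differ from what the lesson uses):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```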

[00:26] Make sure to set the browser to 1024 by 768, and then save the screenshot inside of a screenshots directory in the root of this project. Paste that in, let it run, and you'll see it will launch a browser, resize it to the right size, create the screenshots directory, and then take a screenshot. If we navigate to our screenshots folder in our project, it looks like it didn't quite save correctly. If you look at the MCP tool, you'll see it saved the file over here. So let's clarify a bit and say: please move that screenshot into my screenshots folder for me and name it egghead.io with the timestamp. That should fix it up.

[01:04] You can see it generated this terminal command and ran it, and we now have our screenshot inside of our project. Now I'm going to open the terminal with Ctrl+backtick, spin up our dev server, and then tell it to please take a screenshot of localhost on port 3000 and add that into our screenshots folder using the same process and the same dimensions. We'll let this run. You'll see it's calling our MCP tools, and we now have a screenshot of our dev server.

[01:32] And now we're going to install a library that lets us do visual diffs. I'll paste in this link to Pixelmatch. I'm actually going to click on this link and unlink it, since if you leave it linked the Agent would read in the page, and I'm only pasting it in for specificity. Then I can say: please install this library into my project, along with any related libraries required for creating visual diffs.

[01:54] Hit Enter, and then to move this forward: please create a scripts folder and a diffs folder, and in our scripts folder create a TypeScript script which will run Pixelmatch against two screenshots and then output the visual diff into our diffs folder with a timestamp on it. Let this run. Now we'll do a test run by saying: please run that script using tsx to compare the current two screenshots in our screenshots directory. Looks like some types are missing, so we'll install those. And it looks like by default Playwright used JPEGs, and we need PNGs.

[02:30] It calls this out, so we can say: yes, please remove all of our current screenshots and use PNGs from here on out. And also remember to keep the dimensions consistent; we're currently using 1024 by 768. While this is running, the Playwright Chrome browser is doing actions behind the scenes which we can't see; I just have it in the background, off-screen, for now. Alright, now that we have a successful comparison and can look in our diffs folder and see this diff, let's go ahead and run our /Generate Cursor Rules command based on this current conversation.

[03:04] We'll just hit Enter here and let it read through the conversation, and hopefully it should pick up our dimensions, PNG format, timestamps, and everything we talked about in this conversation. The result is that in our .cursor folder we now have a visual-testing rule with all the key information: the dimensions, the format, the naming conventions, how to compare screenshots, and how to generate them. It also added some project rules around using pnpm and the libraries we installed during this conversation. So I'll close these out and start a new conversation with Command+N. I'll press Backspace just to remove this from the context and then add in both of our Cursor rules, since these currently aren't set to always apply.
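The generated visual-testing.mdc rule captures roughly this information (a reconstruction from the conversation, not the literal generated file):

```markdown
---
description: Visual testing workflow for screenshot capture and diffing
alwaysApply: false
---

- Screenshots are PNGs saved to `screenshots/`, captured at 1024x768, named with a timestamp.
- Visual diffs are generated by the TypeScript comparison script in `scripts/` (run with tsx) and written to `diffs/` with a timestamp.
- Dependencies: pixelmatch and pngjs, installed with pnpm.
```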

[03:48] So I'll grab the Cursor rules here and there, then add in our files and folders, find our app, and before I do anything else, at least stage the changes we've made so far. We could gitignore the images, but for now we'll just keep them. Now, when we start this longer task, we can always revert back to the previous state. So now we can say: the goal is to copy the design of the egghead.io website and reimplement it in our own project. So please read in the egghead screenshot from our screenshots directory, implement the design into our codebase, and once you've made some progress, please take a screenshot of our dev server to create a diff, and then read in that diff to see what else needs to change.

[04:32] Keep looping through these steps of taking a screenshot of our dev server, implementing changes into our app folder, and then reading in the diff until our designs match as much as possible. Then we'll just sit back and let this run. Now, Gemini 2.5 Pro will stop along the way. I have a prompt that I copy and paste in often that essentially says: please continue; your task is to follow these instructions to completion without asking for my input; try, fail, learn, iterate. It's essentially telling it to stop asking for my help, do its best, and iterate.

[05:07] While it's working, we can check on these diffs to see its progress. You can see it has made a lot of progress: the initial one had Next.js here, and now that's gone. We can check on the next diff as it works through, and we can always swap over to our main dev server to see what it actually looks like.

[05:21] So right now everything is purple, and if you compare to the actual egghead.io you can see some of the progress it's making, though it hasn't made a ton. So let's go back to our Agent. Gemini 2.5 paused again, so I'll paste in that same prompt. And it paused again, so I'll paste in the prompt again.

[05:38] Now it looks like it's moved on from creating the header over to course cards. So we should see some good progress now, at least from a macro perspective. Looks like the AI is going crazy for a second so we'll go ahead and stop that. Let's check on our progress in our diffs, check on the progress in the actual dev site. Looks like we're getting much closer here.

[05:57] I'm going to go ahead and start a new conversation, since the AI seems to have gone into panic mode, and just use my Please Continue prompt. Now, one thing I definitely would try here is asking the AI to set up a smoother workflow that could be done in a single script, probably extracting the MCP actions into our main script itself to save on a lot of AI calls. Each time you use an MCP it makes a tool call, which goes against your budget and slows things down a bit. So once you get one of these MCP-dependent workflows set up, I'd strongly recommend asking the AI: hey, please look at all the steps we took; can we turn this into a single script?

[06:40] Then you can rely less on loosely worded English phrases and tighten things into proper scripts, actions, and endpoints that can reliably run in tighter feedback loops. This video is heavily cropped; it has actually taken 10 or 15 minutes so far, and while you can definitely have it run in the background, because AI is so good at generating code, it's always worth asking it to optimize and improve its own workflows. Looks like it paused again; we'll tell it to continue. It's also kind of fun to just leave the other browser window open on a second monitor and watch a website being built by itself. It does look like it forgot to resize the browser.

[07:21] I'm going to scroll back and look at a recent MCP call. It is doing a browser snapshot here, but I don't see the resize tool called anywhere, so I'm going to remind it of that once it pauses again. Okay, it paused again, and the one thing I absolutely forgot to do at the beginning of this conversation was add in the rules.

[07:43] So let's add those back in: the Cursor rules for project setup and visual testing. It's probably even worth, unfortunately, taking those two rules and reverting all the way back to the beginning of this conversation, since it hasn't gotten that far. So we'll go up here and reference our project and visual-testing rules; the past chat is still in context, so it knows what we're doing. Then we allow it to continue, and we're just going to continue without reverting.

[08:11] I'm not too worried about the component design changes so far. It's starting to look for SVGs, so I'm just going to instruct it: whenever you run into SVGs, just use the text "SVG" as a placeholder, and we can manually add them in later. I'm mostly adding this because I don't want an AI to manually spit out the entire path data of an SVG; it's just wasted effort when you'll have to go and completely replace it later anyway.

[08:36] We could ask it to install some SVG or icon libraries that may have some of these logos and icons, but that's currently not worth the effort; it's definitely something worth capturing in the Generate Cursor Rules call at the end of this conversation. Gemini 2.5 is definitely having a finicky morning, and we'll tell it to continue again. This is happening way more often than I'm used to, but that's currently the nature of AI tooling. We can check on our current diff, and it looks like there may be bigger issues at play here.

[09:04] So I'm going to go ahead and stop it there. If we look at how far it got and compare it to egghead.io proper, you'll see it honestly didn't do that great of a job, which is a little surprising to me. But in fairness, it really didn't do that much work, with our feedback and iteration cycle going so slowly. If you look at the changes in Git, other than all the screenshots, our unstaged changes are the page with some data, a course card, a hero search, and a header. And I'd be curious about the CSS: why it was picking purple, among other choices it made.

[09:45] So the key here is to actually learn from the failure. I think I made my biggest mistakes at the start: if I were to start over, I'd create a new chat and reference the egghead screenshot, and I would focus on the theme first. I would say: please extract all of the colors from this image and add them to our theme; feel free to overwrite anything, our theme is currently totally broken.

[10:07] That way I could start working on the colors and the core primitives of the design first. Currently this is just one of those mornings where the model is behaving very poorly, so my frustration is definitely rising; I think we've all been there. Then, after I extracted the theme, I would have it describe the main layout: please describe the main layout of the site from the image.

[10:29] That gives us a deep understanding of the structure of the page; let that generate. Alright, something really must be going on with this model; it is just failing over and over. This has pushed past my frustration limit, so I'm going to switch models. Let's do Claude 3.7 and see what sort of craziness happens, then just ask it to continue from here. Gemini 2.5 is having a really rough morning; I'm about an hour into this video and I expected way more progress. Please make a comprehensive list of the visible components, and then we can generate Cursor rules from this conversation.

[11:01] So essentially we're doing a complete analysis of this image and setting up some rules around it, so that moving forward it's much more grounded in a text description of the website itself. Let's create a new conversation from here. We can reference our app and all the rules we set up: layout patterns, project structure. In fact, I'm going to come over here, simply select all of these, and drag them over, because I'm lazy. And then I'm going to try "iterate on the design of the site using Playwright", because based on the rules we've set up, it should have enough information about what we want to do that an instruction that small should kick off our loop again.

[11:44] And this time I'll try it with 3.7 Sonnet and see what happens. We do need to tell it our dev server is already running. Alright, well, it looks like all the models are just having issues, and while this has been a very frustrating lesson to record and it didn't make a ton of progress, the goal was more to teach you the available tools and workflows than to actually have a working site at the end. So I feel the things covered here are valuable enough to justify publishing this lesson. I hope you enjoyed watching it, because I certainly didn't enjoy making it.