[00:00] So I'll start recording my screen, and I've already set up all these tabs that have the things that I want to talk about and brainstorm. First I'll talk about my goal: I know that presenting a human with an empty text box can be a bit intimidating, so it's always good to offer them some suggestions, some hand-holding, and guided ways of filling out a form or a text box. The end goal in my project is to be able to guide a user toward creating an automation on their desktop that cleans up a repetitive task of theirs. So I want to replace this empty text box with a way to guide a user through creating a sentence or a paragraph describing the automation that they want to script away.
[00:39] So now that I've covered my goal, let's cover the inspirations. So Zapier has essentially a flow builder which starts with defining a trigger, the thing you want to kick off the workflow, and then the actions you want to happen, and then it can branch into various responses and follow-up actions from there. Now what I don't like about this is I don't want a visual canvas representing their workflow. I'd rather keep it text-based because the result of what we're building is a text prompt and then that prompt will be handed over to an AI to actually build out the script itself. So I like the concept of providing them a trigger where they can select what triggers it, providing them actions where they can select from a group of actions.
[01:21] I hate modals. I'd much prefer they be inline dropdowns, but there's definitely a lot of similarity between this concept of Zapier running these things on the web and my goal of being able to run automated workflows on the desktop. Another source of inspiration is RxJS and their decision tree. I love decision trees because they remove any sort of typing, and they remove decision paralysis by presenting you with only three options; once you select one, it brings up what you can do from there. And then you can keep on clicking and clicking, and it finds the thing that you want based on all the decisions you made.
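A minimal sketch of the decision-tree idea described above, assuming a small hand-written tree for picking a trigger. The `Node` class, the `walk` helper, and every option label are hypothetical and only for illustration; this is not how the RxJS decision tree is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step in the tree: a question with a few options, or a final result."""
    prompt: str
    options: Dict[str, "Node"] = field(default_factory=dict)
    result: Optional[str] = None  # set only on leaf nodes


def walk(root: Node, choices: List[str]) -> Node:
    """Follow a sequence of clicked option labels and return the node reached."""
    node = root
    for choice in choices:
        if choice not in node.options:
            raise KeyError(f"{choice!r} is not an option for: {node.prompt}")
        node = node.options[choice]
    return node


# Hypothetical trigger tree: each level shows only a handful of options.
TRIGGER_TREE = Node(
    "What should start the automation?",
    options={
        "a file changes": Node(
            "Where is the file?",
            options={
                "in my Downloads folder": Node("", result="a file changes in my Downloads folder"),
                "on my Desktop": Node("", result="a file changes on my Desktop"),
            },
        ),
        "a schedule": Node(
            "How often?",
            options={
                "every morning": Node("", result="it is 9am on a weekday"),
                "every hour": Node("", result="an hour has passed"),
            },
        ),
        "a keyboard shortcut": Node("", result="I press a keyboard shortcut"),
    },
)

leaf = walk(TRIGGER_TREE, ["a file changes", "in my Downloads folder"])
print(leaf.result)
```

The user never types anything: each click narrows the tree, and the leaf's `result` is an English phrase that can be dropped straight into the prompt.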
[01:53] Now this doesn't line up 100% with our goal, because essentially the trigger would be a decision tree and the individual actions would each be a decision tree, and we're looking for more of a fill-in-the-blank, Mad Libs style of building up this prompt. I could definitely start with "I want to" and then provide a list of options, and then "that should" and then provide a list of options, but our end goal is English, not some visualization. Now, just as another source of inspiration, I really like Google's Whisk experiment, where you can upload three different image styles. So I'll just randomize these, and based on these three images it's inspired by each of them to create something that the user might want, and they could swap out the scene, they could swap out the subject. And I think from a scripting perspective, they could swap out the trigger, swap out the API, swap out the goal, and then the AI, because we're working with AIs, could infer scenarios that could be automated away that the user might not even think of.
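A minimal sketch of the fill-in-the-blank approach, assuming a single sentence template with three slots. The template wording, the slot names, and the option lists are all made up for illustration; the point is only that each selection fills a blank and the output stays plain English.

```python
from typing import Dict, List

# Hypothetical template: each {slot} is a blank the user fills from a dropdown.
TEMPLATE = "I want to {goal} whenever {trigger}, and then that should {follow_up}."

# Hypothetical options offered for each blank.
SLOTS: Dict[str, List[str]] = {
    "goal": ["rename my screenshots", "back up my notes"],
    "trigger": ["a file lands in my Downloads folder", "I press a keyboard shortcut"],
    "follow_up": ["move them into a dated folder", "post a summary to my team chat"],
}


def render(template: str, slots: Dict[str, List[str]], picks: Dict[str, str]) -> str:
    """Fill the template, rejecting any pick that was not one of the offered options."""
    for name, value in picks.items():
        if value not in slots.get(name, []):
            raise ValueError(f"{value!r} is not an option for slot {name!r}")
    return template.format(**picks)


picks = {
    "goal": "rename my screenshots",
    "trigger": "a file lands in my Downloads folder",
    "follow_up": "move them into a dated folder",
}
print(render(TEMPLATE, SLOTS, picks))

# Swapping one blank, in the spirit of swapping the scene or subject.
swapped = {**picks, "trigger": "I press a keyboard shortcut"}
print(render(TEMPLATE, SLOTS, swapped))
```

Because the result is just a sentence, it can be handed to the AI as-is, and swapping a single slot re-rolls the automation without touching the rest.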
[02:47] So while there is a small group of users who will know exactly what they want to automate, I imagine there's a much larger number of users who don't know all the things that it's possible to automate. And so prompting them for the APIs they use most often, the work that they do, and those various things, and then showing them results of scripts that could help them automate away those repetitive tasks, I think would be a great source of inspiration. And then, just sort of as the end result, what I'm building is very much inspired by a lot of the AI music generators, where you enter a text prompt, and then you would have all of your scripts, all of your automations, and then there would be a library of community scripts and automations that have been generated, so it becomes much easier to explore and search. And so there would be, as you can see in this, a huge amount of exploration, of just clicking on something and re-rolling automations, re-rolling scripts into something different, so that users could take bits and pieces from a community and build up a script. So there's that aspect as well: while a user will be prompted for these things, there's also a community of automations around it that they'll be able to select from and pull inspiration from.
[03:53] And then lastly, some things I don't like. I don't like the way Make.com makes you click a giant button and then select from a giant list with huge UIs where, again, the idea is building up visual connections. In our scenario, just English sentences are going to work much better, letting the AI make the connections. And then most of our users are developers, who could tweak exactly what they wanted to do once they have the mostly automated thing. I do like the way Relay provides this kind of layered UI where there's a trigger, there's the service, there are some options in the service, and then the responses and actions.
[04:26] I know this isn't as popular as Zapier, but this is much closer to what I'd like to see, even though theirs is mostly UI and I would like it so that, as users select options, it ends in English sentences. And I'm definitely willing to be convinced that this is a better approach than what I'm thinking of as a fill-in-the-blank approach. And then obviously you have to call out Siri Shortcuts. So here's a screenshot of a Siri Shortcut where it has that list of essentially English connections, where the automations are things you do throughout the day.
[04:54] There's an additional trigger and then step by step by step, and the end result is very close to English. And I know that Siri Shortcuts has a concept of variables that can be shared between these little sections along the way, and that might be important to what we're building. I just haven't built it yet, so I haven't got a good feel for what that's supposed to be. So I'll end my recording there. I obviously chopped up a lot of it for your sake, so you don't have to watch me mumble through the whole thing.
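One possible way to think about variables shared between steps, sketched under the assumption that each step is an English phrase that can name its result and refer to earlier results in braces. The `Step` class, the `to_prompt` helper, and the example steps are hypothetical; this is not a description of how Siri Shortcuts actually models variables.

```python
import string
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    text: str                     # English phrase; may reference earlier outputs as {name}
    output: Optional[str] = None  # name under which later steps can refer to the result


def to_prompt(trigger: str, steps: List[Step]) -> str:
    """Render a trigger plus ordered steps as near-English text for the AI."""
    known = set()
    lines = [f"When {trigger}:"]
    for number, step in enumerate(steps, start=1):
        # Collect every {name} placeholder used in this step's text.
        names = [name for _, name, _, _ in string.Formatter().parse(step.text) if name]
        missing = [name for name in names if name not in known]
        if missing:
            raise ValueError(f"step {number} uses undefined variable(s): {missing}")
        sentence = step.text.format(**{name: f"the {name}" for name in names})
        if step.output:
            sentence += f" (call the result '{step.output}')"
            known.add(step.output)
        lines.append(f"{number}. {sentence}")
    return "\n".join(lines)


steps = [
    Step("find the newest screenshot on the Desktop", output="screenshot"),
    Step("upload {screenshot} to the shared drive", output="link"),
    Step("copy {link} to the clipboard"),
]
print(to_prompt("I press a keyboard shortcut", steps))
```

A step may only reference a variable that an earlier step produced, which catches ordering mistakes before the prompt is handed to the AI.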
[05:17] And I'm going to take the result of this, and this turned out to be a 12-minute video unedited, and I'm going to open AI Studio and drag and drop this onto AI Studio. Then it'll start extracting it to see how many tokens there are. Looks like something errored out with a 500. I'm gonna pause my video and then go chop up my recording to make it smaller, and compress and re-encode it. Maybe that'll help.
[05:37] Alright, so I chopped up my video. I usually don't have to do that, but now it's down to 230 megs and, I think, about five minutes. Alright, so this weighed in at about a hundred thousand tokens, and we'll tell it: "I recorded this video as a brainstorming session about a feature that I want to build. Please write an extremely detailed summary of this video, explicitly calling out all of my opinions and the UIs I was talking about.
[06:00] Include all of the details of the things I like and the things I don't like, and organize it in such a way that I can use it to create a plan for building out the prototype of this feature." We'll paste that in, we'll let it run, and then after 30 seconds we have an extremely valuable document that we can use, especially when planning with AIs in the future, about how we can break down this feature into steps: what we work on first, what will be the easiest things to build, what will be the most difficult things to build. And it all came from organizing a few sites and then just jabbering on about what I liked and what I didn't like, then dropping that in here, and the output of this becomes essentially a spec, a planning document, whatever language you want to use for how we're gonna build this in the future. And because I was recording, I probably used a bit terser language, a bit more thoughtful language, as I was going through this video, and I definitely recommend, as you're doing your own screen recording, to just jabber and jabber and jabber. Just talk about everything that comes to mind and dump everything out of your brain so everything is captured. That way you pull out as much nuance and expectation as possible, especially when going over sites and features that you don't like, and you avoid a bunch of roadblocks that would come up in the future by simply blabbering in the present.