Run Gomplate Prompt Templating against gpt-oss:20b Locally with Ollama

John Lindquist

AI workflows are often tied to specific APIs, making it hard to switch between powerful cloud models (like Claude) and free, private local models (like gpt-oss via Ollama). This lesson demonstrates a powerful and flexible command-line workflow that gives you the best of both worlds.

By combining gomplate for prompt templating with the power of shell pipes, you can direct the same prompt to different AI backends. This allows you to choose the right model for the job—whether you need the raw power of a cloud API or the privacy and cost-savings of a local model—all without changing your core workflow.
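
The lesson doesn't reproduce the contents of prompt.txt, but as a minimal sketch (the variable names and defaults here are hypothetical), a gomplate template can pull values from environment variables:

You are a senior developer reviewing the project "{{ env.Getenv "PROJECT_NAME" "demo-app" }}".
Suggest the next three steps for {{ env.Getenv "FOCUS" "improving test coverage" }}.
Respond in a concise, friendly tone as a markdown list.

Running gomplate -f prompt.txt fills in those values and writes the finished prompt to stdout, ready to be piped to any model.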

The Workflow

This lesson shows you how to:

  • Use gomplate to render a prompt from a template file.
  • Pipe the rendered prompt directly to a local model running with ollama run <model-name>.
  • Easily switch the pipeline to send the same prompt to the claude CLI instead.
  • Use flags like --hidethinking with Ollama for cleaner output.
  • Redirect the final AI-generated response into a markdown file for later use.

Key Benefits

  • Model Agnostic: Seamlessly switch between cloud and local models.
  • Cost-Effective: Use free, local models for development and experimentation.
  • Privacy: Keep sensitive prompts and data on your own machine with local models.
  • Automation: Easily integrate AI generation into your existing shell scripts and workflows.
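
To make the backend switch explicit, here is a small wrapper script (a hypothetical sketch, not from the lesson; it only reuses the commands shown below) that picks a backend from its first argument while the prompt template stays the same:

#!/usr/bin/env bash
# generate.sh — render the prompt template and send it to the chosen backend.
set -euo pipefail

BACKEND="${1:-local}"

if [ "$BACKEND" = "local" ]; then
  # Free and private: runs entirely on your machine via Ollama.
  gomplate -f prompt.txt | ollama run --hidethinking gpt-oss:20b > next-steps.md
else
  # Cloud: spends tokens from your Claude subscription.
  gomplate -f prompt.txt | claude -p > next-steps.md
fi

Invoke it as ./generate.sh local or ./generate.sh cloud; either way, next-steps.md ends up with the model's response.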

Commands Used

Pipe a templated prompt to a local Ollama model and stream the output (including the "thinking" process).

gomplate -f prompt.txt | ollama run gpt-oss:20b

Pipe a templated prompt to Ollama, hide the thinking process, and redirect the final output to a file.

gomplate -f prompt.txt | ollama run --hidethinking gpt-oss:20b > next-steps.md

Remove the generated file.

rm next-steps.md

Pipe the same templated prompt to the Claude CLI and redirect the output to a file.

gomplate -f prompt.txt | claude -p > next-steps.md

[00:00] As an important side note, if you don't have Ollama installed, just go to their site and download it. You can run our command and then pipe the result to ollama run. Once you pick a model, like the recently released gpt-oss:20b, everything we put together will be piped through that model instead of Claude Code. You can see this model has thinking enabled, so you'll watch it generate the thinking steps and then the actual response. And you'll see it still followed the tone, followed the steps, and did everything expected of it.

[00:33] You can disable thinking by passing the --hidethinking flag. We'll redirect this to a next-steps.md file and let it run in the background. Now if we open the next-steps file, you'll see our plan for what to do next. And again, this works the exact same way with Claude Code: if we remove next-steps.md, we can run our command against Claude instead of our local model running with Ollama.

[01:03] We can let this run and it'll start piping all of Claude's output into next-steps.md. Once it's done, you'll see the content appear. With OpenAI releasing such capable models, apparently equivalent to o3 and roughly in the ballpark of some of the Claude models, it's debatable, depending on your machine and your requirements, whether you want to spend tokens from your Claude subscription or run models on your local machine. There will be trade-offs in speed and cost based on your specific scenario. Personally, and for these lessons, I'll continue to use Claude.

[01:40] Claude just has a lot more options and is a lot more feature-rich for some of the things we want to do. But if the pipeline you're building ends here, at generating plans, then Ollama and the models available for it are definitely worth considering.