
Building LLM-friendly workflow automation

Venkat Pothamsetty · June 2, 2026 · 7 min read

We set out to do one thing: let a model run real security workflows on our platform — not narrate them, not suggest them, actually run them, end to end. The goal was simple to state. Getting there meant a string of choices, and the order we made them in turned out to matter more than any single one.

This post is about those choices. The biggest was also the first: if a model is going to drive the platform, what exactly should it be driving?

The thing we were trying to make easy

Start with a real workflow, because every choice downstream was made in service of it. A customer is onboarding and you need a posture review before they go live. Said out loud, it's one sentence: pull every cloud account they've connected, run IAM and storage exposure checks across all of them, cross-reference anything internet-facing against known CVEs, rank the findings by blast radius, and draft a remediation plan you can send back.

run -q "Onboarding review for the customer: pull every cloud account they've
        connected, run IAM and storage exposure checks across all of them,
        cross-reference anything internet-facing against known CVEs, rank the
        findings by blast radius, and draft a remediation plan I can send back."

That one sentence is a dozen steps in a trench coat, and the design decision we cared about most was to never make the user unpack it. So nobody tells the model the order. It enumerates the accounts — it doesn't know how many there are until it looks. It fans the IAM and storage checks across every one of them, in parallel where it can. It notices that "cross-reference against CVEs" can't start until the exposure results exist, so it waits. It pulls the CVE data, joins it against the internet-facing resources, scores each finding by blast radius, and only then drafts the plan — because a plan written before the scoring is a plan written against the wrong priorities.

You scripted none of that. You stated a result, and the dependency graph got derived on the way to it. One account or forty, the instruction is the same — the work scales itself to the situation. Build that and you've built LLM-friendly workflow automation. The question is where it lives.
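For contrast, here is roughly what hand-scripting just the fan-out and fan-in would look like. This is a toy bash sketch, not how the platform plans work: `check_exposure` and `merge_findings` are hypothetical stand-ins that write and read placeholder files, and the accounts are whatever names you pass in. What it makes visible is the ordering constraint the model has to discover for itself: every per-account check must finish before the dependent stage reads the results.

```shell
#!/usr/bin/env bash
# Toy fan-out / fan-in sketch. check_exposure and merge_findings are
# hypothetical stand-ins, not platform commands.
set -euo pipefail

# Hypothetical per-account check: writes one placeholder result file.
check_exposure() {
  local account=$1 outdir=$2
  printf '%s exposure-findings\n' "$account" > "$outdir/$account.txt"
}

# Dependent stage: reads every per-account result, so it has to run last.
# Sorting is only a placeholder for real blast-radius scoring.
merge_findings() {
  cat "$1"/*.txt | sort
}

review() {
  if [ "$#" -eq 0 ]; then
    echo "usage: review ACCOUNT..." >&2
    return 1
  fi
  local account pid outdir status=0
  local pids=()
  outdir=$(mktemp -d)
  # Fan out: one background job per account, however many were passed in.
  for account in "$@"; do
    check_exposure "$account" "$outdir" &
    pids+=("$!")
  done
  # Fan in: wait on every job; any failed check blocks the dependent stage.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  if [ "$status" -eq 0 ]; then
    merge_findings "$outdir" || status=1
  fi
  rm -rf "$outdir"
  return "$status"
}
```

Calling `review acct-prod acct-staging` runs both checks concurrently and merges only after both finish. Even this toy version hard-codes the stages and their order, which is exactly the part a stated outcome lets you skip.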

The choice that decided the rest: four ways to expose a platform

Anything a platform does, it has to expose somehow, and there are really only four ways. We looked at all four as a spectrum, each with a real job, and asked the same two questions of each: can a model drive this well, and can we build on top of what it produces?

A UI is for a human exploring by hand. Right the first time you touch a platform, wrong the hundredth — and it leaves no trail you can hand to anyone else, which for a security product is already disqualifying.

An API is maximum power: every capability, from any language. But you build the integration yourself, request by request, and maintain it forever. A model can call an API, but then you're the one writing and babysitting the client that lets it.

An MCP server lets a model reach for a tool ambiently, mid-conversation, inside a chat host. For "while you're here, you can also do this," it's a genuinely good fit. But look at everything that has to be true first: a server running and reachable, a connector registered, a host that speaks the protocol, and a model in the loop to decide when to call. That's a lot of ceremony for "run the review and page me." And the shape you get back is narrow — model-mediated, scoped to a session. An MCP tool is a destination something calls. You don't build programs on top of a destination.

A CLI is the connective tissue: the layer humans type into, scripts wrap, CI calls, and agents drive — all through one surface. And it's a primitive, not a destination, which is what decided it for us.

Why we shipped the primitive first

The difference cuts one way only. Because a command line is scriptable and composable, you can build anything on top of it — a cron job, a CI gate, a Slack bot, a scheduled drift check. And if you want an MCP after all, you can wrap the CLI in one. You cannot run that backward: there's no easy way to unwrap an MCP into composable primitives. You build a destination on top of a primitive; you can't recover a primitive from a destination. That asymmetry is the whole reason we built the command line before anything else.
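As one concrete case of building on the primitive, here is a minimal sketch of the scheduled job mentioned above. The `run --json -q` invocation mirrors the pipeline example in this post; the `nightly_review` function name, the `./reviews` directory, and the query wording are hypothetical placeholders. The wrapper needs no server, connector, or host: it is a shell function around one command.

```shell
#!/usr/bin/env bash
# Hypothetical nightly wrapper around the CLI. nightly_review, the default
# ./reviews directory, and the query text are placeholders.
set -euo pipefail

nightly_review() {
  local outdir=${1:-./reviews} out
  mkdir -p "$outdir"
  # One dated JSON file per run, so each night's result is kept.
  out="$outdir/review-$(date -u +%Y%m%d).json"
  # On failure, remove the partial file rather than leave an empty record.
  run --json -q "Scheduled drift check across every connected cloud account" > "$out" \
    || { rm -f "$out"; return 1; }
  printf '%s\n' "$out"
}
```

A crontab entry such as `0 2 * * * /path/to/nightly-review.sh`, where that script calls `nightly_review`, would then fire it nightly; the path is a placeholder, and cron's minimal `PATH` has to include wherever `run` is installed.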

It's also the one interface a person, a script, a CI job, and an agent already speak identically. The onboarding review you ran by hand drops straight into a pipeline — capture the session, gate the next step on it actually passing:

SID=$(run --json -q "Onboarding review for the customer" | jq -r .session_id)
verify --json "$SID" | jq -e '.checks.insights.passed == true'

No bespoke client. No connector to register. The same sentence works whether a human types it, a cron job fires it, or an agent decides on its own that now's the time to run it. One way of asking, four kinds of caller.
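The gate works because `jq -e` sets a non-zero exit status when the last value it produces is `false` or `null`, and a missing `.checks.insights.passed` compares unequal to `true`, so an absent field fails closed instead of passing silently. Below is a minimal sketch that packages the two lines as a reusable CI step, assuming `run` and `verify` print JSON shaped the way the pipeline above reads it. The `onboarding_gate` and `insights_passed` names, the `pipefail` setting, and the empty-session guard are hypothetical additions of ours, not documented platform behavior.

```shell
#!/usr/bin/env bash
# Sketch of a CI gate around the run/verify pipeline shown in this post.
# Assumes `run --json` prints {"session_id": ...} and `verify --json` prints
# {"checks": {"insights": {"passed": <bool>}}}. onboarding_gate and
# insights_passed are hypothetical helper names, not platform commands.
set -uo pipefail

# Reads verify's JSON on stdin and succeeds only if insights explicitly passed.
# A missing field compares unequal to true, so the gate fails closed.
insights_passed() {
  jq -e '.checks.insights.passed == true' > /dev/null
}

onboarding_gate() {
  local sid
  # With pipefail, a failing `run` fails the substitution, not just jq.
  sid=$(run --json -q "$1" | jq -r '.session_id // empty') || return 1
  # Refuse to call verify with an empty session id.
  if [ -z "$sid" ]; then
    echo "no session_id returned" >&2
    return 1
  fi
  verify --json "$sid" | insights_passed
}
```

In a CI step this becomes `onboarding_gate "Onboarding review for the customer" || exit 1`. The guard on an empty `session_id` matters because plain `jq -r .session_id` prints the string `null` when the field is missing, which would otherwise be passed straight to `verify`.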

The choice that paid off twice: the trail comes for free

For security and compliance work this wasn't a side effect; it was a requirement. When the thing you do is state a goal and get a run, every action is also a command: copy-pasteable, inspectable, and re-runnable. "Here is the exact request that produced this finding" is something you can hand an auditor, drop into a runbook, or paste into a ticket, and anyone, human or agent, gets the same answer when they run it again. Clicking through a UI leaves no such trail; a model-mediated MCP call is hard to reproduce byte-for-byte.

Reproducibility by construction, which in this domain isn't a nicety. It's the point. An outcome you can't reproduce is just an anecdote.
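The trail can be made explicit with very little machinery. Here is a minimal bash sketch, assuming you want an append-only log of exactly what was run: the hypothetical `audited` helper writes a UTC timestamp plus a `printf '%q'`-quoted copy of its arguments, so everything after the timestamp can be pasted back into a shell and re-run. The `AUDIT_LOG` variable and its default path are placeholders, not platform features.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper: log a re-runnable copy of a command, then run it.
# AUDIT_LOG and its default path are placeholders.
set -uo pipefail

audited() {
  local log=${AUDIT_LOG:-./audit.log}
  {
    # UTC timestamp first (it contains no spaces), then the shell-quoted command.
    printf '%s ' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '%q ' "$@"
    printf '\n'
  } >> "$log"
  # Execute exactly what was logged, preserving its exit status.
  "$@"
}
```

Usage: `audited run -q "Onboarding review for the customer"`. Logging before executing means even a run that crashes leaves its request on record; the helper captures the request, while reproducing the outcome still depends on the run itself being deterministic.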

What's coming

This is the first post in a short series walking through what we built and why. Over the next few we'll get into:

  • how a goal becomes a multi-step run without anyone scripting the sequence
  • the full case for the command line over a UI, an API, and an MCP server, with the receipts
  • what it's like to hold a conversation with a live run instead of firing blind, one-shot calls
  • pointing an agent at a machine-readable description of the platform and walking away while it discovers what's there, mints a key, and drives a run end to end

That last one is the thesis in miniature: the lowest common denominator between a person, a script, a CI job, and an agent is the ability to run a command. So the most LLM-friendly thing we could build turned out to be the oldest interface we have — the platform becomes something you run, and the only thing you have to bring is the outcome you want.

More soon. For now, the one decision worth keeping: let people say what they want, not how to get it.
