From Conversations to Structured Data: A New Way to Interact with Business Systems

Orders no longer arrive as forms to fill out. They flood in via WhatsApp mid-shift: "Yeah, send John at Acme five of the usual — actually make it six," or through a Telegram one-liner with a typo'd product name. This is the new reality: business happens where your customers and teams already spend their time — on the apps everyone has in their pocket.

Not long ago, automating this felt like science fiction. Voice assistants like Siri routinely misunderstood our requests, and traditional NLP consistently failed to turn these streams of informal decisions into concrete actions. Today, thanks to model reasoning and MCP, it's possible, and with the right frame around the model, it's reliable.

The tempting move is to let an LLM read the message and write the record. It works in demos. Then it quietly creates the wrong invoice, for the wrong contact, at the wrong quantity — fluently and confidently, which is exactly the failure mode we wrote about last time: the well-argued but wrong output is the hardest kind to catch.

So we don't do that. The pattern we run in production is narrower and far more reliable: the AI proposes a structured extraction, the user confirms it with one tap, and only then is anything committed, through a typed tool layer (MCP) and never through an ad-hoc write. The AI reasons and extracts, the human has the final say, and MCP commits. Here's why that combination works.

Why not just let the AI write the record?

Because a wrong invoice costs more than a wrong chatbot reply, and at any real volume the AI will be confidently wrong some of the time.

The input genuinely resists clean parsing. It's voice notes (often mixing languages mid-sentence), half-sentences, typo'd product names, and multi-turn corrections — "actually make it eight." An extraction model handles most of that well and a slice of it wrong, with total confidence either way. For a read — answering a question — that's fine. For a write that becomes an order, an invoice, or a line of CRM truth, "mostly right" is a liability, not a feature. As we've said before, even 98% accuracy isn't enough at scale: two errors in a hundred become six hundred in thirty thousand.
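To make the scale argument concrete, here is the same arithmetic as a tiny Python check. The 98% figure is the illustrative accuracy from the paragraph above, not a measured rate:

```python
def expected_errors(accuracy_pct: float, volume: int) -> float:
    """Expected number of wrong extractions for a given accuracy and message volume."""
    error_rate = (100 - accuracy_pct) / 100  # 98% accurate -> 2% wrong
    return error_rate * volume


for volume in (100, 30_000):
    wrong = expected_errors(98, volume)
    print(f"{volume:>6,} messages at 98% accuracy: about {wrong:.0f} wrong records")
```

The rate stays constant; only the absolute number of bad records grows, and each one is a real invoice or order someone has to find and undo.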

So the design question is not "how do we extract perfectly?" You can't. It's "how do we make the unavoidable errors visible, and cheap to catch, before they commit?" That reframing is the whole post.

How do you make the translation predictable?

You shrink the problem: constrain the model to a known schema and a small set of typed tools, cap how far it can run on its own, and resolve who's talking before it starts.

Three things turn an open-ended "understand this message" into a checkable "fill a known shape":

Structured output, not prose. The model doesn't write a paragraph; it emits a call against a typed schema — enums for things like priority and deal stage, real lookups for products against a reference catalog rather than trusting the transcript's spelling. The output space is small and validatable.
A bounded agent. The extraction runs as a reason-act loop, but capped — in our case at five tool calls per message. It can search for a contact, draft an order, and stop; it can't spiral.
Identity resolved first. Before the model runs, the sender's number is resolved to a user, org, account, and role. The model only ever sees the tools and data that person is allowed to touch — which is both a security boundary and a correctness one: it can't write to the wrong account because it can't see it.
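A minimal Python sketch of the last two points, assuming a simple in-process registry: tools are filtered by the sender's resolved role before the model sees anything, identity is handed to each tool by the runtime rather than by the model, and the loop refuses to go past five calls. `Identity`, `Tool`, `tools_for`, `run_agent` and `plan_next_call` are hypothetical names; `plan_next_call` stands in for the model choosing its next tool call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

MAX_TOOL_CALLS = 5  # per-message cap from the text


@dataclass(frozen=True)
class Identity:
    """Resolved from the sender's phone number before the model runs."""
    user_id: str
    org_id: str
    account_id: str
    role: str


@dataclass(frozen=True)
class Tool:
    name: str
    allowed_roles: frozenset[str]
    run: Callable[[Identity, dict], dict]  # identity is supplied by the runtime, not the model


def tools_for(identity: Identity, registry: list[Tool]) -> dict[str, Tool]:
    """The model only ever sees tools the sender's role may use."""
    return {t.name: t for t in registry if identity.role in t.allowed_roles}


def run_agent(
    identity: Identity,
    registry: list[Tool],
    plan_next_call: Callable[[list[str], list[dict]], Optional[tuple[str, dict]]],
) -> list[dict]:
    """Bounded reason-act loop: at most MAX_TOOL_CALLS tool calls per message."""
    visible = tools_for(identity, registry)
    results: list[dict] = []
    while True:
        step = plan_next_call(sorted(visible), results)
        if step is None:  # the model says it is done
            return results
        if len(results) >= MAX_TOOL_CALLS:
            raise RuntimeError(f"stopped after {MAX_TOOL_CALLS} tool calls")
        name, args = step
        if name not in visible:  # hidden tools cannot be called, even by guessing the name
            raise PermissionError(f"{name!r} is not available to role {identity.role!r}")
        results.append(visible[name].run(identity, args))
```

Because `run` receives the resolved `Identity` from the loop, a tool writes to `identity.account_id` no matter what the model puts in its arguments.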

Reframed this way, "turn a voice note into a business object" becomes "fill a constrained schema from a small vocabulary" — and that's the part language models actually do reliably.
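What "fill a constrained schema" can look like, as a hedged sketch: an enum that rejects anything outside its vocabulary, and a product resolver that maps the transcript's spelling onto a reference catalog instead of trusting it. The catalog contents and SKUs are made up, and using the standard library's `difflib.get_close_matches` (with its usual 0.6 cutoff) is an assumption for illustration, not the production matcher.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


# Hypothetical reference catalog: canonical product name -> SKU.
CATALOG = {
    "House Red 2021": "SKU-RED-21",
    "House White 2022": "SKU-WHT-22",
    "Sparkling Brut": "SKU-BRUT",
}


@dataclass(frozen=True)
class OrderLine:
    sku: str
    product: str
    quantity: int


def parse_priority(value: str) -> Priority:
    """Anything outside the enum's vocabulary raises ValueError."""
    return Priority(value.strip().lower())


def resolve_product(spoken: str) -> str:
    """Map the transcript's spelling onto a catalog name instead of trusting it."""
    by_lower = {name.lower(): name for name in CATALOG}
    matches = difflib.get_close_matches(spoken.strip().lower(), list(by_lower), n=1, cutoff=0.6)
    if not matches:
        raise ValueError(f"no catalog product matches {spoken!r}")
    return by_lower[matches[0]]


def build_line(raw: dict) -> OrderLine:
    """Validate one model-proposed order line against the schema."""
    qty = raw.get("quantity")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(qty, int) or isinstance(qty, bool) or qty <= 0:
        raise ValueError(f"quantity must be a positive integer, got {qty!r}")
    name = resolve_product(str(raw.get("product", "")))
    return OrderLine(sku=CATALOG[name], product=name, quantity=qty)
```

A fuzzy match can still land on the wrong product, which is exactly why its result goes to a human preview rather than straight into the database.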

Where does the human actually fit?

The user's confirmation is the commit. The AI never writes on its first pass — it runs the write as a dry-run, shows a clean preview, and only a Confirm button executes it for real.

This is the heart of the whole system. Every write tool supports a dry-run mode. When the model wants to create something, it calls the tool with dry_run on, gets back exactly what would be written, and returns a clean, human-readable summary — not raw JSON:

Order draft: please confirm
• Contact: John Smith (Acme Corp)
• 6 × House Red 2021
• Total: €312

[ Confirm ] [ Modify ] [ Cancel ]

Those buttons are inline, right inside WhatsApp or Telegram — no app to switch to, no dashboard to open. Confirm re-runs the operation for real; Modify drops back into conversation so the user can adjust ("make it eight"); Cancel discards it. Pending drafts expire on a timer — a few minutes for routine actions, longer for orders that need back-and-forth — so nothing stale ever commits.
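The button handling and expiry can be sketched as a small draft store. This is a single-process illustration under stated assumptions: the TTL values are placeholders for "a few minutes" and "longer", `DraftStore` and its methods are hypothetical, and a production system would keep drafts in shared storage with an atomic take. Because Confirm removes the draft before writing, a second tap finds nothing left to commit.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable

# Placeholder TTLs for "a few minutes" and "longer for orders" (values assumed).
ROUTINE_TTL_S = 5 * 60
ORDER_TTL_S = 30 * 60


@dataclass
class PendingDraft:
    args: dict  # the arguments that were dry-run and shown in the preview
    expires_at: float


class DraftStore:
    """In-memory, single-process sketch; a real system needs shared storage with an atomic take."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._pending: dict[str, PendingDraft] = {}

    def propose(self, draft_id: str, args: dict, ttl_s: float) -> None:
        self._pending[draft_id] = PendingDraft(args, self._clock() + ttl_s)

    def _live(self, draft_id: str) -> PendingDraft | None:
        draft = self._pending.get(draft_id)
        if draft is not None and self._clock() >= draft.expires_at:
            del self._pending[draft_id]  # stale drafts never commit
            return None
        return draft

    def confirm(self, draft_id: str, commit: Callable[[dict], object]) -> str:
        """Confirm: take the draft out first, then run the real write (dry_run off)."""
        if self._live(draft_id) is None:
            return "expired_or_already_handled"
        draft = self._pending.pop(draft_id)  # a second tap now finds nothing
        commit(draft.args)
        return "committed"

    def modify(self, draft_id: str, changes: dict) -> bool:
        """Modify: apply a conversational edit ("make it eight") to the live draft.

        In the full flow the edited draft would be dry-run and previewed again.
        """
        draft = self._live(draft_id)
        if draft is None:
            return False
        draft.args = {**draft.args, **changes}
        return True

    def cancel(self, draft_id: str) -> None:
        self._pending.pop(draft_id, None)
```

Injecting the clock keeps expiry testable without waiting minutes; the default uses `time.monotonic` so wall-clock changes can't revive or kill drafts.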

This is the entire reliability trick. The person sees precisely what will be created, in the channel they're already in, and their "yes" is the only thing that writes. A mistake becomes a glance instead of a cleanup job. It's also the concrete version of the argument from the last post: you don't get reliable AI by trusting better output, you get it by engineering the moment of commitment — here, a literal button.
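The dry-run mode itself is simple to sketch: one write tool, one flag, and a renderer that turns the would-be record into the human-readable preview. `DraftOrder`, `create_order`, `render_preview` and the in-memory `ORDERS` list are hypothetical stand-ins, and the €52 unit price is only 312 ÷ 6, chosen to reproduce the example preview.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DraftOrder:
    contact: str
    company: str
    product: str
    quantity: int
    unit_price_cents: int  # integer cents avoid float rounding in totals

    @property
    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents


ORDERS: list[DraftOrder] = []  # stand-in for the real orders table


def create_order(order: DraftOrder, dry_run: bool = True) -> DraftOrder:
    """Hypothetical write tool: same validation either way, persists only when dry_run is False."""
    if order.quantity <= 0:
        raise ValueError("quantity must be positive")
    if not dry_run:
        ORDERS.append(order)
    return order  # exactly what was (or would be) written


def render_preview(order: DraftOrder) -> str:
    """Human-readable summary for the chat, instead of raw JSON."""
    cents = order.total_cents
    total = f"{cents // 100}" if cents % 100 == 0 else f"{cents / 100:.2f}"
    return "\n".join([
        "Order draft: please confirm",
        f"• Contact: {order.contact} ({order.company})",
        f"• {order.quantity} × {order.product}",
        f"• Total: €{total}",
    ])


draft = create_order(DraftOrder("John Smith", "Acme Corp", "House Red 2021", 6, 5200))
print(render_preview(draft))  # nothing has been persisted yet
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: a call that forgets the flag previews instead of writing.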

And here's the key difference from the old way: no more "send and hope for the best." The user always has the final say.

Why put MCP in the middle?

Because the model should never touch your database directly. MCP gives it a fixed registry of typed, permissioned tools, so every read and write goes through one audited, schema-validated door.

The model doesn't call your API or write SQL. It calls tools (crm_create_service_ticket, crm_add_order_line_item, crm_log_interaction) from a registry of around 195 tools across our modules, all behind a single JSON-RPC endpoint. That buys four things that matter for reliability:

Validation at the boundary. Every tool has a schema; malformed arguments are rejected before they reach the database.
Central permissions. Identity and permissions are injected by the server, never trusted from the model's arguments.
Dry-run and idempotency. Both are built in, so the preview-then-confirm flow works the same way for every tool and a double-tapped Confirm can't create two orders.
One audited surface. The model is API-agnostic and literally cannot invent an endpoint — it can only call tools that exist.
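A minimal sketch of that single door, assuming the standard MCP `tools/call` request shape over JSON-RPC 2.0 and the standard JSON-RPC error codes (-32601 for an unknown method, -32602 for invalid params). The hand-rolled type check is a stand-in for real JSON Schema validation, and the handler body for `crm_log_interaction` is hypothetical; only the tool name comes from the text. Since `account_id` comes from the resolved session, a model that tries to pass one is rejected for sending an unexpected argument.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    params: dict[str, type]               # all required; a minimal stand-in for JSON Schema
    handler: Callable[[dict, dict], Any]  # (identity, arguments) -> result


def log_interaction(identity: dict, args: dict) -> str:
    # Hypothetical handler body; only the tool name comes from the text.
    return f"logged {args['summary']!r} for contact {args['contact_id']} in account {identity['account_id']}"


REGISTRY = {
    "crm_log_interaction": ToolSpec({"contact_id": str, "summary": str}, log_interaction),
}


def rpc_error(req_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(raw: str, identity: dict) -> dict:
    """Single JSON-RPC entry point; identity comes from the resolved session, never the model."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":  # this sketch implements only tools/call
        return rpc_error(req_id, -32601, "Method not found")
    params = req.get("params") or {}
    name = params.get("name")
    spec = REGISTRY.get(name)
    if spec is None:  # the model cannot invent a tool that is not registered
        return rpc_error(req_id, -32602, f"Unknown tool: {name}")
    args = params.get("arguments") or {}
    missing = sorted(set(spec.params) - set(args))
    unexpected = sorted(set(args) - set(spec.params))
    wrong_type = sorted(k for k, t in spec.params.items() if k in args and not isinstance(args[k], t))
    if missing or unexpected or wrong_type:
        return rpc_error(req_id, -32602,
                         f"Invalid arguments: missing={missing} unexpected={unexpected} wrong_type={wrong_type}")
    text = str(spec.handler(identity, args))
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": False}}
```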

And the same tool layer that serves the chat bot also serves Claude Desktop, Cursor, and our internal agents. The messaging flow isn't a bespoke integration; it's one more client of a registry we already maintain.

What makes it bulletproof at scale?

The boring engineering around the magic: ordering, idempotency, and persisted context. None of it is glamorous, and all of it is load-bearing.

Ordering. Voice notes are transcribed in parallel (roughly 30–47% faster than doing them one at a time) but committed sequentially per conversation, through a task queue. So "five of the usual" followed by "actually six" can never race — the correction always lands after the thing it's correcting.
Idempotency. A repeated confirm tap, a request retried over a flaky network, a double click: none of them can create the same order twice.
Memory. The conversation is persisted, so multi-turn edits resolve against the live draft instead of starting over, and long voice transcripts are compressed for context while the full original is always kept.
Failure handling. Transient model and transcription hiccups retry with backoff and surface to monitoring rather than silently dropping a customer's order.
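The ordering and retry points can be sketched with `asyncio`: each note's transcription starts as soon as it arrives, but a single worker per conversation awaits those results in arrival order before committing, so a fast "actually six" cannot overtake a slow "five of the usual". `ConversationQueue` and `with_retry` are hypothetical names, treating only `ConnectionError` as transient is an assumption, and an in-process `asyncio.Queue` stands in for the real task queue.

```python
from __future__ import annotations

import asyncio
import contextlib
import random
from typing import Awaitable, Callable


async def with_retry(fn: Callable[..., Awaitable], *args, attempts: int = 3, base_delay: float = 0.01):
    """Retry transient failures with exponential backoff plus jitter, then re-raise."""
    for attempt in range(attempts):
        try:
            return await fn(*args)
        except ConnectionError:  # assumption: the only "transient" error type here
            if attempt == attempts - 1:
                raise  # surface it instead of silently dropping the order
            await asyncio.sleep(base_delay * 2 ** attempt * (1 + random.random()))


class ConversationQueue:
    """Parallel transcription, sequential commit, for one conversation.

    Must be created inside a running event loop (it starts a worker task).
    """

    def __init__(self, conversation_id: str, transcribe, commit) -> None:
        self.conversation_id = conversation_id
        self._transcribe = transcribe
        self._commit = commit
        self._queue: asyncio.Queue = asyncio.Queue()
        self.failures: list[BaseException] = []  # would be reported to monitoring
        self._worker = asyncio.create_task(self._run())

    def submit(self, note) -> None:
        # Transcription starts immediately; the task is queued in arrival order.
        task = asyncio.create_task(with_retry(self._transcribe, note))
        self._queue.put_nowait(task)

    async def _run(self) -> None:
        while True:
            task = await self._queue.get()
            try:
                text = await task  # waits for this note even if later ones finished first
                await self._commit(self.conversation_id, text)
            except Exception as exc:
                self.failures.append(exc)
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every submitted note is committed (or failed), then stop the worker."""
        await self._queue.join()
        self._worker.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._worker
```

Starting the transcription task in `submit` is what gives the parallel speed-up; queuing the task object, not its result, is what preserves order.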

Individually these are unremarkable. Together they're the difference between a slick demo and something a business can run its orders through.

The takeaway

The AI is the easy part of this system — and the unreliable one. The reliability comes from the frame around it: a constrained schema so the output is checkable, a single tap so a human owns the commit, a typed tool layer so writes can only happen one validated way, and the plumbing that keeps it all ordered and idempotent.

Put differently, you don't make AI bulletproof by trusting it more. You make it bulletproof by deciding exactly where the human's judgment enters — and in this case, that's the final validation, not the initial request. Messy conversation in, predictable business object out, with the user's explicit approval as the only door to your data.

This is the engine behind Reflekt CRM's WhatsApp and Telegram bots, and the pattern we build for voice- and chat-to-system pipelines. If you've got orders or data arriving as messages and no reliable way to capture them, let's talk.