📄 Intake & Capture · Solution

Every inbound document becomes a validated, structured record.

PDFs, scans, spreadsheets, body-text requests — each one read, typed, validated, and turned into a prepared next step a human approves, never a bad parse that becomes a bad order. Runs on your infrastructure, against your systems of record, with a full audit trail.

The business case

Documents still arrive as documents — and someone still retypes every field

The problem

Documents arrive the way they always have — an email with a PDF, a scanned image, an Excel export, a body-text request — and a person opens each one, reads it, retypes the fields into a system, decides what it triggers, and tries to remember they already handled it. It is slow, it is inconsistent from one person to the next, and the moment volume spikes the backlog and the typos grow with it. Worse, the same document re-sent, or the same run repeated, quietly creates duplicate orders and duplicate tasks.

The obvious fix is to point an off-the-shelf AI at the inbox and let it read, decide, and post. But a purchase order or invoice commits to price, quantity, dates, and identity — and the moment a model can post those fields to your ERP, one bad parse becomes a wrong order, a duplicate payment, a customer record silently overwritten. That is exactly the authority you cannot hand a model.

Who feels it

  • Order management and inside sales handling PO intake, and the AP and finance shared-services teams behind invoices and contracts
  • Bid, proposal, and engineering teams doing RFQ and spec intake, and claims operations working insurance packets
  • The ops leaders who own SLA, accuracy, and audit exposure on every one of those documents

Time to value

Fast — assembled from flow8 building blocks that already exist and are adversarially hardened. A pilot points at one inbound channel with the kill-switch on and runs shadow-first, so you see validated records and prepared tasks before any reach a person.

What you get

A document stops being a re-keying chore and becomes a governed record

The same pipeline serves every document type you actually receive — one channel or many.

🗂️

Any document becomes a typed record

Digital PDF, scanned image via OCR, and CSV are read and coerced into a validated, structured record — fields, line items, classifications — with no manual re-keying.

One review task per document, never a duplicate

Exactly one reviewer task per document, deduped against the database on a content-derived key, so re-sends and overlapping runs never double-create an order or a task.

⚠️

Honest about formats it can't safely read

Digital PDF, scan, and CSV are handled; native spreadsheet and word-processor files are flagged for manual review rather than silently dropped or mis-read — you always know what was and wasn't parsed.

🔒

Nothing is sent or committed automatically

Every document terminates in a prepared draft a human approves. A bad parse can never become a bad order, a wrong invoice posting, or a customer record overwritten on its own.

📜

Every extraction is auditable by design

Each record is stored with its model, prompt version, and confidence, keyed to the decoded document bytes — a durable, trust-by-design trail you can defend to an auditor later.

Drops onto your existing channel in days

It reuses hardened primitives, so a single-channel pilot lands documents as validated records plus prepared tasks within days — not a multi-quarter ERP integration project.

How it works

One governed spine, from inbound document to human approval

The model proposes; a human executes; nothing touching money or identity ever auto-fires. It is the same secure spine every flow8 Solution runs — here worn as document intake.

Every inbound document runs the identical sequence. The LLM is permanently demoted to an advisor over deterministic facts; the consequential output is a proposed record on a tamper-evident register — not an action.

01 📨 Cursored intake. Only documents newer than the stored watermark are drained; each is type-routed to the right extractor. (IMAP · OCR)
02 🧪 Injection pre-scan. A deterministic Code heuristic treats every extracted byte as data, not instructions, before any model sees it. (data, not instructions)
03 🧩 Schema-locked extract. A schema-locked LLM suggests fields, line items, and classifications; totals, dates, and name-resolution are computed in Code. (model suggests)
04 ⚖️ Code decides. The binding verdict (valid, quarantine, or flag) is made in deterministic code, never by the model. (Code authoritative)
05 📝 Draft-not-act register. Every validated document is written as a proposed record on the tamper-evident register. (draft, not act)
06 🚦 Policy gate. A deterministic gate classifies each record; money and identity are capped at prepare-only by construction. (prepare-only)
07 🙋 One human task. Exactly one review task is opened per document; a full evidence record is written before any side-effect. (audit-before-effect)
👤 Human reviews & commits. A person approves in one click. The record commits to the system of record under their authorship. (human-gated)

Safe output: A validated record and a prepared next step approved by a human · recorded on a signed register · reversible
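
To make steps 04 and 06 concrete, here is a minimal Python sketch of a verdict and a policy gate written as plain code. It is an illustration under stated assumptions, not flow8's actual schema or API: the `Extraction` fields, the `Verdict` and `Action` names, the 0.8 confidence threshold, and the reading of "flag" as "needs closer human attention" are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Verdict(Enum):
    VALID = "valid"
    QUARANTINE = "quarantine"
    FLAG = "flag"


class Action(IntEnum):
    STORE_ONLY = 0   # keep the document, open nothing
    PREPARE = 1      # write a draft for a human to approve
    COMMIT = 2       # only a human approval may reach this level


@dataclass(frozen=True)
class Extraction:
    doc_type: str            # hypothetical label, e.g. "invoice"
    injection_flagged: bool  # result of the deterministic pre-scan
    schema_valid: bool       # did the model output satisfy the schema?
    confidence: float        # model-reported confidence, 0.0 to 1.0


MIN_CONFIDENCE = 0.8  # assumed threshold, tune per deployment


def decide(x: Extraction) -> Verdict:
    """Binding verdict computed in plain code; the model has no vote."""
    if x.injection_flagged:
        return Verdict.QUARANTINE
    if not x.schema_valid or x.confidence < MIN_CONFIDENCE:
        return Verdict.FLAG
    return Verdict.VALID


def gate(requested: Action, x: Extraction) -> Action:
    """Cap whatever was requested; this function can never return COMMIT."""
    if decide(x) is Verdict.QUARANTINE:
        return Action.STORE_ONLY
    return min(requested, Action.PREPARE)
```

Because `gate` takes the minimum of the requested action and `PREPARE`, there is no input that makes it return `COMMIT`. That is the sense in which the cap holds by construction rather than by policy.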

Intelligent Document Intake watches an inbound channel, drains only the new documents since a stored cursor, and routes each one by type — born-digital PDF first, OCR only when the text layer is empty, CSV via a parser, with native spreadsheet and word-processor files honestly flagged for manual review. Before any model sees the text, the injection pre-scan runs. A schema-locked LLM then acts purely as a suggester — coercing the text into a strict, validated record while the objective fields and the binding decision are computed in code.
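
The routing order described above can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the extension sets, and the route labels are assumptions, and the text layer is passed in as a string so the sketch does not depend on any particular PDF or OCR library.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical extension sets, chosen for illustration.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
NATIVE_OFFICE_EXTS = {".xls", ".xlsx", ".ods", ".doc", ".docx", ".odt"}


def route(filename: str, text_layer: Optional[str] = None) -> str:
    """Pick an extractor by type; anything not safely readable is flagged."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext == ".pdf":
        # Born-digital first: OCR only when the text layer is empty.
        has_text = bool(text_layer and text_layer.strip())
        return "pdf_text" if has_text else "ocr"
    if ext in IMAGE_EXTS:
        return "ocr"
    if ext == ".csv":
        return "csv"
    # Native office files (NATIVE_OFFICE_EXTS) and unknown types are
    # flagged for a person rather than silently dropped or mis-read.
    return "manual_review"


def parse_csv(raw: bytes) -> List[Dict[str, str]]:
    """Parse CSV bytes into one dict per row, tolerating a UTF-8 BOM."""
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

For example, `route("scan.pdf", "  ")` falls through to OCR because the text layer is blank, while `route("export.xlsx")` is flagged for manual review.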

Because the LLM is permanently demoted to an advisor over deterministic facts, because money and identity records are capped at prepare-only by construction, and because the extraction row is written before any side-effect on a hash-chained, signed register keyed to the decoded bytes, you get agentic value without ever handing a model the authority to act. Off-the-shelf agents give a model authority first and bolt on guardrails later — flow8 makes the guardrail the architecture.
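
As a rough illustration of what a hash-chained, signed register keyed to the decoded bytes can look like, the Python sketch below uses only the standard library. The row layout, the JSON canonicalization, the in-memory list, and the function names are assumptions made for this example; the real register format is not specified here.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for the first row's prev_hash
BODY_KEYS = ("doc_sha256", "fields", "prev_hash", "status")


def canonical(body: dict) -> bytes:
    # Assumed frozen canonicalization: sorted keys, no whitespace, UTF-8.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def append_row(register: list, key: bytes, doc_bytes: bytes, fields: dict) -> dict:
    """Append a proposed row that commits to the previous row's hash."""
    body = {
        "doc_sha256": hashlib.sha256(doc_bytes).hexdigest(),
        "fields": fields,
        "prev_hash": register[-1]["row_hash"] if register else GENESIS,
        "status": "proposed",
    }
    blob = canonical(body)
    row = dict(body,
               row_hash=hashlib.sha256(blob).hexdigest(),
               sig=hmac.new(key, blob, hashlib.sha256).hexdigest())
    register.append(row)
    return row


def verify(register: list, key: bytes) -> bool:
    """Read-only sweep: recompute every hash and signature in order."""
    prev = GENESIS
    for row in register:
        blob = canonical({k: row[k] for k in BODY_KEYS})
        expected_sig = hmac.new(key, blob, hashlib.sha256).hexdigest()
        if row["prev_hash"] != prev:
            return False
        if row["row_hash"] != hashlib.sha256(blob).hexdigest():
            return False
        if not hmac.compare_digest(row["sig"], expected_sig):
            return False
        prev = row["row_hash"]
    return True
```

Editing any field of an earlier row changes its canonical bytes, so the sweep fails at that row. Without the signing key, a tampered row cannot be re-signed to pass the check.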

Why it's safe to run

Secure and efficient by construction — not by policy

Secure by construction

The guardrail is the architecture, so adding AI to document intake stops being a risk-underwriting exercise.
  • Deterministic injection pre-scan. A Code heuristic (control / zero-width / bidi chars + imperative-override markers) runs after extraction and before any LLM. A flagged document takes zero LLM passes and is quarantined: stored, not dropped. It is a plain heuristic, and we do not dress it up as a security module.
  • Never auto-act on money or identity. An order or invoice commits to price, quantity, and date, so a validated record is always written as a draft proposed row and flagged for one human approval. The prepared output is routed to a review queue, never the counterparty.
  • Audit before side-effect. The model id, prompt version, extracted fields, confidence, and injection flag are recorded before any task fires — so a failed side-effect never loses the provenance of the extraction.
  • Tamper-evident register. Each row carries a hash chain plus an HMAC-SHA256 signature under a frozen canonicalization, keyed to a content hash of the decoded bytes. A read-only sweep re-verifies the chain to catch any record that was committed without first being prepared.
  • Sovereign and provider-swappable. State of record lives in your own f8db; the optional vector index for prior-document recall is a rebuildable derived copy on your own store; the AI provider is a swappable setting. The data never has to leave your infrastructure.
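
For readers who want a picture of the first bullet, a deterministic pre-scan of this kind might look like the Python sketch below. The character sets and the override-marker patterns are illustrative assumptions, not the shipped heuristic, and a real list would be broader.

```python
import re
import unicodedata

# Assumed character sets for illustration.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
BIDI = {"\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d",
        "\u202e", "\u2066", "\u2067", "\u2068", "\u2069"}
ALLOWED_CONTROLS = {"\n", "\r", "\t"}

# Hypothetical imperative-override markers.
OVERRIDE_MARKERS = re.compile(
    r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions"
    r"|disregard\s+(the\s+)?(previous|prior|above)"
    r"|you\s+are\s+now\b"
    r"|system\s+prompt",
    re.IGNORECASE,
)


def prescan(text: str) -> set:
    """Return the set of reasons a document is suspicious (empty if clean)."""
    reasons = set()
    for ch in text:
        if ch in ZERO_WIDTH:
            reasons.add("zero_width")
        elif ch in BIDI:
            reasons.add("bidi")
        elif unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            reasons.add("control")
    if OVERRIDE_MARKERS.search(text):
        reasons.add("override_marker")
    return reasons


def may_call_model(text: str) -> bool:
    """A flagged document takes zero LLM passes; the caller quarantines it."""
    return not prescan(text)
```

The scan is pure string inspection, so the same bytes always give the same answer and the check adds no model cost.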

Efficient by construction

The same properties that make it safe make it cheap to run at volume.
  • Idempotent by construction. A content-derived document key written before the side-effect is the upsert conflict key; the task id is confirmed only after a 2xx. The same document re-sent, the run repeated, or two overlapping schedules all collapse to one record and one task.
  • Draft-not-act removes rework. A bad parse costs one rejected draft, not a credit memo and an apology email — a human reviews a prepared record instead of unwinding an auto-posted order.
  • Scoped, cursored intake. Only documents newer than the watermark are drained, the fetch limit is always hard-capped so a lost cursor degrades to a paged backlog drain, and the cursor advances only to the durably-processed watermark.
  • Deterministic where it counts. Totals, dates, and name-resolution run in pure Code and lexical matching; the model only suggests structure, so the quantitative backbone never depends on (or pays for) an LLM. A scoped seen-query also skips already-persisted documents before any extraction cost is incurred.
  • Self-healing dashboards. The register re-aggregates on every run, so documents-by-status, confidence, and quarantine rate stay live instead of freezing a stale number.
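
The idempotency and cursor bullets above can be pictured with a small Python sketch using SQLite from the standard library. The table layout, the `FETCH_CAP` value, the `create_task` callback, and the use of a strictly increasing message id (such as an IMAP UID) as the watermark are all assumptions for illustration; this is not the product's schema.

```python
import hashlib
import sqlite3
from typing import Callable, List, Tuple

FETCH_CAP = 50  # assumed hard cap on documents drained per run


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents ("
               "doc_key TEXT PRIMARY KEY, uid INTEGER NOT NULL, task_id TEXT)")
    db.execute("CREATE TABLE cursor ("
               "id INTEGER PRIMARY KEY CHECK (id = 1), watermark INTEGER NOT NULL)")
    db.execute("INSERT INTO cursor VALUES (1, 0)")
    db.commit()
    return db


def run_once(db: sqlite3.Connection,
             inbox: List[Tuple[int, bytes]],
             create_task: Callable[[str], str]) -> int:
    """Drain documents newer than the watermark; return tasks created."""
    (watermark,) = db.execute("SELECT watermark FROM cursor WHERE id = 1").fetchone()
    batch = sorted(m for m in inbox if m[0] > watermark)[:FETCH_CAP]
    created = 0
    for uid, decoded in batch:
        key = hashlib.sha256(decoded).hexdigest()  # content-derived key
        # Write the key durably BEFORE the side-effect; re-sends conflict here.
        db.execute("INSERT INTO documents (doc_key, uid) VALUES (?, ?) "
                   "ON CONFLICT(doc_key) DO NOTHING", (key, uid))
        db.commit()
        (task_id,) = db.execute(
            "SELECT task_id FROM documents WHERE doc_key = ?", (key,)).fetchone()
        if task_id is None:
            # May raise; the cursor then stays put and the next run retries.
            task_id = create_task(key)
            db.execute("UPDATE documents SET task_id = ? WHERE doc_key = ?",
                       (task_id, key))
            created += 1
        # Advance only past work that has been durably processed.
        db.execute("UPDATE cursor SET watermark = ? WHERE id = 1", (uid,))
        db.commit()
    return created
```

Running the same inbox twice, or receiving the same bytes under a new message id, produces no second row and no second task. A failed `create_task` call leaves the row with an empty `task_id`, so a later run retries that one task instead of duplicating it.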

Built from

Assembled from proven, hardened capabilities

Not rebuilt from scratch — composed from the same governed building blocks every flow8 Solution shares, so it ships in days.

The capabilities it composes

  • Cursored document intake
  • Type-routed text & OCR extraction
  • Injection pre-scan
  • Schema-locked AI extraction
  • Deterministic field & name resolution
  • Content-keyed dedup register
  • Draft-not-act policy gate
  • Tamper-evident audit trail

Connects to your stack

  • IMAP & Exchange mailboxes
  • ERP & CRM systems of record
  • Enterprise task & workflow queues
  • On-prem vector store & knowledge base
  • Document & content stores
  • Reporting & BI dashboards
  • Any REST / OData API

Where it fits

The same process shape serves every document-driven industry

Any business whose work arrives as a document — PO, invoice, contract, RFQ, spec, claims packet — and must be validated against a system of record before anyone acts.

Composes with

A validated record from one solution is the clean upstream another consumes

Adopt this one and it plugs into the spine the others already speak.

Point it at one channel. Kill-switch on. Shadow-first.

Watch the first week of documents land as validated records plus prepared, human-owned actions — drafts only, no tasks, full audit trail. When you're ready, flip on the review queue and stack governance, prior-document recall, and downstream ERP write-back on the exact same pipeline.
