flow8 — Intelligent Document Intake

The business case

Documents still arrive as documents — and someone still retypes every field

The problem

Documents arrive the way they always have: an email with a PDF, a scanned image, an Excel export, a request typed into the message body. A person opens each one, reads it, retypes the fields into a system, decides what it triggers, and tries to remember whether they already handled it. The process is slow and inconsistent from one person to the next, and when volume spikes, the backlog and the typos grow with it. Worse, the same document re-sent, or the same run repeated, quietly creates duplicate orders and duplicate tasks.

The obvious fix is to point an off-the-shelf AI at the inbox and let it read, decide, and post. But a purchase order or invoice commits to price, quantity, dates, and identity — and the moment a model can post those fields to your ERP, one bad parse becomes a wrong order, a duplicate payment, a customer record silently overwritten. That is exactly the authority you cannot hand a model.

Who feels it

Order management and inside sales handling PO intake, and the AP and finance shared-services teams behind invoices and contracts
Bid, proposal, and engineering teams doing RFQ and spec intake, and claims operations working insurance packets
The ops leaders who own SLA, accuracy, and audit exposure on every one of those documents

Time to value

Fast — assembled from flow8 building blocks that already exist and are adversarially hardened. A pilot points at one inbound channel with the kill-switch on and runs shadow-first, so you see validated records and prepared tasks before any reach a person.

What you get

A document stops being a re-keying chore and becomes a governed record

The same pipeline serves every document type you actually receive — one channel or many.

🗂️

Any document becomes a typed record

Digital PDF, scanned image via OCR, and CSV are read and coerced into a validated, structured record — fields, line items, classifications — with no manual re-keying.

✅

One review task per document, never a duplicate

Exactly one reviewer task per document, deduped against the database on a content-derived key, so re-sends and overlapping runs never double-create an order or a task.

⚠️

Honest about formats it can't safely read

Digital PDF, scan, and CSV are handled; native spreadsheet and word-processor files are flagged for manual review rather than silently dropped or mis-read — you always know what was and wasn't parsed.

🔒

Nothing is sent or committed automatically

Every document terminates in a prepared draft a human approves. A bad parse can never become a bad order, a wrong invoice posting, or a customer record overwritten on its own.

📜

Every extraction is auditable by design

Each record is stored with its model, prompt version, and confidence, keyed to the decoded document bytes — a durable, trust-by-design trail you can defend to an auditor later.

⚡

Drops onto your existing channel in days

It reuses hardened primitives, so a single-channel pilot lands documents as validated records plus prepared tasks within days — not a multi-quarter ERP integration project.

How it works

One governed spine, from inbound document to human approval

The model proposes; a human executes; nothing touching money or identity ever auto-fires. It is the same secure spine every flow8 Solution runs — here worn as document intake.

Every inbound document runs the identical sequence. The LLM is permanently demoted to an advisor over deterministic facts; the consequential output is a proposed record on a tamper-evident register — not an action.

01

📨

Cursored intake

Only documents newer than the stored watermark are drained; each is type-routed to the right extractor.

IMAP · OCR

02

🧪

Injection pre-scan

A deterministic Code heuristic treats every extracted byte as data, not instructions, before any model sees it.

data, not instructions

03

🧩

Schema-locked extract

A schema-locked LLM suggests fields, line items, and classifications; totals, dates, and name-resolution are computed in Code.

model suggests

04

⚖️

Code decides

The binding verdict (valid, quarantine, or flag) is made in deterministic code, never by the model.

Code authoritative

05

📝

Draft-not-act register

Every validated document is written as a proposed record on the tamper-evident register.

draft, not act

06

🚦

Policy gate

A deterministic gate classifies each record; money and identity are capped at prepare-only by construction.

prepare-only

07

🙋

One human task

Exactly one review task is opened per document; a full evidence record is written before any side-effect.

audit-before-effect

👤

Human reviews & commits

A person approves in one click. The record commits to the system of record under their authorship.

human-gated

Safe output

A validated record and a prepared next step approved by a human · recorded on a signed register · reversible

Intelligent Document Intake watches an inbound channel, drains only the new documents since a stored cursor, and routes each one by type — born-digital PDF first, OCR only when the text layer is empty, CSV via a parser, with native spreadsheet and word-processor files honestly flagged for manual review. Before any model sees the text, the injection pre-scan runs. A schema-locked LLM then acts purely as a suggester — coercing the text into a strict, validated record while the objective fields and the binding decision are computed in code.
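
As a rough illustration of the routing and pre-scan order described above, the following Python sketch routes a document by file type and then scans the extracted text before any model call. The function names (`route`, `prescan`), the extension sets, and the override-marker pattern are assumptions made for this example, not flow8's actual implementation.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical marker pattern; a real deployment would tune and version this list.
OVERRIDE_MARKERS = re.compile(
    r"\b(ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard the above|you are now|system prompt)\b",
    re.IGNORECASE,
)
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
BIDI = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
        "\u2066", "\u2067", "\u2068", "\u2069"}


def route(filename: str, pdf_text_layer: Optional[str] = None) -> str:
    """Pick an extractor by file type; unsupported formats go to manual review."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext == "pdf":
        # Born-digital first; OCR only when the text layer is empty.
        return "pdf_text" if (pdf_text_layer or "").strip() else "ocr"
    if ext in {"png", "jpg", "jpeg", "tif", "tiff"}:
        return "ocr"
    if ext == "csv":
        return "csv_parser"
    # Native spreadsheet and word-processor files are flagged, not dropped.
    return "manual_review"


def prescan(text: str) -> list:
    """Return the reasons the text looks suspicious; an empty list means clean."""
    reasons = []
    if any(unicodedata.category(c) == "Cc" and c not in "\n\r\t" for c in text):
        reasons.append("control_char")
    if any(c in ZERO_WIDTH for c in text):
        reasons.append("zero_width")
    if any(c in BIDI for c in text):
        reasons.append("bidi")
    if OVERRIDE_MARKERS.search(text):
        reasons.append("override_marker")
    return reasons
```

In this sketch, a document whose `prescan` result is non-empty would be quarantined and stored without any LLM pass. The marker list is illustrative only and would need tuning against real traffic.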

Because the LLM is permanently demoted to an advisor over deterministic facts, because money and identity records are capped at prepare-only by construction, and because the extraction row is written before any side-effect on a hash-chained, signed register keyed to the decoded bytes, you get agentic value without ever handing a model the authority to act. Off-the-shelf agents give a model authority first and bolt on guardrails later — flow8 makes the guardrail the architecture.
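
One way to picture "the model suggests, code decides" is the sketch below. It recomputes the total from the line items with exact decimal arithmetic and treats the model's total only as a claim to check. The names `Line`, `decide`, and `propose`, and the rule that a total mismatch yields `flag`, are assumptions for illustration; the text above does not specify how each verdict is triggered.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    qty: Decimal
    unit_price: Decimal


def decide(lines, suggested_total, injection_flags):
    """Binding verdict computed in code; the model's total is only a claim."""
    if injection_flags:
        return "quarantine"
    if not lines:
        return "flag"
    computed = sum((l.qty * l.unit_price for l in lines), Decimal("0"))
    # Assumed rule: any disagreement with the recomputed total is flagged.
    return "valid" if computed == suggested_total else "flag"


def propose(doc_key, lines, suggested_total, injection_flags):
    """Build a draft row. There is deliberately no code path that commits."""
    return {
        "doc_key": doc_key,
        "verdict": decide(lines, suggested_total, injection_flags),
        "status": "proposed",      # a draft, never an action
        "action": "prepare_only",  # the cap applied to money and identity
    }
```

The point of the shape is that nothing in it can post to a system of record: the strongest outcome a record can reach is a proposed row awaiting a human.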

Why it's safe to run

Secure and efficient by construction — not by policy

Secure by construction

The guardrail is the architecture, so adding AI to document intake stops being a risk-underwriting exercise.

Deterministic injection pre-scan. A Code heuristic (control / zero-width / bidi chars + imperative-override markers) runs after extraction and before any LLM. A flagged document takes zero LLM passes and is quarantined: stored, not dropped. This is a plain heuristic, and it is not presented as a full security module.
Never auto-act on money or identity. An order or invoice commits to price, quantity, and date, so a validated record is always written as a draft proposed row and flagged for one human approval. The prepared output is routed to a review queue, never the counterparty.
Audit before side-effect. The model id, prompt version, extracted fields, confidence, and injection flag are recorded before any task fires — so a failed side-effect never loses the provenance of the extraction.
Tamper-evident register. Each row carries a hash-chain link plus an HMAC-SHA256 signature under a frozen canonicalization, keyed to a content hash of the decoded bytes. A read-only sweep re-verifies the chain and catches any record that was committed without first passing through the prepared state.
Sovereign and provider-swappable. State of record lives in your own f8db; the optional vector index for prior-document recall is a rebuildable derived copy on your own store; the AI provider is a swappable setting. The data never has to leave your infrastructure.
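
The register properties listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the canonicalization (sorted-key compact JSON), the row layout, and the names `append_row` and `verify` are hypothetical, and key management is out of scope.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first row


def canonical(payload: dict) -> bytes:
    # Assumed frozen canonicalization: sorted keys, compact separators, UTF-8.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def append_row(register: list, key: bytes, doc_bytes: bytes, fields: dict) -> dict:
    """Append a row linked to the previous row's hash and signed with HMAC-SHA256."""
    payload = {
        "doc_key": hashlib.sha256(doc_bytes).hexdigest(),  # content hash of the bytes
        "fields": fields,
        "prev_hash": register[-1]["row_hash"] if register else GENESIS,
    }
    body = canonical(payload)
    row = dict(
        payload,
        row_hash=hashlib.sha256(body).hexdigest(),
        signature=hmac.new(key, body, hashlib.sha256).hexdigest(),
    )
    register.append(row)
    return row


def verify(register: list, key: bytes) -> bool:
    """Read-only sweep: recompute every link, hash, and signature."""
    prev_hash = GENESIS
    for row in register:
        body = canonical({k: row[k] for k in ("doc_key", "fields", "prev_hash")})
        if row["prev_hash"] != prev_hash:
            return False
        if row["row_hash"] != hashlib.sha256(body).hexdigest():
            return False
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(row["signature"], expected):
            return False
        prev_hash = row["row_hash"]
    return True
```

Editing any stored field after the fact changes the recomputed hash, so `verify` returns `False` for the altered register; without the key, an attacker cannot produce a matching signature for a rewritten row.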

Efficient by construction

The same properties that make it safe make it cheap to run at volume.

Idempotent by construction. A content-derived document key, written before the side-effect, is the upsert conflict key; the task id is confirmed only after a 2xx response. The same document re-sent, the run repeated, or two overlapping schedules all collapse to one record and one task.
Draft-not-act removes rework. A bad parse costs one rejected draft, not a credit memo and an apology email — a human reviews a prepared record instead of unwinding an auto-posted order.
Scoped, cursored intake. Only documents newer than the watermark are drained. The fetch limit is always hard-capped, so a lost cursor degrades to a paged backlog drain rather than an unbounded fetch, and the cursor advances only to the durably-processed watermark.
Deterministic where it counts. Totals, dates, and name-resolution run in pure Code and lexical matching — the model only suggests structure, so the quantitative backbone never depends on (or pays for) an LLM, and a scoped seen-query skips already-persisted documents before any extraction cost.
Self-healing dashboards. The register re-aggregates on every run, so documents-by-status, confidence, and quarantine rate stay live instead of freezing a stale number.
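
A minimal sketch of the idempotency pattern described above, using an in-memory SQLite table as a stand-in for the database. The table layout, the `intake` function, and the `create_task` callback (assumed to return an HTTP status and a task id) are hypothetical.

```python
import hashlib
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    # doc_key is the conflict key; task_id stays NULL until a task is confirmed.
    db.execute("CREATE TABLE documents (doc_key TEXT PRIMARY KEY, task_id TEXT)")
    return db


def intake(db, doc_bytes, create_task):
    """Record the document once and open at most one task for it."""
    doc_key = hashlib.sha256(doc_bytes).hexdigest()
    # The key is written before any side-effect; a re-send inserts nothing.
    db.execute("INSERT OR IGNORE INTO documents (doc_key) VALUES (?)", (doc_key,))
    (task_id,) = db.execute(
        "SELECT task_id FROM documents WHERE doc_key = ?", (doc_key,)
    ).fetchone()
    if task_id is not None:
        db.commit()
        return task_id  # this document already has its one task
    status, new_id = create_task(doc_key)
    if 200 <= status < 300:
        # Confirm the task id only after a 2xx response.
        db.execute(
            "UPDATE documents SET task_id = ? WHERE doc_key = ?", (new_id, doc_key)
        )
        db.commit()
        return new_id
    db.commit()
    return None  # row is kept without a task id, so a later run retries
```

In this sketch a failed task call leaves the row without a task id so that a later run retries. A production design would presumably also pass the document key to the task system as an idempotency key, which is an assumption beyond what the text states.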

Built from

Assembled from proven, hardened capabilities

Not rebuilt from scratch — composed from the same governed building blocks every flow8 Solution shares, so it ships in days.

The capabilities it composes

Cursored document intake · Type-routed text & OCR extraction · Injection pre-scan · Schema-locked AI extraction · Deterministic field & name resolution · Content-keyed dedup register · Draft-not-act policy gate · Tamper-evident audit trail

Connects to your stack

IMAP & Exchange mailboxes · ERP & CRM systems of record · Enterprise task & workflow queues · On-prem vector store & knowledge base · Document & content stores · Reporting & BI dashboards · Any REST / OData API

Where it fits

The same process shape serves every document-driven industry

Any business whose work arrives as a document — PO, invoice, contract, RFQ, spec, claims packet — and must be validated against a system of record before anyone acts.

Industry

Finance & ERP

Invoice and PO intake, order-to-cash, and validation against the ledger before any posting.

Industry

Legal & Compliance

Sovereign, audit-trailed extraction of contracts, engagement letters, and regulated documents.

Industry

Engineering Teams

RFQ and spec-package intake, with new specs surfaced as diffs against prior projects.

Industry

Healthcare

Claims packets and structured forms typed into validated records, with money capped at prepare-only.

Composes with

A validated record from one solution is the clean upstream another consumes

Adopt this one and it plugs into the spine the others already speak.

Solution · a channel into this

Email Operations Hub

A shared-inbox unit is one more source that routes into the same extract → validate → prepared-record chain.

Solution · feeds

Governed ERP / System-of-Record Connector

A validated PO or invoice record is exactly the input the governed connector turns into a prepared, gated write.

Solution · powers

Sovereign Knowledge Base Builder

The on-prem vector collection gives intake a prior-document recall leg so new specs surface diffs against past projects.

Insight · the thinking

The ERP that acts

Why every extracted record must be validated and gated before it can touch the system of record.

Point it at one channel. Kill-switch on. Shadow-first.

Watch the first week of documents land as validated records plus prepared, human-owned actions — drafts only, no tasks, full audit trail. When you're ready, flip on the review queue and stack governance, prior-document recall, and downstream ERP write-back on the exact same pipeline.

Book a demo →

Every inbound document becomes a validated, structured record.
