AI in the background
Classification, extraction, scoring, drafting. Embedded inside your existing process, not in a chat window.
The difference
Not every AI task needs a conversation.
Most of the time the right shape is an AI step plugged into a workflow you already run. The team never sees it. They just see the output.
Your Process
AI Step
Human Checkpoint (optional)
Clean Output
Example
Inbound lead qualification
The same shape works for support triage, invoice review, or any other classify-then-act process.
Manual review
Open each new contact form. Read what they want and who they are.
Check your ICP: industry, company size, geography. Does this lead actually fit?
Look the company up. New contact? Past lead? Already a customer?
Score the fit. Write a short note. Assign to a rep.
Repeat for every lead. Every week. Forever.
AI workflow
Strong ICP fit. Route to senior rep.
Right industry. Verify the company before reaching out.
Outside ICP. Auto-reply with self-serve link.
Not enough info to score. Send a clarifying email.
Every lead scored. Audit log per decision. Reps follow up on the qualified ones.
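The four outcomes above can be sketched as a small routing function. This is a minimal illustration, not our production code: the `ScoredLead` fields and the action names are hypothetical, and a real workflow would derive them from your own ICP definition.

```typescript
// Hypothetical lead shape: the field names are assumptions for illustration.
interface ScoredLead {
  hasEnoughInfo: boolean; // could the AI step score this lead at all?
  industryFit: boolean;   // industry matches the ICP
  sizeFit: boolean;       // company size matches the ICP
  geographyFit: boolean;  // geography matches the ICP
}

type LeadAction =
  | "route_senior_rep"
  | "verify_company"
  | "auto_reply_self_serve"
  | "send_clarifying_email";

// Maps a scored lead to one of the four outcomes listed above.
function routeLead(lead: ScoredLead): LeadAction {
  if (!lead.hasEnoughInfo) return "send_clarifying_email";
  if (lead.industryFit && lead.sizeFit && lead.geographyFit) {
    return "route_senior_rep"; // strong ICP fit
  }
  if (lead.industryFit) return "verify_company"; // right industry, rest unconfirmed
  return "auto_reply_self_serve"; // outside ICP
}
```

Each returned action would be written to the audit log together with the inputs that produced it.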
Proposal Eligibility Analysis for German Energy Subsidies
Hölzl is an engineering firm that helps homeowners claim German government subsidies for energy-efficient renovations. For every project, contractors send in a PDF proposal listing dozens of line items, and a consultant has to decide which ones the subsidy actually covers, line by line, against a thick rule book (the BEG EM).
We built a 6-stage pipeline that reads the PDF, looks up the relevant rules, and classifies every line with written reasoning. What used to take 2-3 hours per proposal now runs in 12 seconds, and the consultant only reviews the flagged lines.
6-stage pipeline · per proposal
↓ see each stage in the live example below
1. Read the PDF
OCR the contractor's proposal into text
2. Pull out the line items
One row per item: position, description, qty, €
3. Look up the rules
Find the relevant passages in the rule book + FAQ
4. Load project context
Which measures is this project approved for?
5. Classify each line
Eligible / needs proof / unclear / not eligible, with written reasoning
6. Consultant review
Human approves the flagged lines only
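One way to picture how the six stages fit together is as a chain of injected functions. The sketch below is an assumption-laden outline, not the production pipeline: every name (`Stages`, `runPipeline`, the field names) is hypothetical, and it assumes that "flagged" means a line classified as needs proof or unclear.

```typescript
type Status = "eligible" | "eligible_needs_proof" | "unclear" | "not_eligible";

interface LineItem {
  position: string;
  description: string;
  qty: number;
  amountEur: number;
}

interface ClassifiedLine extends LineItem {
  status: Status;
  reasoning: string;
}

// Each stage is injected, so the OCR engine, rule lookup, and model call
// can be swapped without touching the pipeline itself.
interface Stages {
  readPdf(pdf: Uint8Array): string;                   // 1. Read the PDF
  extractLineItems(text: string): LineItem[];         // 2. Pull out the line items
  lookUpRules(item: LineItem): string[];              // 3. Look up the rules
  loadActiveMeasures(projectId: string): string[];    // 4. Load project context
  classify(item: LineItem, rules: string[], measures: string[]): ClassifiedLine; // 5.
}

function runPipeline(stages: Stages, pdf: Uint8Array, projectId: string) {
  const text = stages.readPdf(pdf);
  const items = stages.extractLineItems(text);
  const measures = stages.loadActiveMeasures(projectId);
  const classified = items.map((item) =>
    stages.classify(item, stages.lookUpRules(item), measures)
  );
  // 6. Consultant review: assumed here to cover "needs proof" and "unclear" lines only.
  const forReview = classified.filter(
    (line) => line.status === "eligible_needs_proof" || line.status === "unclear"
  );
  return { classified, forReview };
}
```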
Proposal Analyser
Input
Stage 1
Proposal_47.pdf
Schreiner & Söhne GmbH
Active measures
Stage 4
2.1 · Wall insulation
External wall (EWIS)
2.4 · Plinth insulation
5.1 · Energy consulting
Output summary
Stage 5
Classified line items
Stages 2 · 3 · 5
Confirmation it was used for insulation work.
Is the waterproofing within the insulated area?
Results
Hours of rule-book reading, done in seconds.
12 sec
per proposal, down from 2-3 hours
1,500+
line items classified in production
96%
classification accuracy on the eval set
100%
decisions with written reasoning + audit trail
Our approach
How we build it with you.
Most AI workflows succeed or fail on two things: how carefully the AI step is set up, and where your team stays in the loop. We build with you until it's reliable enough to put your name on.
Find the work worth automating
We start by mapping where your team makes the same kind of decision over and over: classifying a message, scoring a lead, reviewing a line item, drafting the same response. That's where AI workflows pay back. We agree on the process before any AI gets involved.
Where does manual judgment live?
// Input
line_id: UUID
description: string
qty: number
unit: string
active_measures: string[]
// Output
ai_status: 'eligible' | 'eligible_needs_proof'
| 'unclear' | 'not_eligible'
ai_bucket: 'direct' | 'indirect' | 'planning'
| 'unrelated' | 'unknown'
ai_reasoning: string
ai_proof_note: string | null
Agree on what goes in and what comes out
Before the AI does anything, we agree on exactly what it gets to see and exactly what it has to return. A fixed list of fields, a fixed list of allowed answers, written reasoning every time. That's the difference between AI that makes things up and a workflow your team can trust.
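A fixed output contract is only useful if something enforces it. The sketch below shows one possible way to check the AI step's raw output against the field list above before anything downstream sees it. The field names and allowed values come from the contract shown on this page; the `parseLineOutput` function and its error messages are hypothetical.

```typescript
const AI_STATUSES = ["eligible", "eligible_needs_proof", "unclear", "not_eligible"] as const;
const AI_BUCKETS = ["direct", "indirect", "planning", "unrelated", "unknown"] as const;

type AiStatus = (typeof AI_STATUSES)[number];
type AiBucket = (typeof AI_BUCKETS)[number];

interface LineOutput {
  ai_status: AiStatus;
  ai_bucket: AiBucket;
  ai_reasoning: string;
  ai_proof_note: string | null;
}

// Rejects anything outside the agreed contract instead of passing it along.
function parseLineOutput(raw: unknown): LineOutput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("output must be an object");
  }
  const r = raw as Record<string, unknown>;
  const status = r.ai_status;
  const bucket = r.ai_bucket;
  const reasoning = r.ai_reasoning;
  const proofNote = r.ai_proof_note;

  if (typeof status !== "string" || AI_STATUSES.indexOf(status as AiStatus) === -1) {
    throw new Error("ai_status is not one of the allowed answers");
  }
  if (typeof bucket !== "string" || AI_BUCKETS.indexOf(bucket as AiBucket) === -1) {
    throw new Error("ai_bucket is not one of the allowed answers");
  }
  // Written reasoning is required every time, so an empty string is rejected.
  if (typeof reasoning !== "string" || reasoning.trim() === "") {
    throw new Error("ai_reasoning must be a non-empty string");
  }
  let note: string | null;
  if (proofNote === null) note = null;
  else if (typeof proofNote === "string") note = proofNote;
  else throw new Error("ai_proof_note must be a string or null");

  return {
    ai_status: status as AiStatus,
    ai_bucket: bucket as AiBucket,
    ai_reasoning: reasoning,
    ai_proof_note: note,
  };
}
```

An output that fails this check would typically be retried or sent to human review rather than silently corrected.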
Decide where the human stays in the loop
Anything client-facing or high-stakes gets a review step. We agree the dials with you: auto-approve when the AI is confident, send to a human in the grey zone, reject below a floor. These are your numbers, not ones we bake in. Your team can adjust them at any time.
Decision routing
Auto-approve
confidence ≥ 0.85
Send to human review
0.60 ≤ confidence < 0.85
Reject + flag
confidence < 0.60
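The three bands above translate into a few lines of logic. The sketch below uses the 0.85 and 0.60 values shown here as defaults; the `Thresholds` shape and function name are hypothetical, and in a real setup the numbers would be read from your config table rather than hard-coded.

```typescript
interface Thresholds {
  autoApprove: number; // at or above this: auto-approve
  reviewFloor: number; // at or above this, but below autoApprove: human review
}

// Defaults mirror the example bands on this page; they are meant to be overridden.
const DEFAULT_THRESHOLDS: Thresholds = { autoApprove: 0.85, reviewFloor: 0.6 };

type Route = "auto_approve" | "human_review" | "reject_and_flag";

function routeByConfidence(
  confidence: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS
): Route {
  if (confidence >= thresholds.autoApprove) return "auto_approve";
  if (confidence >= thresholds.reviewFloor) return "human_review";
  return "reject_and_flag"; // also catches NaN, since both comparisons fail
}
```

Passing a different `Thresholds` object is all it takes to tighten or loosen the review band.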
Accuracy after each tuning round
All tweaks done in a config table. No code changes.
Tune it with your team until it ships
Every change to the AI is saved as a version. Every time your team overrides a decision, that case gets added to the test set. Over a few tuning rounds, not a few weeks, accuracy climbs from a rough first draft to something you'd trust to run unattended.
The instructions live in a config table you can edit. No code, no devs in the loop for small tweaks.
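The override-to-test-set loop can be sketched in two small functions. This is an illustrative outline under simple assumptions: each case is keyed by its input text, decisions are plain strings, and the names `recordOverride` and `accuracy` are hypothetical.

```typescript
interface EvalCase {
  input: string;
  expected: string; // the decision a human confirmed as correct
}

// Adds a human override to the eval set. If the same input was already
// present, the latest human decision replaces the earlier one.
function recordOverride(evalSet: EvalCase[], input: string, humanDecision: string): EvalCase[] {
  return evalSet
    .filter((c) => c.input !== input)
    .concat([{ input, expected: humanDecision }]);
}

// Share of eval cases the current version of the AI step gets right.
function accuracy(evalSet: EvalCase[], classify: (input: string) => string): number {
  if (evalSet.length === 0) return 0;
  const correct = evalSet.filter((c) => classify(c.input) === c.expected).length;
  return correct / evalSet.length;
}
```

Running `accuracy` against every saved version makes it easy to see whether a wording tweak helped or hurt before it goes live.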
Capabilities
Four shapes the AI step takes.
Each one fits a different kind of repetitive work. Mix them inside one pipeline when the job calls for it.
Classify and tag
AI reads each incoming item and gives it a label, a status, or a score. Each item goes to the right place, and your team only opens the ones that need attention.
"Pool is green, 3rd time"
"How do I export?"
"Got the invoice, thanks"
Works for: support tickets, sales leads, line items, photo uploads, invoices.
Extract structured data
AI pulls the fields you care about out of PDFs, scans, or messy text and drops them into a clean table your other tools can use.
Works for: invoices, contracts, forms, transcripts, screenshots.
Draft with a quality gate
AI writes the first draft, then scores its own work against your standards. Anything below the bar is held back for a human; the rest goes straight out.
Works for: listing descriptions, email replies, proposal copy, social posts.
Chain steps together
Several AI steps run on the same item, each one a specialist. Their results combine into a single decision: ship, send to review, or reject.
Works for: editorial pipelines, compliance checks, claims review, content moderation.
Every workflow we build includes
Human review on flagged items
Approval queues in Airtable, Slack, or your tool.
Editable instructions, no code
Your team tunes wording in a config table.
Full audit log on every decision
Who, when, what input, what reasoning.
Tunable thresholds
Adjust auto-approve and review bars any time.
More pipelines we have shipped
Three workflows. Same pattern.
Different domains, different inputs, different outputs, but the same shape underneath. Pick a pipeline to see how the AI step fits in.
Pick a pipeline
One AI step, three categorizations on every incoming message. Pick an example to see each in action.
Sentiment scoring
Reads the last few messages from one customer and scores how they're feeling.
"Pool is green again. Third time this month. What is going on?"
Sentiment score
↓ declining trend
Tone worsened across 3 messages over 8 days. No callback logged. At-risk account, escalate to manager.
Needs response
Decides whether the message actually needs a human reply or just a logged acknowledgement.
"Thanks, got the invoice. Looks good. See you next week."
Needs reply
No
Confirmation message, no question asked. Bot sends a thanks-back. Saves the rep a context-switch.
Urgency detection
Flags safety risks and active incidents so they jump the queue, even out of hours.
"URGENT, pump leaking everywhere, pool draining fast!"
Urgency
Critical · page on-call
Equipment failure with damage risk. SMS sent to the on-call rep, ticket created at priority 1.
Input · scanned PDF
A 4-page contractor quote, scanned (not exported). Skewed, noisy, mixed fonts. The vision model still has to pull every row out cleanly.
Quote_0143.pdf
scanned, 4 pages
Output · structured table
Every line item parsed into a schema-validated row, then cross-checked against the project spec to flag anything off.
47
rows extracted
€47,820
total parsed
1
discrepancy
Pos. 3.4 flagged · quoted 35 lm, project spec calls for 42 lm. Sent to the consultant for clarification before the proposal moves on.
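The cross-check against the project spec can be as simple as comparing quantities per position. The sketch below is a minimal version under stated assumptions: rows are matched by position number, only rows with the same unit are compared, and all type and function names are hypothetical.

```typescript
interface QuantityRow {
  position: string;
  qty: number;
  unit: string;
}

interface Discrepancy {
  position: string;
  quoted: number;
  spec: number;
  unit: string;
}

// Returns every position where the quoted quantity differs from the spec.
function findDiscrepancies(quoted: QuantityRow[], spec: QuantityRow[]): Discrepancy[] {
  const found: Discrepancy[] = [];
  for (const row of quoted) {
    const expected = spec.filter((s) => s.position === row.position)[0];
    if (expected && expected.unit === row.unit && expected.qty !== row.qty) {
      found.push({
        position: row.position,
        quoted: row.qty,
        spec: expected.qty,
        unit: row.unit,
      });
    }
  }
  return found;
}
```

For the example above, a quoted row of 35 lm at position 3.4 against a spec of 42 lm would come back as a single discrepancy for the consultant to clarify.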
Input · blog draft
"5 Trends Shaping the Category in 2026…"
6 agents · parallel scoring
each scores 0–100
Every draft is scored on six independent axes before anyone sees it. Each agent is a specialist with its own prompt and rubric.
Fact
82
3 claims verified
Social
76
hook strong
Virality
68
below target
Brand
95
on voice
SEO
71
3 keywords hit
Audience
78
resonates
Routing
Auto-approve
all ≥ 75
Human review
any score 60–74 · here: Virality 68, SEO 71
Reject
any < 60
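The routing rule above combines six scores into one decision. The sketch below shows one way to express it, using the 75 and 60 cut-offs from this example as defaults. The function name and result shape are hypothetical, and sending a draft with no scores to human review is an assumption.

```typescript
type DraftRoute = "auto_approve" | "human_review" | "reject";

interface RoutingResult {
  route: DraftRoute;
  flaggedAxes: string[]; // the axes that caused a review or rejection
}

function routeDraft(
  scores: { [axis: string]: number },
  approveAt: number = 75,
  rejectBelow: number = 60
): RoutingResult {
  const axes = Object.keys(scores);
  // Assumption: a draft with no scores is never auto-approved.
  if (axes.length === 0) return { route: "human_review", flaggedAxes: [] };

  const failing = axes.filter((axis) => scores[axis] < rejectBelow);
  if (failing.length > 0) return { route: "reject", flaggedAxes: failing };

  const borderline = axes.filter((axis) => scores[axis] < approveAt);
  if (borderline.length > 0) return { route: "human_review", flaggedAxes: borderline };

  return { route: "auto_approve", flaggedAxes: [] };
}

// The six scores from the example draft above.
const sampleScores = { Fact: 82, Social: 76, Virality: 68, Brand: 95, SEO: 71, Audience: 78 };
const sampleResult = routeDraft(sampleScores);
```

With these scores, `sampleResult.route` is `"human_review"` and `sampleResult.flaggedAxes` lists Virality and SEO, since both fall in the 60–74 band.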
Same shape, every time
Input contract → AI step → structured output → checkpoint. The pattern repeats across domains.
Production-grade
Retry logic, fallbacks, rate limiting, eval sets. Every pipeline is built to run unattended.
Tunable by your team
Thresholds, prompts, rubric weights, all editable in a config table. No code change to retune.
Is this right for you?
Three signs an AI workflow will pay off.
A quick gut-check. If two of these three describe you, the workflow will earn its keep.
Someone is making the same call over and over
Classifying a message. Scoring a lead. Reviewing a line item. Drafting the same kind of reply. Same shape of decision, same rough rules every time.
At this volume the workflow pays back in weeks, not quarters.
The data already lives in tools we can read
If your inputs are already in your CRM, your inbox, Airtable, or cloud storage, we plug the AI step into what you already use. No new platform for your team to learn.
If the data only lives in someone's head, we'd need to collect it first. That's a different project.
A human can catch the rare edge cases
The AI doesn't need to be perfect. It needs to be confident when it's right, and flag when it's not. Anything high-stakes goes through a review step before it ships.
AI handles the 80% your team would decide the same way every time. Humans focus on the 20% that actually needs judgment.
If two of those three sound like you, we should talk.
30-minute scope call. We pick the highest-leverage spot and quote a fixed price.
Need autonomy and conversation instead of background processing? See AI Agents.
Ready to build your workflow?
30-minute scope call. We look at where the repetitive judgment lives, pick the highest-leverage spot, and quote a fixed price.
Book a Scope Call
No commitment required
30
min call
Fixed
price quote
2-3
week MVP guarantee