- Role: Sole builder (design, prompts, n8n workflows, HIL gates)
- Domain: Operations automation (hospitality-leaning)
- Surface: 5 kits on one shared, runnable 5-node spine
- Stack: n8n · local Ollama (phi4:14b) · JSON mode · rule-based gate
- Principle: AI drafts; a deterministic gate escalates the risky cases
- Status: Runnable + dogfood-verified on sample runs · production channels pending
The Operational Problem
Operations teams run on inbound volume — support tickets, sales leads, public reviews, new-account onboarding, staff policy questions. Most of it is routine and could be handled in seconds. A small slice is genuinely high-stakes: a refund dispute, a safety incident, an enterprise contract, a one-star review naming a guest.
The naive fix — let AI answer everything — is unsafe, because a confident wrong answer lands exactly on that high-stakes slice. The opposite — make a person read everything — spends the scarce resource on cases that never needed it. These kits are built around that tension: absorb the routine volume automatically, and guarantee the high-stakes cases reach a human.
The Product Principle
One rule holds across all five: AI drafts, a human approves anything risky — never the reverse. And the escalation decision is not left to the model's discretion alone. Each workflow runs a two-layer gate. First the model returns its own requires_human_approval flag inside strict JSON. Then deterministic code re-checks the case against kit-specific rules and can force escalation regardless of what the model said.
The model is allowed to be cautious; it is never the only thing standing between an automated action and a guest, a dollar, or a safety event.
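The two-layer gate can be sketched in a few lines of Python. This is a minimal illustration, not the kits' actual node code: the function name `two_layer_gate`, the `gate_reason` field, and the choice to escalate when the JSON is invalid or the flag is missing are all assumptions. Only the `requires_human_approval` flag comes from the description above.

```python
import json
from typing import Callable


def two_layer_gate(raw_output: str, rule_check: Callable[[dict], bool]) -> dict:
    """Combine the model's own flag with a deterministic, kit-specific rule."""
    try:
        result = json.loads(raw_output)
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        # Assumption: output that cannot be parsed is treated as risky.
        return {"requires_human_approval": True, "gate_reason": "invalid_json"}

    # Layer 1: the model's own flag (assumption: a missing flag counts as True).
    model_flag = bool(result.get("requires_human_approval", True))
    # Layer 2: deterministic code that can force escalation on its own.
    rule_flag = bool(rule_check(result))

    reasons = [name for name, hit in (("model", model_flag), ("rule", rule_flag)) if hit]
    return {
        **result,
        "requires_human_approval": model_flag or rule_flag,
        "gate_reason": "+".join(reasons) or "none",
    }
```

Because the two flags are combined with `or`, the rule can only add escalations; it can never clear one the model raised.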
How One Workflow Runs
Every kit is the same runnable spine — five n8n nodes, no cloud API keys, executable on a laptop:
- Trigger — a sample case enters the workflow.
- Load Inputs — the case is paired with the kit's system prompt.
- Local LLM — Ollama (phi4:14b) returns strict JSON (JSON mode, temperature 0.2).
- Parse + HIL Gate — the JSON is parsed, then the kit-specific rule decides auto-resolve vs. human approval.
- Result — a structured object plus the human-approval flag.
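Outside n8n, the Local LLM step could be approximated as below. This is a sketch under assumptions: it targets Ollama's default local endpoint and its `/api/generate` route, and the helper names (`build_request`, `call_ollama`, `parse_response`) are hypothetical. The model, JSON mode, and temperature come from the spine described above.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(system_prompt: str, case_text: str) -> dict:
    """Pair the case with the kit's system prompt (the Load Inputs step)."""
    return {
        "model": "phi4:14b",
        "system": system_prompt,
        "prompt": case_text,
        "format": "json",          # ask Ollama for JSON-only output
        "stream": False,           # one response body instead of a token stream
        "options": {"temperature": 0.2},
    }


def call_ollama(payload: dict) -> str:
    """Send the request to the local model and return the raw response body."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def parse_response(body: str) -> dict:
    """Unwrap Ollama's envelope, then parse the model's JSON string inside it."""
    envelope = json.loads(body)
    return json.loads(envelope["response"])
```

The parsed dictionary is what the Parse + HIL Gate step would then check against the kit's rule.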
The same five kits also ship a design-exact production graph that swaps the local model for real channels and models — covered below.
The Five Kits
Five kits, one spine. Each classifies, scores, or drafts in its domain and carries its own escalation rule. Every example below is a verified sample run from the local build.
Hospitality Ticket Triage
- Gate outcome: Human
- Decides: category, priority (low → urgent), department, guest-facing draft reply
- Escalates when: priority is high/urgent, or the text mentions refund, chargeback, passport, safety, injury, or legal
- Verified run: “$2,400 chargeback” → priority: urgent → Human
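The triage escalation rule could look roughly like this in Python. It is an illustrative sketch: the function name and the simple word-start matching are assumptions, while the priority levels and keywords come from the rule above.

```python
import re

# Keywords taken from the kit's escalation rule.
RISK_TERMS = ("refund", "chargeback", "passport", "safety", "injury", "legal")
# Match each term at the start of a word, so "refunds" hits but "illegal" does not.
_RISK_PATTERN = re.compile(r"\b(?:" + "|".join(RISK_TERMS) + r")", re.IGNORECASE)


def ticket_needs_human(priority: str, text: str) -> bool:
    """Escalate on high/urgent priority or any risk keyword in the ticket text."""
    is_high = priority.strip().lower() in {"high", "urgent"}
    return is_high or bool(_RISK_PATTERN.search(text))
```

A keyword hit escalates even when the model rated the ticket low priority, which is the point of keeping this check outside the model.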
Lead Qualification + Scoring
- Gate outcome: Human
- Decides: ICP fit score (0–100), tier A/B/C, intent, next-best action, reply
- Escalates when: score ≥ 80, tier A, enterprise/security/legal terms, or a borderline lead with hot intent
- Verified run: “VP, multi-property, budget approved” → fit score 85 → Human
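A hedged sketch of the lead rule follows. The source does not define "borderline", so the 60 to 79 band used here is purely an assumption, as are the function name and the plain substring matching.

```python
RISK_TERMS = ("enterprise", "security", "legal")
# Assumption: "borderline" means a fit score from 60 up to, but not including, 80.
BORDERLINE_MIN, BORDERLINE_MAX = 60, 80


def lead_needs_human(score: float, tier: str, intent: str, text: str) -> bool:
    """Escalate strong leads, sensitive terms, and borderline leads with hot intent."""
    lowered = text.lower()
    is_borderline_hot = (
        BORDERLINE_MIN <= score < BORDERLINE_MAX and intent.strip().lower() == "hot"
    )
    return (
        score >= 80
        or tier.strip().upper() == "A"
        or any(term in lowered for term in RISK_TERMS)
        or is_borderline_hot
    )
```

Here escalation routes the most valuable leads to a person, rather than only the most dangerous ones.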
Review Response Agent
- Gate outcome: Auto
- Decides: sentiment, category, severity, brand-voice public response
- Escalates when: negative sentiment or high severity
- Verified run: “5-star review” → positive → Auto
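The review rule is the simplest of the five, and a sketch fits in one function. The label set and the decision to escalate on an unrecognised sentiment label are assumptions added for illustration; the source only states the two escalation conditions.

```python
# Assumption: the model is prompted to return one of these sentiment labels.
KNOWN_SENTIMENTS = {"positive", "neutral", "negative"}


def review_needs_human(sentiment: str, severity: str) -> bool:
    """Escalate negative or high-severity reviews before anything is published."""
    sentiment = sentiment.strip().lower()
    severity = severity.strip().lower()
    if sentiment not in KNOWN_SENTIMENTS:
        # Assumption: an unexpected label fails closed and goes to a person.
        return True
    return sentiment == "negative" or severity == "high"
```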
Customer Onboarding Engine
- Gate outcome: Auto
- Decides: segment, personalization plan, first steps, risk flags, welcome
- Escalates when: enterprise/high-value, SSO, security/legal, money-touching, or missing consent
- Verified run: “SMB, non-technical” → routine → Auto
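One way to express the onboarding rule is to return the list of reasons that fired, which could also feed an audit log. The `Account` fields and reason names below are hypothetical; the five conditions themselves come from the rule above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    """Hypothetical shape of a parsed onboarding case."""
    segment: str
    high_value: bool = False
    needs_sso: bool = False
    security_or_legal: bool = False
    touches_money: bool = False
    has_consent: bool = True


def onboarding_escalation_reasons(account: Account) -> List[str]:
    """Return every escalation condition that applies; an empty list means auto."""
    checks = [
        ("enterprise_or_high_value",
         account.segment.strip().lower() == "enterprise" or account.high_value),
        ("sso", account.needs_sso),
        ("security_or_legal", account.security_or_legal),
        ("money_touching", account.touches_money),
        ("missing_consent", not account.has_consent),
    ]
    return [name for name, hit in checks if hit]
```

An account is auto-resolved only when the returned list is empty.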
Internal Ops SOP Bot
- Gate outcome: Human
- Decides: answer, cited SOP references, confidence, reply (grounded RAG)
- Escalates when: low confidence, or the question touches a spill, safety, HR, refund, or waiver
- Verified run: “Refund policy?” → cites P1 → Human
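A sketch of the SOP bot's rule is below. The source does not say what counts as "low confidence", so the 0.7 threshold and the 0 to 1 confidence scale are assumptions, as is the function name. Whole-word matching keeps a short term like "HR" from firing inside words such as "three".

```python
import re

# Assumption: confidence is a 0..1 score and anything under 0.7 counts as low.
LOW_CONFIDENCE_THRESHOLD = 0.7

# Sensitive topics from the kit's rule, matched as whole words (plurals allowed).
_SENSITIVE = re.compile(
    r"\b(?:spill\w*|safety|hr|refund\w*|waiver\w*)\b", re.IGNORECASE
)


def sop_answer_needs_human(confidence: float, question: str) -> bool:
    """Escalate low-confidence answers and questions on sensitive topics."""
    return confidence < LOW_CONFIDENCE_THRESHOLD or bool(_SENSITIVE.search(question))
```

The verified run fits this shape: a refund question escalates on the topic alone, even when the answer cites a policy with high confidence.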
From Prototype to Production
Each runnable kit ships with a design-exact production scaffold, an importable n8n graph that has not yet been run with live data. The scaffolds share a common shape:
- Multi-channel intake — webhook, email/IMAP, schedule, or Telegram.
- Normalize — dedup, canonical schema, and consent / lawful-basis checks.
- Cheap classifier → premium drafter — cost-tiered models, with citation-enforced retrieval (RAG) where there is a knowledge base.
- Force-HIL validation — the deterministic gate, in code.
- Mandatory approval — a send-and-wait step in Slack or email, with an approve / redraft / escalate router; execution only fires after approval.
- Audit + recovery — an immutable log with cost tracking, a dedicated error sub-workflow, and feedback capture for continuous improvement.
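The approve / redraft / escalate router in the mandatory approval step can be sketched as a small pure function. The step names it returns and the behaviour for a missing or unrecognised reply are assumptions; the three decisions and the rule that execution only fires after approval come from the list above.

```python
from typing import Optional


def route_after_approval(decision: Optional[str]) -> str:
    """Map a reviewer's reply to the next workflow step.

    Only an explicit "approve" leads to execution. No reply yet, or a reply
    the router does not recognise, keeps the case waiting (an assumption).
    """
    if decision is None:
        return "wait"
    normalized = decision.strip().lower()
    if normalized == "approve":
        return "execute"
    if normalized in ("redraft", "escalate"):
        return normalized
    return "wait"
```

Defaulting to "wait" means a typo or an empty reply can never trigger a send.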
Productionizing a kit means swapping the local-Ollama node for its real channels and model, adding credentials, and feeding sanitized data.
- The five runnable kits execute end to end on local Ollama with no cloud API keys; verified on built-in sample inputs (2026-06-02, gate retest 2026-06-19).
- The production graphs are design-exact and importable, but have not yet been run with real channels or data.
- No client metrics or ROI claims — this is sample-set proof of the classify/score/draft plus human-gate logic.
- Sample inputs are synthetic and the embedded SOP policies are illustrative.