Human-in-the-Loop Automation

Role: Sole builder — design, prompts, n8n workflows, HIL gates
Domain: Operations automation (hospitality-leaning)
Surface: 5 kits on one shared, runnable 5-node spine
Stack: n8n · local Ollama (phi4:14b) · JSON mode · rule-based gate
Principle: AI drafts; a deterministic gate escalates the risky cases
Status: Runnable + dogfood-verified on sample runs · production channels pending

The shared workflow architecture runs from a trigger through input loading and a local LLM to a rule-based human-in-the-loop gate, which splits routine cases (auto-resolve) from high-stakes cases (human approval) before emitting a structured result. Below, each of the five kits is shown with a verified sample input, the model's verdict, and which way the gate routed it.

Figure: the shared spine, the auto-resolve vs. human-approval split, and each kit's verified sample run.

The Operational Problem

Operations teams run on inbound volume — support tickets, sales leads, public reviews, new-account onboarding, staff policy questions. Most of it is routine and could be handled in seconds. A small slice is genuinely high-stakes: a refund dispute, a safety incident, an enterprise contract, a one-star review naming a guest.

The naive fix — let AI answer everything — is unsafe, because a confident wrong answer lands exactly on that high-stakes slice. The opposite — make a person read everything — spends the scarce resource on cases that never needed it. These kits are built around that tension: absorb the routine volume automatically, and guarantee the high-stakes cases reach a human.

The Product Principle

One rule holds across all five: AI drafts, a human approves anything risky — never the reverse. And the escalation decision is not left to the model's discretion alone. Each workflow runs a two-layer gate. First the model returns its own requires_human_approval flag inside strict JSON. Then deterministic code re-checks the case against kit-specific rules and can force escalation regardless of what the model said.

The model is allowed to be cautious; it is never the only thing standing between an automated action and a guest, a dollar, or a safety event.
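
The two-layer gate can be sketched in a few lines. This is a minimal illustration in Python, not the kits' actual n8n node code: the function name `needs_human`, the `rule` callback, and the choice to escalate when the flag is missing or malformed are all assumptions. Only the `requires_human_approval` field name comes from the description above.

```python
from typing import Any, Callable, Dict


def needs_human(model_output: Dict[str, Any],
                rule: Callable[[Dict[str, Any]], bool]) -> bool:
    """Return True when a case must be routed to a person."""
    # Layer 1: the model's own flag.
    # Assumption: anything other than an explicit False (True, missing,
    # or a non-boolean value) is treated as "escalate".
    if model_output.get("requires_human_approval") is not False:
        return True
    # Layer 2: the deterministic, kit-specific rule can still force
    # escalation even though the model said False.
    return bool(rule(model_output))


# Usage: the model says "no approval needed", but the rule overrides it.
case = {"requires_human_approval": False, "priority": "urgent"}
print(needs_human(case, lambda c: c.get("priority") in {"high", "urgent"}))
```

The point of the design is that the two layers are combined with a logical OR: either layer can escalate, and neither can veto the other's escalation.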

How One Workflow Runs

Every kit is the same runnable spine — five n8n nodes, no cloud API keys, executable on a laptop:

Trigger — a sample case enters the workflow.
Load Inputs — the case is paired with the kit's system prompt.
Local LLM — Ollama (phi4:14b) returns strict JSON (JSON mode, temperature 0.2).
Parse + HIL Gate — the JSON is parsed, then the kit-specific rule decides auto-resolve vs. human approval.
Result — a structured object plus the human-approval flag.
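
Outside n8n, the Local LLM and Parse steps above could look roughly like the following Python sketch. It assumes Ollama's default local endpoint and its `/api/generate` request fields (`model`, `system`, `prompt`, `format`, `stream`, `options`). The helper names and the decision to force escalation when the reply is not a JSON object are assumptions for illustration, not the kits' code.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(system_prompt: str, case_text: str) -> Dict[str, Any]:
    """Request body for a non-streaming, JSON-mode generation."""
    return {
        "model": "phi4:14b",
        "system": system_prompt,
        "prompt": case_text,
        "format": "json",            # ask Ollama to constrain output to JSON
        "stream": False,             # one response object instead of chunks
        "options": {"temperature": 0.2},
    }


def call_ollama(payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST the payload to the local Ollama server (needs it running)."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_model_json(ollama_response: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the model's text; assumption: unparseable output escalates."""
    try:
        parsed = json.loads(ollama_response.get("response", ""))
    except (json.JSONDecodeError, TypeError):
        parsed = None
    if not isinstance(parsed, dict):
        return {"requires_human_approval": True, "parse_error": True}
    return parsed
```

With a local Ollama server running, `parse_model_json(call_ollama(build_payload(system_prompt, case_text)))` would yield the object that the gate then inspects.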

The same five kits also ship a design-exact production graph that swaps the local model for real channels and models — covered below.

The Five Kits

Five kits, one spine. Each classifies, scores, or drafts in its domain and carries its own escalation rule. Every example below is a verified sample run from the local build.

Hospitality Ticket Triage

Sample route: Human approval

Decides: category, priority (low → urgent), department, guest-facing draft reply
Escalates when: priority is high/urgent, or the text mentions refund, chargeback, passport, safety, injury, or legal
Verified run: “$2,400 chargeback” → priority: urgent → Human
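
As a rough Python sketch of this kit's escalation rule: the term list and the high/urgent condition come from the card above, while the function name, the `priority` field name, and the prefix-matching strategy are assumptions.

```python
import re
from typing import Any, Dict

ESCALATION_TERMS = ("refund", "chargeback", "passport", "safety", "injury", "legal")

# Match each term at the start of a word, case-insensitively, so that
# "refunds" or "Chargeback" also match. Irregular forms such as
# "injuries" would need stemming, which this sketch does not attempt.
TERM_PATTERN = re.compile(r"\b(?:" + "|".join(ESCALATION_TERMS) + r")", re.IGNORECASE)


def ticket_needs_human(ticket_text: str, model_output: Dict[str, Any]) -> bool:
    """Escalate on high/urgent priority or on any sensitive term in the text."""
    priority = str(model_output.get("priority", "")).lower()
    if priority in {"high", "urgent"}:
        return True
    return TERM_PATTERN.search(ticket_text) is not None
```

Because the keyword check reads the raw ticket text rather than the model's output, a ticket mentioning a chargeback escalates even if the model rated it low priority.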

Lead Qualification + Scoring

Sample route: Human approval

Decides: ICP fit score (0–100), tier A/B/C, intent, next-best action, reply
Escalates when: score ≥ 80, tier A, enterprise/security/legal terms, or a borderline lead with hot intent
Verified run: “VP, multi-property, budget approved” → fit score 85 → Human
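
A hedged sketch of how this rule might be written in Python. The score threshold, tier, and term list follow the card above. The field names (`fit_score`, `tier`, `intent`), the 60 to 79 range standing in for "borderline", and the literal `"hot"` intent value are assumptions, since the text does not define them.

```python
from typing import Any, Dict

RISK_TERMS = ("enterprise", "security", "legal")
BORDERLINE_MIN = 60  # assumption: "borderline" means a score of 60-79
ESCALATE_SCORE = 80  # from the card: score >= 80 escalates


def lead_needs_human(lead_text: str, model_output: Dict[str, Any]) -> bool:
    """Escalate strong, sensitive, or borderline-but-hot leads."""
    score = model_output.get("fit_score")
    # Assumption: a missing or non-numeric score is escalated, not guessed.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return True
    tier = str(model_output.get("tier", "")).upper()
    intent = str(model_output.get("intent", "")).lower()
    if score >= ESCALATE_SCORE or tier == "A":
        return True
    if any(term in lead_text.lower() for term in RISK_TERMS):
        return True
    return BORDERLINE_MIN <= score < ESCALATE_SCORE and intent == "hot"
```

Note that in this kit a high score is itself a reason to escalate: the most valuable leads are exactly the ones a person should handle.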

Review Response Agent

Sample route: Auto-resolve

Decides: sentiment, category, severity, brand-voice public response
Escalates when: negative sentiment or high severity
Verified run: “5-star review” → positive → Auto

Customer Onboarding Engine

Sample route: Auto-resolve

Decides: segment, personalization plan, first steps, risk flags, welcome
Escalates when: enterprise/high-value, SSO, security/legal, money-touching, or missing consent
Verified run: “SMB, non-technical” → routine → Auto

Internal Ops SOP Bot

Sample route: Human approval

Decides: answer, cited SOP references, confidence, reply (grounded RAG)
Escalates when: low confidence, or the question touches a spill, safety, HR, refund, or waiver
Verified run: “Refund policy?” → cites P1 → Human
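
One way this kit's rule could be expressed, as a Python sketch under stated assumptions: the sensitive topics come from the card above, but the numeric `confidence` field, the 0.7 threshold, and the function name are hypothetical. The text does not say how confidence is represented.

```python
import re
from typing import Any, Dict

# Whole-word matching matters here: a bare substring check for "hr"
# would also fire on words like "three" or "third".
SENSITIVE_TOPICS = re.compile(
    r"\b(?:spills?|safety|hr|refunds?|waivers?)\b", re.IGNORECASE
)
MIN_CONFIDENCE = 0.7  # assumption: confidence is a float in [0, 1]


def sop_answer_needs_human(question: str, model_output: Dict[str, Any]) -> bool:
    """Escalate low-confidence answers and questions on sensitive topics."""
    confidence = model_output.get("confidence")
    # Assumption: a missing or non-numeric confidence counts as low.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return True
    if confidence < MIN_CONFIDENCE:
        return True
    return SENSITIVE_TOPICS.search(question) is not None
```

This matches the verified run above: a refund question escalates on topic alone, even when the answer cites a policy with high confidence.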

From Prototype to Production

Alongside each runnable kit is a design-exact production scaffold: an importable n8n graph that has not yet been run with live data. The scaffolds share a common shape:

Multi-channel intake — webhook, email/IMAP, schedule, or Telegram.
Normalize — dedup, canonical schema, and consent / lawful-basis checks.
Cheap classifier → premium drafter — cost-tiered models, with citation-enforced retrieval (RAG) where there is a knowledge base.
Force-HIL validation — the deterministic gate, in code.
Mandatory approval — a send-and-wait step in Slack or email, with an approve / redraft / escalate router; execution only fires after approval.
Audit + recovery — an immutable log with cost tracking, a dedicated error sub-workflow, and feedback capture for continuous improvement.
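
The approve / redraft / escalate router in the mandatory-approval step could be as small as a lookup table. The following Python sketch is illustrative only: the step names and the choice to escalate on any unrecognized reply are assumptions, since the production graphs are n8n nodes whose internals are not described here.

```python
from typing import Dict

# Hypothetical next-step names; the real graph's node names are not given.
NEXT_STEP: Dict[str, str] = {
    "approve": "execute",    # the only path that reaches execution
    "redraft": "redraft",    # send the draft back for another pass
    "escalate": "escalate",  # hand off to a more senior reviewer
}


def route_reviewer_decision(decision: object) -> str:
    """Map a reviewer's reply to the next step in the graph."""
    if not isinstance(decision, str):
        return "escalate"
    # Assumption: anything that is not a recognized decision is escalated
    # rather than executed, so execution needs an explicit approval.
    return NEXT_STEP.get(decision.strip().lower(), "escalate")
```

Defaulting to escalation keeps the guarantee stated above: nothing executes without an explicit approval.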

Productionizing a kit means swapping the local-Ollama node for its real channels and model, adding credentials, and feeding sanitized data.

Scope & Honesty

The five runnable kits execute end to end on local Ollama with no cloud API keys, and were verified on built-in sample inputs (2026-06-02, gate retest 2026-06-19).
The production graphs are design-exact and importable, but have not yet been run with real channels or data.
No client metrics or ROI claims — this is sample-set proof of the classify/score/draft plus human-gate logic.
Sample inputs are synthetic and the embedded SOP policies are illustrative.