AI Operations Platform · System Prototype

Reztrix: Frontline Ops Intelligence

Reztrix is an AI-assisted decision-support platform for hospitality operations. I built it to help frontline teams work across fragmented systems, surface the highest-impact issues faster, and keep human approval in the loop when the cost of error is high.

Role: Lead Product Builder / Technical PM
Product: AI decision-support system for frontline hospitality operations
Stack: FastAPI, PostgreSQL, Next.js, LLM-based retrieval and evaluation workflows
Architecture: 293 REST endpoints · 69 tables · RAG via pgvector
Result: Custom evaluation pipeline improved safety-oriented pass rate from 42% to 84% across synthetic test scenarios.

The Operational Problem

Enterprise hospitality operations are highly manual behind the scenes. Frontline managers often have to pull together fragmented information from multiple systems just to answer simple but high-stakes questions: Which guest issues need action first? What can be solved now versus escalated? What is the safest next step when systems are incomplete or inconsistent?

This is exactly where generic AI wrappers break down. A model can generate plausible language, but plausible language is not the same as operationally safe action. In complex hospitality operations, the wrong recommendation can create guest-facing problems, staff confusion, or actions that legacy systems cannot reliably support. Reztrix was designed around that constraint from the start: use AI to accelerate understanding, but keep execution grounded, reviewable, and human-approved.

What I Built

Reztrix combines operational data, retrieval, and AI-assisted reasoning into a single decision-support workflow for frontline use. At a system level, the product does four things:

  • Aggregates fragmented operational context across issues, requests, and status signals.
  • Retrieves relevant context so outputs are grounded in the right operational evidence.
  • Generates recommended actions rather than free-form unbounded answers.
  • Routes those recommendations through human review before anything consequential happens.
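The four stages above can be sketched as a small pipeline. This is a minimal illustration, not the actual Reztrix code; every function, type, and field name here is hypothetical, and the retrieval step is a simple keyword filter standing in for the real vector-similarity layer.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    issue_id: str
    action: str
    evidence: list[str]             # retrieved context that grounds the action
    status: str = "PENDING_REVIEW"  # nothing executes until a human approves


def aggregate_context(issue_id: str, sources: dict[str, list[str]]) -> list[str]:
    # Stage 1: pull signals for one issue from several fragmented systems.
    return [signal for signals in sources.values() for signal in signals]


def retrieve_evidence(context: list[str], query: str) -> list[str]:
    # Stage 2: keep only context relevant to the question at hand
    # (a stand-in for the real retrieval layer described below).
    return [c for c in context if query.lower() in c.lower()]


def recommend(issue_id: str, evidence: list[str]) -> Recommendation:
    # Stage 3: produce a bounded recommended action, never a free-form answer.
    action = "escalate_to_manager" if not evidence else "resolve_with_sop"
    return Recommendation(issue_id, action, evidence)


# Stage 4: the recommendation is held for human review rather than executed.
sources = {
    "ticketing": ["noise complaint in room 412", "late checkout request"],
    "status_feed": ["room 412 flagged for maintenance"],
}
context = aggregate_context("412-A", sources)
rec = recommend("412-A", retrieve_evidence(context, "room 412"))
print(rec.action, rec.status)  # resolve_with_sop PENDING_REVIEW
```

The key design point is the default status: a recommendation is born pending, and only an explicit review step can change that.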

This was not built as a prompt demo. It is a full-stack software system with a FastAPI backend, a structured PostgreSQL data model, a retrieval layer, and an evaluation workflow designed around enterprise constraints.

Architecture and Technical Decisions

On the backend, Reztrix uses a FastAPI application layer with a PostgreSQL data model and a pgvector-powered retrieval layer. The goal was a stack that was fast to iterate on, easy to inspect, and able to keep outputs grounded in real operational data.

  • Secure, Scoped Access: The data model and access controls (including Row-Level Security policies) were designed to keep operational data appropriately segmented and reviewable.
  • Model Benchmarking: Benchmarked Claude against GPT-4o on JSON schema adherence, output consistency, and latency, ultimately selecting the Claude API based on structured-output reliability and lower cost per inference.
  • Retrieval Before Recommendation: The system uses RAG to improve grounding and reduce unsupported outputs. I embedded 43 Standard Operating Procedures (SOPs) into 226 vector chunks to anchor recommendations in approved operational policies rather than model guesswork.
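In production this retrieval runs inside Postgres against a pgvector column (using pgvector's cosine-distance operator `<=>`). The sketch below shows the same top-k idea in pure Python with toy three-dimensional embeddings; the chunk names and vectors are illustrative, not real SOP data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embedded SOP chunks. In production these rows live in
# Postgres, and the equivalent query is roughly:
#   SELECT chunk_text FROM sop_chunks
#   ORDER BY embedding <=> %(query_embedding)s LIMIT 2;
sop_chunks = {
    "noisy-guest-sop":   [0.9, 0.1, 0.0],
    "late-checkout-sop": [0.1, 0.9, 0.0],
    "maintenance-sop":   [0.0, 0.2, 0.9],
}


def top_k(query_embedding: list[float], k: int = 2) -> list[str]:
    # Rank SOP chunks by similarity to the query and keep the top k,
    # so recommendations cite approved policy rather than model guesswork.
    ranked = sorted(
        sop_chunks,
        key=lambda cid: cosine_similarity(query_embedding, sop_chunks[cid]),
        reverse=True,
    )
    return ranked[:k]


print(top_k([0.8, 0.2, 0.1]))  # ['noisy-guest-sop', 'late-checkout-sop']
```

Retrieving before recommending means every generated action can point back at the specific policy chunks that justified it.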

The Evaluation Engine

AI systems in operations should not be evaluated only on whether a response sounds helpful. They need to be evaluated on whether the recommendation is grounded, safe, and usable inside the real workflow.

Because live production data was operationally sensitive, I engineered a synthetic evaluation environment to pressure-test the system without exposing real guest information. I modeled frontline scenarios that reflected the kinds of ambiguity, escalation risk, and incomplete context that managers deal with in practice. From there, I built a custom LLM-as-judge evaluation workflow across 50 targeted test scenarios to ask:

  • Did it retrieve and use the right context?
  • Did it avoid unsupported or unsafe recommendations?
  • Did it escalate appropriately when confidence or authority was limited?
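The shape of that evaluation loop is simple enough to sketch. Here the judge is a stub returning structured verdicts on the three criteria; in the real pipeline it is an LLM call, and all field names below are hypothetical.

```python
def judge(response: dict) -> dict:
    # Stand-in for an LLM-as-judge call that returns structured verdicts
    # on the three evaluation questions.
    return {
        "grounded": bool(response.get("evidence")),
        "safe": not response.get("overreach", False),
        "escalated_when_needed": (
            response.get("escalated", False)
            or not response.get("needs_escalation", False)
        ),
    }


def pass_rate(responses: list[dict]) -> float:
    # A scenario passes only if every criterion holds; partial credit
    # would hide exactly the failure modes the evaluation exists to find.
    passed = sum(1 for r in responses if all(judge(r).values()))
    return passed / len(responses)


responses = [
    {"evidence": ["sop-12"], "needs_escalation": False},                    # pass
    {"evidence": [], "needs_escalation": False},                            # fail: ungrounded
    {"evidence": ["sop-3"], "overreach": True, "needs_escalation": False},  # fail: unsafe
    {"evidence": ["sop-7"], "needs_escalation": True, "escalated": True},   # pass
]
print(pass_rate(responses))  # 0.5
```

Scoring all-or-nothing per scenario is a deliberate choice: a recommendation that is fluent but ungrounded should count as a failure, not a near miss.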

The first passes exposed predictable failure modes: incomplete grounding and overconfident recommendations. I used those failures as product input, iterating on retrieval, prompt structure, and recommendation logic. That systematic root-cause analysis improved the pass rate from a 42% baseline to 84%. Evaluation was not a final QA step after the build. It became part of the product loop itself: define the behavior, test the behavior, identify failure patterns, and redesign the system around safer performance.

Human-in-the-Loop (HITL) Workflow

Reztrix was built on a simple principle: AI recommends, people decide. In practice, that meant recommendations were surfaced with context and rationale, then held for manager review rather than executed automatically. The interface was designed to make that review step fast and operationally realistic: surface the issue, show the supporting context, recommend a next step, and make approval or escalation explicit. In frontline environments, trust comes from controlled usefulness, not maximum automation.