Projects

Four AI products, two lenses each.

Evaluation, RAG, multimodal, and safe agents. Each one is a working demo that reads two ways: as an engineering case study for how it's built, and as a product case study for why. Every demo runs in mock mode, so it works with no API keys.

Evaluation / LLMOps · Mock demo

AgentEval Studio

An evaluation and observability workbench that compares AI prompt / RAG / agent variants on quality, cost, latency, and failure modes, and recommends a release gate.

AI Engineering · AI Product Management
Next.js · TypeScript · Anthropic API · LLM-as-judge

RAG / Product Strategy · Mock demo

SignalDesk AI

An AI product-intelligence workspace that ingests user feedback, clusters pain points, finds evidence, and generates PRDs, roadmap bets, and experiment plans, with every claim cited.

AI Engineering · AI Product Management
Next.js · TypeScript · Anthropic API · OpenAI embeddings

Multimodal / Product UX · Mock demo

ScreenSense QA

A multimodal UX/product QA tool that reviews UI screenshots for accessibility, friction, copy clarity, and visual hierarchy, and returns prioritised, severity-scored recommendations.

AI Engineering · AI Product Management
Next.js · TypeScript · Claude vision (claude-opus-4-8) · Structured output

Agents / Trust & Safety · Mock demo

WorkflowPilot Safe Agents

A safe-agent demo that turns a business goal into a proposed multi-step workflow, runs only human-approved tool calls, and records every action in an audit trail.

AI Engineering · AI Product Management
Next.js · TypeScript · Claude tool-use · Human-in-the-loop