AI infrastructure product

Immersive AI API

One endpoint for image, video, speech, music, 3D, and text generation. Agent-native MCP support and x402 + USDC billing on Base, so autonomous agents can pay per call without a human in the loop.

Visit Immersive AI API Discuss a similar build

Challenge

Package the best of frontier AI — Flux, Nano Banana, Seedream, Z-Image Turbo, Veo 3.1, ElevenLabs, Whisper — into a single API that feels coherent to product teams and natively usable by agents.

Outcome

A unified REST + MCP surface that consolidates the strongest current models for image, video, audio, 3D, and text, with autonomous payment built in. Used by Immersive products and external developers building agentic systems.

What it is

The Immersive AI API is one endpoint that exposes the strongest current generative models — image, video, speech, music, 3D, and text — through both a traditional REST surface and a native MCP server, so agents can call it the same way a human developer would.

Why it matters

Cuts the integration cost of frontier AI from "pick a vendor per modality" to "one credential, one shape"
Agent-native: every endpoint is exposed via MCP so Claude, ChatGPT, and any agent runtime can use it without bespoke glue
Autonomous billing with x402 + USDC on Base means agents can spend on inference inside their own budgets, no human approval loop

My role

I shaped the product direction, picked the model line-up, designed the API shape, and built the infrastructure that ties payment, routing, and multi-modal generation together.