End-to-end tutorial: gateway routes an inference request to an orchestrator, the AI runner processes it, and the result returns through the full pipeline. Covers off-chain local setup with a HuggingFace model.
The gateway handles routing and payment negotiation. The orchestrator handles compute. Run both on one machine, off-chain, and watch a full inference request travel through both sides and return a result without a wallet or on-chain registration.
This tutorial runs a complete local AI inference pipeline: a gateway receives a client request, routes it to a local orchestrator, the orchestrator processes it through an AI runner container, and the result returns to the caller. Estimated time: 2 to 3 hours (most of this is model download time).

What you will verify:
The gateway routes an inference request to the orchestrator
The orchestrator processes it through the AI runner
The response returns through the gateway to the caller
```
Client (curl)
  ↓ POST /text-to-image
Gateway (port 8936)
  ↓ routes job + PM ticket
Orchestrator (port 8935)
  ↓ dispatches to AI runner
AI runner container
  ↓ SDXL-Lightning inference on GPU
Orchestrator
  ↓ result + ticket evaluation
Gateway
  ↓ PNG response
Client
```
The gateway and orchestrator run as separate processes. In production, they run on separate machines. This tutorial runs both locally to make the log trace visible end-to-end.
price_per_unit sets the orchestrator’s sell-side price. The gateway’s buy-side cap must be at or above this value for the job to route. In Step 4 the gateway is started with no explicit price cap, so it accepts any price.
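The routing condition reduces to a single comparison. The sketch below illustrates it with made-up numbers; the variable names are illustrative only, not actual flags:

```shell
# Illustrative values only: a job routes when the gateway's buy-side cap
# is unset (treated here as 0, meaning "accept any price") or at least
# the orchestrator's sell-side price_per_unit.
orch_price=70    # orchestrator's price_per_unit (example value)
gateway_cap=0    # gateway's buy-side cap; 0 stands in for "no explicit cap"
if [ "$gateway_cap" -eq 0 ] || [ "$gateway_cap" -ge "$orch_price" ]; then
  echo "job routes"
else
  echo "job rejected: price above cap"
fi
```

With the values above this prints "job routes", which matches the Step 4 setup: no explicit cap on the gateway side.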
-orchAddr http://127.0.0.1:8935 - points directly at the local orchestrator (off-chain mode bypasses on-chain discovery)
-httpIngest - enables the AI inference HTTP endpoints
-remoteSignerAddr - community remote signer for payment ticket signing (no wallet needed)
-cliAddr and -httpAddr use ports distinct from the orchestrator's (7936 and 8936 for the gateway vs 7935 and 8935 for the orchestrator), so both processes can share one machine
The remote signer at signer.eliteencoder.net is a community-hosted service for testing. Check availability in #local-gateways on Discord before you start.
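Assembled from the flags above, the Step 4 gateway command looks roughly like this. Treat it as a sketch: the binary name, the -gateway and -network offchain flags, the signer URL scheme, and the exact address syntax are assumptions; only the flags discussed above come from this tutorial.

```shell
# Sketch of the Step 4 gateway start command (assumptions noted in the
# lead-in; adjust names and syntax to your go-livepeer build).
livepeer -gateway \
  -network offchain \
  -orchAddr http://127.0.0.1:8935 \
  -httpIngest \
  -remoteSignerAddr https://signer.eliteencoder.net \
  -cliAddr 127.0.0.1:7936 \
  -httpAddr 127.0.0.1:8936
```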
Step 5: Send an inference request through the gateway
Send a text-to-image request through the gateway on port 8936. Keep port 8935 for the gateway-to-orchestrator hop:
```shell
curl -X POST http://localhost:8936/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "ByteDance/SDXL-Lightning",
    "prompt": "a coastal town in evening light, photorealistic",
    "width": 512,
    "height": 512,
    "num_inference_steps": 4
  }' \
  -o pipeline-output.png \
  --max-time 60
```
This request travels the full pipeline. A typical first inference takes 5 to 15 seconds (VRAM kernel warm-up on the first job). Subsequent requests take 2 to 4 seconds.

Verify the output:
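A quick check that the download is an actual image rather than a JSON error body is to inspect the PNG signature. This is a minimal sketch; the filename matches the curl command above:

```shell
# A real PNG starts with the 8-byte signature 89 50 4e 47 0d 0a 1a 0a.
# A failed request usually writes a JSON error body to the same file.
png_magic() {
  head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
if [ "$(png_magic pipeline-output.png)" = "89504e470d0a1a0a" ]; then
  echo "pipeline-output.png is a valid PNG"
else
  echo "not a PNG; first bytes of the body:"
  head -c 200 pipeline-output.png 2>/dev/null || true
fi
```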
The request left footprints in each component. Read the logs to understand what happened at each hop.

Gateway log - shows routing decision and payment signing:
The request completed the full Livepeer AI pipeline:
The curl request hit the gateway at :8936 on the /text-to-image endpoint.
The gateway selected the local orchestrator at :8935 (the only option via -orchAddr), signed a payment ticket using the community remote signer, and forwarded the job request.
The orchestrator received the job, forwarded it to the AI runner container via Docker-out-of-Docker, and waited for the result.
The AI runner loaded the SDXL-Lightning model from VRAM (it was pre-warmed), ran 4 diffusion steps, and returned a PNG.
The orchestrator returned the result to the gateway and evaluated the payment ticket (in off-chain mode, settlement is handled by the remote signer instead of the Arbitrum TicketBroker).
The gateway returned the PNG to the curl client.
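The warm-up effect described above is easy to measure with curl's built-in timing variable. A minimal sketch of a timing helper:

```shell
# Report curl's total request time in seconds, so the first (cold)
# inference can be compared against later (warm) ones.
# %{time_total} is curl's built-in --write-out timer variable.
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' "$@"
}
```

Run it twice with the same POST arguments as the request above; if the model stays warm, the first call should land in the 5 to 15 second range and the second in the 2 to 4 second range.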
In production, the orchestrator is registered on-chain and the gateway discovers it via the Livepeer protocol. Payment tickets settle on Arbitrum through the TicketBroker contract. The inference mechanics are identical.