Stack AI is a no-code platform for designing, testing, and deploying AI workflows powered by large language models. Our visual, drag-and-drop interface lets teams connect their data to AI models and ship production applications — from chatbots to document-processing pipelines to database Q&A tools — without writing code.
Enterprises run real work on AI agents, and at Stack AI that work runs on a single engine. Some agents finish in a second. Others run for days, fan out into dozens of sub-agents, pause, resume, and recover from failures without losing a step. We're hiring a Senior Software Engineer, Engine & Distributed Systems to own that engine: the durable runtime at the core of the platform that has to be correct every time, at any scale.
This is deep systems work at the heart of the product. When the engine is solid, agents simply run — and getting it there is one of the more interesting distributed-systems problems in AI today. You'll own it end to end, from the execution model to how it behaves in production.
Own the execution engine. The runtime, scheduling, and sub-agent parallelization that run every agent on the platform.
Make long-running work durable. Build checkpointing, resumption, and recovery so agents survive failures and restarts and pick up exactly where they left off.
Shape the execution model. Decide how work is scheduled, queued, and moved from synchronous to asynchronous, so the platform stays correct and responsive as load grows.
Engineer for scale and reliability. Hold the engine to strict health targets for worker freshness, deploy safety, and drain time, and keep latency and throughput strong as volume grows.
Keep the engine open to the ecosystem. Make it straightforward to bring new agent harnesses, orchestration frameworks, and model capabilities into the runtime.
5+ years building backend systems in production, with real depth in distributed systems.
Hands-on experience with durable execution or workflow orchestration (Temporal, Cadence, Airflow, or equivalent), with a way of thinking rooted in idempotency, state machines, and failure recovery.
Strong command of concurrency, queueing, retries, and fault tolerance under load.
Strong in Python and modern backend frameworks (FastAPI or similar), with sound database fundamentals (Postgres or similar).
You're drawn to the correctness problems that everything else quietly depends on.
Distributed systems is broad. If you're strong on most of this and excited to grow into the rest, we'd like to hear from you, even if you don't check every box.
Operating Temporal at scale.
Event-driven architectures and message queues.
Experience with agent frameworks such as PydanticAI or LangGraph.
AI or agent runtimes: tool-calling, sub-agent orchestration, streaming.
Performance and cost optimization of high-throughput backends.
Startup or growth-stage experience.
You'll join a lean, high-impact team and own the engine that every customer's agents run on. Your work ships fast and is felt across the whole product.
Stack AI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Based on 7,682 disclosed Software salaries on RoleSuite, Software roles pay a median of $158K/year, with most offers between $123K and $199K (10th–90th percentile: $101K–$236K).
This posting lists $220K–$240K, above the $158K market median.