Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and in commercial markets. Figure is headquartered in San Jose, CA.
We are looking for a Staff AI Inference & Acceleration Engineer to join the Platform Software team and own the on-board inference architecture for Figure’s humanoid robots. You will be the technical authority on how AI workloads are mapped, optimized, and executed across the robot’s compute hardware — driving down power consumption and cost while meeting the strict latency and reliability demands of a real-time autonomous system.
Responsibilities:
- Own the on-board inference architecture — mapping models to available accelerators (NPU, GPU, DSP, CPU) based on latency, power, and memory budgets.
- Partition inference workloads across heterogeneous compute resources, balancing real-time performance with power and thermal constraints.
- Define and maintain a system-level compute budget across all inference tasks running on the robot.
- Evaluate next-generation acceleration hardware and contribute to the definition of future compute platform requirements.
- Optimize inference toolchains end-to-end — from model export through runtime execution — for target hardware.
- Apply quantization (INT8, INT4, mixed-precision), pruning, operator fusion, and other compression techniques to reduce compute, memory, and power footprint.
- Profile inference pipelines to identify and eliminate bottlenecks in latency, memory bandwidth, and power consumption.
- Optimize kernel scheduling, memory layout, and data movement across the compute hierarchy.
- Partner closely with the AI/ML team to define model architecture constraints that are hardware-friendly from the outset.
- Work with the Platform Software team on runtime integration, scheduling, and power management.
- Engage with silicon vendors and research teams to track the accelerator landscape and influence hardware roadmaps.
Requirements:
- M.S. or Ph.D. in Computer Engineering, Electrical Engineering, Computer Science, or a related field — or equivalent industry experience.
- At least 8 years of industry experience in hardware acceleration, ML systems, or compute architecture.
- Deep understanding of AI/ML inference — model formats (ONNX, TFLite, etc.), inference runtimes, and deployment pipelines.
- Hands-on experience optimizing models for edge or embedded hardware using quantization, pruning, and operator-level tuning.
- Strong understanding of computer architecture — memory hierarchies, data movement, and heterogeneous compute.
- Experience profiling and benchmarking inference workloads across CPUs, GPUs, NPUs, and DSPs.
- Familiarity with low-level toolchains and compilation frameworks (e.g., TVM, MLIR, TensorRT, Torch, SNPE/QNN, JAX, CUDA, ROCm).
- Solid software engineering skills in C++ and Python.
- Strong cross-functional communication skills — able to work effectively across hardware, software, and AI/ML teams.
Bonus Qualifications:
- Knowledge of real-time operating constraints and their impact on inference scheduling.
- Track record of co-designing model architectures with ML teams to meet hardware constraints.
The US base salary range for this full-time position is between $180,000 and $275,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.