Senior/Staff Software Engineer - Machine Learning & System Optimization

Zoox · Foster City, CA / Boston, MA / Seattle, WA

The Perception team is pioneering the development of a multi-modal foundation model to drive the next generation of autonomous system intelligence.

As a Machine Learning and System Optimization Engineer, you will orchestrate and allocate overall system capacity across the core perception models running on-bot, and drive large initiatives that make inference more efficient by sharing components of the perception stack across models.

You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, such as LLMs, VLMs, and foundation models, to power- and thermally constrained vehicle SoCs.

In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.

Spearhead cross-cutting initiatives that improve compute utilization by sharing or fusing models and adopting better scheduling strategies.

Optimize large-scale models (multi-modal sensor fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).

Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.

Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.

Qualifications:

Deep experience with system and performance optimization on CPU/GPU systems designed for low latency or high throughput.

Deep expertise with real-time systems and their constraints, such as processing latency, memory utilization, and memory-bandwidth pressure.

Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference frameworks (INT8, FP8, FP4, BF16/FP16).

Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML ops and TensorRT plugins with efficient CUDA kernel implementations.

Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

Prior experience with high-performance robotics applications, such as autonomous vehicles, drones, or other robots.

Familiarity with state-of-the-art autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D occupancy networks) and multi-modal sensor processing (vision, LiDAR, radar).

Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Software pay context

Based on 7,833 disclosed Software salaries on RoleSuite, Software roles pay a median of $158K/year, with most offers between $123K and $200K (10th–90th percentile: $102K–$235K).

This posting lists $226K–$307K, above the $158K market median.
