Senior/Staff Software Engineer - Machine Learning & System Optimization

Zoox · Foster City, CA / Boston, MA / Seattle, WA

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Machine Learning and System Optimization Engineer, you will allocate overall system capacity across the core perception models running on the robot, and drive large initiatives that make inference more efficient by sharing parts of the perception stack across models.

You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, or foundation models, for power- and thermal-constrained vehicle SoCs.

In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

  • Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.

  • Spearhead cross-cutting initiatives that allow for better compute utilization through sharing/fusing models and better scheduling strategies.

  • Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).

  • Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.

  • Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.

Qualifications:

  • Deep experience in system and performance optimization in CPU/GPU systems designed for low latency or high throughput.

  • Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory bandwidth pressure.

  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, FP4, BF16/FP16).

  • Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML operators and TensorRT plugins with efficient CUDA kernel implementations.

  • Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

  • Prior experience in high-performance robotics applications such as AV/drones/robots.

  • Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).

  • Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Software pay context:

Based on 7,833 disclosed Software salaries on RoleSuite, the role pays a median of $158K/year, with most offers between $123K and $200K (10th–90th percentile: $102K–$235K).

This posting lists $226K–$307K, above the $158K market median.
