As an SRE in Vehicle Software, you will keep Wayve’s autonomous driving fleet reliable, observable, and safe while it operates on public roads. You will work at the boundary of software, hardware, and operations, turning real-world incidents and performance bottlenecks into lasting engineering improvements. This role offers a direct line of sight from what you build to safer deployments, faster iteration, and greater fleet scale.
Key responsibilities
Own and improve the reliability, availability, and performance of vehicle software systems used across the dev fleet.
Take part in a team on-call rotation, providing out-of-hours support for live systems when required.
Build and operate monitoring, logging, alerting, and on-call tooling that enables fast detection, diagnosis, and recovery.
Drive incident response and post-incident learning, translating root causes into durable fixes and preventative controls.
Design and deliver automation for fleet operations, deployments, and repetitive workflows to reduce manual intervention.
Partner closely with Vehicle SW, operations, and platform teams to define SLOs, reliability metrics, and release readiness.
Continuously harden the production environment through capacity planning, change management, and reliability-focused reviews.
To set you up for success as a Site Reliability Engineer at Wayve, we’re looking for the following skills and experience.
Essential skills
Proven experience in an SRE, production reliability, or platform operations role for complex distributed systems.
Strong Linux fundamentals and hands-on experience with CI/CD, containers (Docker), and orchestration (Kubernetes).
Proficiency in at least one systems or scripting language (Python, C++, or Rust) with a bias for automation.
Deep troubleshooting skills across networking, distributed systems, and databases, including performance and availability issues.
Experience designing observability stacks and using tools such as Datadog, Prometheus, Grafana, OpenTelemetry, Splunk, or Humio.
Clear communication skills, including incident leadership, writing postmortems, and influencing engineering priorities.
Desirable skills
Cloud platform experience (AWS, GCP, or Azure), including infrastructure-as-code and secure production operations.
Experience with real-time or safety-critical systems, hardware-in-the-loop, or embedded/robotics environments.
Familiarity with fleet operations, telemetry pipelines, and operating software on edge devices at scale.
Experience defining and running SLOs/SLIs and reliability programs across multiple teams.
This is a full-time role based in our office in Baden-Württemberg, Germany (hybrid, minimum 3 days a week). At Wayve, we want the best of all worlds, so we operate a hybrid working policy that combines time together in our offices and workshops, which fuels innovation, culture, relationships, and learning, with time spent working from home.