Production Engineer - Database Operations

Palantir · London, United Kingdom

A World-Changing Company
 
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role
 
Database Operations is responsible for the deployment and maintenance of the database layer that Palantir’s products depend on. As an engineer on the Database Operations team, you will design how databases are deployed and operated across a wide variety of environments, build the automation and observability that keep them healthy, and respond to the critical production issues that affect some of the world's most impactful use cases.

Core Responsibilities

  • Build software that automates the routine work of deploying and running production databases
  • Participate in regular on-call rotations
  • Troubleshoot, diagnose, and remediate stability and reliability issues in production database systems
  • Participate in post-incident reviews and take ownership of follow-up actions
  • Identify patterns across incidents, support tickets, and alerts, and translate those observations into proposals for systemic improvements to our fleet
  • Manage and execute large-scale migrations of the fleet of databases we run
  • Partner with customer-facing teams during incidents and when setting up new database installations
  • Work with database engineering teams and infrastructure teams to build resilient and highly available database systems
  • Continuously invest in documentation, metrics, monitors, and other troubleshooting tools
  • Hold the bar on engineering and operational standards through code reviews, design reviews, and feedback on operating procedures
  • Build deep expertise and experience in production systems (Kubernetes, cloud environments, Cassandra, Elasticsearch, etc.) and share that knowledge amongst the team

What We Value

  • Strong sense of ownership in ambiguous environments. You step in to drive important outcomes when lines of responsibility are unclear.
  • Ability to collaborate and empathize under pressure. You stay calm and work effectively during high-stakes incidents.
  • Curiosity and motivation to learn. The systems and technologies you work on will change often, and you can quickly build the understanding to be effective through the changes.
  • Ability to operate with autonomy in a rapidly changing environment where priorities shift.
  • First-principles thinking. You want to thoroughly understand the systems you work on rather than applying band-aid fixes, and you ask questions to get there.
  • Investment in the team around you. You share what you learn, give thoughtful feedback in reviews, and treat raising the team's overall capability as part of the job.

What We Require

  • Engineering background in Computer Science, Mathematics, Software Engineering, Physics, or a similar field.
  • Experience writing code in a variety of programming languages such as Java, Python, and Go, as part of a past role or personal projects.
  • Experience with cloud and orchestration technologies such as Kubernetes, Helm, AWS, GCP, Azure, and OpenShift.
  • Experience with database technologies such as Cassandra, Elasticsearch, and Kafka.
  • A solid foundation in Linux and how distributed web services work.
  • Familiarity with observability tools such as Grafana and Prometheus.
  • Strong written and verbal communication and the ability to iterate quickly with teammates and incorporate feedback.