Staff Software Engineer, Infrastructure

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Staff Software Engineer, Infrastructure, based in the United States.

This is a senior technical leadership role focused on building the foundational infrastructure that powers a global developer platform used at massive scale. You will design and evolve the internal systems that enable hundreds of engineers to reliably ship, operate, and scale services across cloud environments. The role sits at the intersection of platform engineering, distributed systems, and developer experience, with a strong emphasis on automation, self-service, and operational excellence. You will help transform fragmented, expert-driven workflows into reliable “paved roads” that teams can use independently and safely. A key focus is improving provisioning speed, deployment reliability, and multi-region infrastructure maturity. You will also play a central role in shaping platform standards, driving cross-team alignment, and ensuring systems are secure, observable, and cost-efficient. Success in this role means delivering infrastructure that disappears into the background because it simply works.

Accountabilities:

  • Define and lead the evolution of internal infrastructure platforms by turning ambiguous technical challenges into scalable architectural proposals and driving them through RFCs and cross-team alignment.
  • Design and build self-service platform capabilities and APIs (primarily in Go) for provisioning, onboarding, deployment, observability, and operational workflows with strong documentation and clear contracts.
  • Establish and improve delivery standards using Terraform, GitOps (Argo CD), CI/CD pipelines, and progressive deployment strategies to enable safe, repeatable releases.
  • Architect and evolve multi-region, multi-account infrastructure on Kubernetes (EKS), including networking, ingress, traffic routing, and cross-region connectivity.
  • Improve platform reliability and operational maturity through enhanced SLOs, monitoring, alerting, and incident management practices using observability tools such as Grafana.
  • Drive adoption of platform capabilities across engineering teams by ensuring solutions are usable and trusted, and that they measurably reduce operational friction and dependency on manual support.
  • Participate in on-call rotations while also improving operational health through better alerting, runbooks, and long-term reliability improvements.

Requirements:

  • 8+ years of professional software engineering experience in backend, infrastructure, or platform engineering roles.
  • Strong hands-on expertise in Go or similar backend languages, with a focus on system design, testing, reliability, and long-term maintainability.
  • Proven experience building, scaling, and operating production infrastructure or cloud-based platforms.
  • Deep knowledge in at least one of the following areas: Kubernetes, cloud infrastructure, networking, reliability engineering, or developer platforms.
  • Strong understanding of Linux systems, networking fundamentals, and production operations at scale.
  • Experience driving cross-team alignment and influencing technical direction through design documents, RFCs, and architecture reviews.
  • Familiarity with modern DevOps practices such as Terraform, CI/CD pipelines, GitOps, and observability tooling (Prometheus, OpenTelemetry, Grafana).
  • Strong communication skills, with the ability to clearly articulate complex infrastructure decisions in a distributed, remote-first environment.
  • Nice to have: experience with EKS, service mesh/ingress (e.g., Envoy), progressive delivery, or large-scale platform migrations.

Benefits:

  • Competitive compensation with equity participation
  • Remote-first work environment with global flexibility
  • Flexible PTO and company-wide breaks for rest and recharge
  • Home office setup support and monthly tech stipend
  • Parental leave policy (up to 16 weeks, after eligibility period)
  • Annual learning and training budget for courses and conferences
  • Health, dental, vision, and retirement benefits (varies by location)
  • Quarterly wellness breaks and additional time-off initiatives
  • Inclusive, distributed culture with strong engineering ownership
  • Opportunity to shape foundational infrastructure used by millions of developers