Staff Software Engineer, Infrastructure

Jobgether · Canada

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Staff Software Engineer, Infrastructure, based in Canada.

This is a senior infrastructure engineering role focused on building the foundational platform that enables large-scale software delivery across a globally distributed engineering organization. You will design and evolve the internal systems that power provisioning, deployment, observability, and operational workflows used by hundreds of engineers. The role sits at the intersection of distributed systems, cloud infrastructure, and developer platforms, with a strong emphasis on automation, scalability, and reliability. You will help transform expert-dependent operational processes into self-service “paved roads” that teams can trust and adopt independently. A key focus will be reducing environment provisioning times from days to hours by building robust multi-region and cross-account infrastructure foundations. You will also define platform standards, improve operational maturity, and ensure systems are secure, observable, and cost-efficient. Success in this role means delivering infrastructure that removes friction for engineering teams and quietly powers their productivity at scale.

Accountabilities:

  • Lead the design and evolution of internal infrastructure platforms by turning ambiguous technical challenges into scalable architectural solutions and driving them through RFCs and cross-team alignment.
  • Build self-service platform capabilities and APIs (primarily in Go) for provisioning, onboarding, deployment, observability, and operational workflows with strong documentation and adoption focus.
  • Define and implement delivery standards using Terraform, GitOps (Argo CD), CI/CD pipelines, and progressive delivery strategies to ensure safe and repeatable deployments.
  • Architect and improve multi-tenant Kubernetes (EKS) infrastructure, including networking, ingress (Envoy Gateway), traffic routing, and multi-region, cross-account connectivity.
  • Enhance platform reliability through improved SLOs, monitoring, alerting, and incident response processes using observability tooling such as Grafana Cloud.
  • Drive adoption of platform systems across engineering teams, ensuring solutions are intuitive, safe, and measurably reduce operational dependency on manual intervention.
  • Participate in on-call rotations while continuously improving operational health through better alerts, runbooks, and long-term reliability engineering practices.

Requirements:

  • 8+ years of hands-on software engineering experience in backend, infrastructure, or platform engineering roles.
  • Strong programming experience in Go or similar languages, with a focus on system design, testing, debugging, and long-term maintainability.
  • Proven track record of building, scaling, and operating production-grade cloud infrastructure or platform systems.
  • Deep expertise in at least one of: Kubernetes, cloud platforms, networking, reliability engineering, or developer platforms.
  • Strong understanding of Linux systems, networking fundamentals, and production operations at scale.
  • Experience leading technical direction and driving cross-team alignment through RFCs, architecture reviews, and design documentation.
  • Familiarity with modern infrastructure tooling such as Terraform, CI/CD pipelines, GitOps (Argo CD), and observability stacks (Prometheus, OpenTelemetry, Grafana).
  • Strong written and verbal communication skills in a remote-first environment.
  • Nice to have: experience with EKS, service mesh/ingress, progressive delivery, or large-scale platform migrations and adoption initiatives.

Benefits:

  • Competitive compensation with equity participation
  • Remote-first work environment with global flexibility
  • Flexible PTO and scheduled company-wide breaks for rest and recovery
  • Home office setup support and monthly technology stipend
  • Annual learning and development budget for courses, conferences, and training
  • Paid parental leave (up to 16 weeks after eligibility period)
  • Health, dental, and vision coverage (varies by location)
  • Retirement and financial benefits depending on region
  • Inclusive, distributed engineering culture focused on ownership and impact
  • Opportunity to shape foundational infrastructure used by large-scale engineering organizations