Updated 2026-07-03 19:00 UTC · © 2025–2026 RoleSuite

Site Observability Engineer

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Observability Engineer based in the United States.

This role is central to ensuring engineering teams have full visibility into system health, performance, and reliability across complex distributed environments. The engineer will design and operate end-to-end observability platforms covering metrics, logs, traces, and events, enabling fast and accurate detection of issues before they impact users. The environment is highly technical, cloud-native, and deeply aligned with SRE principles, with strong emphasis on automation, scalability, and signal quality. The role involves shaping how telemetry is collected, stored, and transformed into actionable insight across the organization. It also requires close collaboration with platform, SRE, and product engineering teams to embed observability into every layer of the system. The position is ideal for someone passionate about reliability engineering, data-driven operations, and building systems that empower others to debug and improve production services.

Accountabilities

This role is responsible for building, operating, and evolving the organization’s observability ecosystem, ensuring engineers can effectively monitor, troubleshoot, and improve distributed systems at scale.

  • Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events
  • Architect and manage tools such as Prometheus, Thanos, Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog
  • Define and enforce SLOs, SLIs, error budgets, and observability standards across teams
  • Build alerting frameworks integrated with on-call systems to reduce noise and improve incident response
  • Develop instrumentation standards including logging formats, metric naming, and trace propagation
  • Manage large-scale telemetry pipelines with a focus on performance, retention, and cost optimization
  • Build dashboards and self-service tools to improve observability adoption across engineering teams
  • Improve incident response readiness through better alerting, monitoring, and post-incident analysis
  • Partner with SRE and platform teams to embed observability into CI/CD and deployment workflows
  • Mentor engineers on observability best practices, debugging techniques, and reliability engineering principles

Requirements

The ideal candidate brings deep experience in observability, SRE practices, and distributed systems, with strong technical and communication skills to drive adoption across engineering teams.

  • 5+ years of experience in SRE, platform engineering, or observability-focused roles
  • Strong hands-on expertise with Prometheus, Grafana, and at least one commercial tool (Datadog, New Relic, or Splunk)
  • Solid understanding of OpenTelemetry, distributed tracing, and structured logging
  • Proficiency in at least one programming language such as Go, Python, or Java
  • Experience operating high-scale metrics and log pipelines with high cardinality
  • Strong knowledge of SLOs, SLIs, error budgets, and reliability engineering principles
  • Experience integrating observability systems with CI/CD and incident management tools
  • Solid understanding of Linux systems, networking, and containerized environments
  • Strong troubleshooting, analytical, and communication skills
  • Experience in building or scaling observability platforms is highly valued

Benefits

  • Competitive salary range ($100K–$150K based on experience)
  • 100% remote work within the United States
  • Full-time W2 employment structure (no C2C or 1099 arrangements)
  • Health, dental, and vision insurance options
  • Paid time off and company holidays
  • Retirement savings plan with employer contributions
  • Professional development and career growth opportunities
  • Exposure to modern cloud-native observability stacks and large-scale distributed systems
  • Collaborative engineering culture focused on reliability and continuous improvement

DevOps pay context

Based on 1,241 disclosed DevOps salaries on RoleSuite, the role pays a median of $140K/year, with most offers between $115K and $173K (10th–90th percentile: $100K–$208K).

This posting lists $100K–$150K. The midpoint of that range ($125K) is below the $140K market median, although the top of the range is above it.

