Site Observability Engineer

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Observability Engineer based in the United States.

This role is central to ensuring engineering teams have full visibility into system health, performance, and reliability across complex distributed environments. The engineer will design and operate end-to-end observability platforms covering metrics, logs, traces, and events, enabling fast and accurate detection of issues before they impact users. The environment is highly technical, cloud-native, and deeply aligned with SRE principles, with strong emphasis on automation, scalability, and signal quality. The role involves shaping how telemetry is collected, stored, and transformed into actionable insight across the organization. It also requires close collaboration with platform, SRE, and product engineering teams to embed observability into every layer of the system. The position is ideal for someone passionate about reliability engineering, data-driven operations, and building systems that empower others to debug and improve production services.

Accountabilities

This role is responsible for building, operating, and evolving the organization’s observability ecosystem, ensuring engineers can effectively monitor, troubleshoot, and improve distributed systems at scale.

Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events
Architect and manage tools such as Prometheus, Thanos, Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog
Define and enforce SLOs, SLIs, error budgets, and observability standards across teams
Build alerting frameworks integrated with on-call systems to reduce noise and improve incident response
Develop instrumentation standards including logging formats, metric naming, and trace propagation
Manage large-scale telemetry pipelines with a focus on performance, retention, and cost optimization
Build dashboards and self-service tools to improve observability adoption across engineering teams
Improve incident response readiness through better alerting, monitoring, and post-incident analysis
Partner with SRE and platform teams to embed observability into CI/CD and deployment workflows
Mentor engineers on observability best practices, debugging techniques, and reliability engineering principles

Requirements

The ideal candidate brings deep experience in observability, SRE practices, and distributed systems, with strong technical and communication skills to drive adoption across engineering teams.

5+ years of experience in SRE, platform engineering, or observability-focused roles
Strong hands-on expertise with Prometheus, Grafana, and at least one commercial tool (Datadog, New Relic, or Splunk)
Solid understanding of OpenTelemetry, distributed tracing, and structured logging
Proficiency in at least one programming language such as Go, Python, or Java
Experience operating high-scale, high-cardinality metrics and log pipelines
Strong knowledge of SLOs, SLIs, error budgets, and reliability engineering principles
Experience integrating observability systems with CI/CD and incident management tools
Solid understanding of Linux systems, networking, and containerized environments
Strong troubleshooting, analytical, and communication skills
Experience in building or scaling observability platforms is highly valued

Benefits

Competitive salary range ($100K–$150K based on experience)
100% remote work within the United States
Full-time W2 employment structure (no C2C or 1099 arrangements)
Health, dental, and vision insurance options
Paid time off and company holidays
Retirement savings plan with employer contributions
Professional development and career growth opportunities
Exposure to modern cloud-native observability stacks and large-scale distributed systems
Collaborative engineering culture focused on reliability and continuous improvement

DevOps pay context

Based on 1,241 disclosed DevOps salaries on RoleSuite, the role pays a median of $140K/year, with most offers between $115K and $173K (10th–90th percentile: $100K–$208K).

This posting lists $100K–$150K: the top of the range is above the $140K market median, but the midpoint ($125K) is below it.
