Sr Site Reliability Engineer

Jobgether · India

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr Site Reliability Engineer based in India.

This role sits at the core of a rapidly scaling observability platform that powers how engineering teams monitor, debug, and optimize complex distributed systems. You will be responsible for ensuring the reliability, scalability, and performance of a large-scale SaaS infrastructure that processes massive volumes of observability data. The environment is highly technical and deeply hands-on, requiring strong instincts for diagnosing production issues and preventing them at scale. You will work closely with platform, data, and product engineering teams to maintain and evolve a petabyte-scale system built on modern cloud-native technologies. The role involves owning uptime, performance, and operational excellence across Kubernetes-based infrastructure and high-throughput data pipelines. This is an opportunity to shape the backbone of a globally used open-source product trusted by thousands of engineering teams.

Accountabilities:

In this role, you will own the operational stability and scalability of a large distributed observability platform while continuously improving system performance, reliability, and automation. You will:

Design, operate, and improve large-scale Kubernetes infrastructure including upgrades, scaling, networking, and multi-tenancy
Ensure system reliability through strong SRE practices including SLOs, SLIs, error budgets, incident response, and on-call optimization
Scale and maintain high-throughput ingestion pipelines handling petabyte-scale observability data
Operate, tune, and optimize data systems such as ClickHouse for performance, cost efficiency, and reliability
Build automation and tooling using infrastructure-as-code and CI/CD to improve deployment and operational efficiency
Monitor, debug, and resolve complex production issues across distributed systems
Improve observability of the platform itself using modern monitoring, logging, and tracing practices

Requirements:

This role requires strong experience in building and operating large-scale distributed systems with a deep focus on reliability and performance. You should bring:

5–8 years of experience in SRE, infrastructure, platform engineering, or backend systems roles
Deep hands-on expertise with Kubernetes in production-scale environments
Strong understanding of distributed systems, failure modes, performance tuning, and capacity planning
Experience working with high-scale data systems (ClickHouse, Kafka, or similar) is highly desirable
Proficiency in at least one programming language (Go strongly preferred) with a focus on automation and system reliability
Familiarity with observability concepts and tools such as OpenTelemetry, metrics, logs, and traces
Strong problem-solving skills with the ability to debug complex production issues
Excellent communication skills with the ability to write clear documentation and runbooks
Experience in fast-paced, high-ownership, remote-first environments
Open-source contributions or strong engagement with OSS ecosystems is a plus

Benefits:

Competitive salary package ranging from ₹50L to ₹1Cr annually
Fully remote, India-based role with flexible, async-friendly working culture
High ownership role with direct impact on a globally used open-source platform
Opportunity to work on petabyte-scale distributed systems and cutting-edge observability infrastructure
Strong engineering culture focused on shipping, reliability, and continuous improvement
Exposure to modern cloud-native technologies including Kubernetes, ClickHouse, and OpenTelemetry
Collaborative, high-caliber team environment with strong technical peers
Opportunity to contribute to a fast-growing open-source ecosystem used by thousands of engineering teams

DevOps pay context

Based on 1,239 disclosed DevOps salaries on RoleSuite, DevOps roles pay a median of $141K/year, with most offers between $115K and $173K (10th–90th percentile: $101K–$210K).
