Sr Site Reliability Engineer

Jobgether · India

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr Site Reliability Engineer based in India.

This role sits at the core of a rapidly scaling observability platform that powers how engineering teams monitor, debug, and optimize complex distributed systems. You will be responsible for ensuring the reliability, scalability, and performance of a large-scale SaaS infrastructure that processes massive volumes of observability data. The environment is highly technical and deeply hands-on, requiring strong instincts for diagnosing production issues and preventing them at scale. You will work closely with platform, data, and product engineering teams to maintain and evolve a petabyte-scale system built on modern cloud-native technologies. The role involves owning uptime, performance, and operational excellence across Kubernetes-based infrastructure and high-throughput data pipelines. This is an opportunity to shape the backbone of a globally used open-source product trusted by thousands of engineering teams.

Accountabilities:

In this role, you will own the operational stability and scalability of a large distributed observability platform while continuously improving system performance, reliability, and automation. You will:

  • Design, operate, and improve large-scale Kubernetes infrastructure including upgrades, scaling, networking, and multi-tenancy
  • Ensure system reliability through strong SRE practices including SLOs, SLIs, error budgets, incident response, and on-call optimization
  • Scale and maintain high-throughput ingestion pipelines handling petabyte-scale observability data
  • Operate, tune, and optimize data systems such as ClickHouse for performance, cost efficiency, and reliability
  • Build automation and tooling using infrastructure-as-code and CI/CD to improve deployment and operational efficiency
  • Monitor, debug, and resolve complex production issues across distributed systems
  • Improve observability of the platform itself using modern monitoring, logging, and tracing practices

Requirements:

This role requires strong experience building and operating large-scale distributed systems, with a deep focus on reliability and performance. You should bring:

  • 5–8 years of experience in SRE, infrastructure, platform engineering, or backend systems roles
  • Deep hands-on expertise with Kubernetes in production-scale environments
  • Strong understanding of distributed systems, failure modes, performance tuning, and capacity planning
  • Experience working with high-scale data systems (ClickHouse, Kafka, or similar) is highly desirable
  • Proficiency in at least one programming language (Go strongly preferred), with a focus on automation and system reliability
  • Familiarity with observability concepts and tools such as OpenTelemetry, metrics, logs, and traces
  • Strong problem-solving skills and the ability to debug complex production issues
  • Excellent communication skills and the ability to write clear documentation and runbooks
  • Experience in fast-paced, high-ownership, remote-first environments
  • Open-source contributions or strong engagement with OSS ecosystems are a plus

Benefits:

  • Competitive salary package ranging from ₹50L to ₹1Cr annually
  • Fully remote, India-based role with a flexible, async-friendly working culture
  • High-ownership role with direct impact on a globally used open-source platform
  • Opportunity to work on petabyte-scale distributed systems and cutting-edge observability infrastructure
  • Strong engineering culture focused on shipping, reliability, and continuous improvement
  • Exposure to modern cloud-native technologies including Kubernetes, ClickHouse, and OpenTelemetry
  • Collaborative, high-caliber team environment with strong technical peers
  • Opportunity to contribute to a fast-growing open-source ecosystem used by thousands of engineering teams

DevOps pay context

Based on 1,239 disclosed DevOps salaries on RoleSuite, DevOps roles pay a median of $141K/year, with most offers between $115K and $173K (10th–90th percentile: $101K–$210K).
