This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Reliability Engineer based in India.
Join a technology-driven environment where reliability, automation, and cloud innovation are at the heart of delivering high-performance digital platforms. In this role, you will help design, build, and operate scalable infrastructure while improving developer experience through modern platform engineering practices. Working closely with cross-functional engineering teams, you will enhance system resilience, automate operational processes, and strengthen cloud governance. This is an opportunity to work with current cloud technologies, Kubernetes, Infrastructure as Code, and AI-powered development tools while contributing to highly available production environments. If you thrive on solving complex operational challenges and driving continuous improvement, this role offers a chance to make a meaningful impact.
Accountabilities
- Design, build, and maintain internal developer platforms, self-service infrastructure, and platform services using modern cloud-native technologies.
- Develop and enhance automation solutions using Python, Bash, Go, and Infrastructure as Code tools such as Terraform, Pulumi, and Crossplane.
- Collaborate with engineering teams to design reliable, scalable, and secure cloud infrastructure while supporting CI/CD pipelines and deployment strategies.
- Monitor production environments, define and improve SLIs/SLOs, implement observability solutions, and strengthen monitoring and alerting capabilities.
- Participate in incident response, troubleshoot production issues, conduct root cause analysis, and drive post-incident improvements.
- Establish and maintain cloud governance, security standards, compliance initiatives, and cost optimization strategies.
- Continuously reduce operational toil through automation and AI-assisted development practices while promoting Site Reliability Engineering principles across teams.
- Stay current with emerging technologies and contribute to the continuous evolution of platform engineering best practices.
Requirements
- Proven experience as a Site Reliability Engineer, Platform Engineer, DevOps Engineer, or in a similar cloud infrastructure role.
- Strong scripting and programming skills using Python, Go, Bash, or comparable languages.
- Hands-on experience with Kubernetes, Docker, cloud platforms (AWS, Azure, or GCP), and Infrastructure as Code solutions including Terraform, Pulumi, or Crossplane.
- Solid knowledge of CI/CD platforms such as GitHub Actions, Jenkins, or TeamCity.
- Experience with monitoring and observability technologies including Grafana, Prometheus, ELK, Tempo, or Loki.
- Understanding of Internal Developer Platforms (IDPs), developer experience (DevEx), and platform engineering principles.
- Familiarity with cloud governance, security best practices, incident response, and ISO 27001 or similar compliance frameworks.
- Experience leveraging AI development tools such as GitHub Copilot or ChatGPT to improve engineering workflows is highly desirable.
- Strong analytical, troubleshooting, communication, and collaboration skills with experience working in Agile environments.
Benefits
- Competitive compensation package.
- Fully remote work opportunity within India.
- Flexible working hours supporting work-life balance.
- Comprehensive health insurance coverage.
- Generous vacation and paid leave benefits.
- Opportunity to work with modern cloud-native technologies and AI-powered engineering tools.
- Collaborative international environment with experienced technology professionals.
- Continuous learning, professional development, and career growth opportunities.
- Exposure to innovative platform engineering, cloud infrastructure, and large-scale reliability initiatives.