Senior Site Reliability Engineer (SRE)

Jobgether · Mexico

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Site Reliability Engineer (SRE) based in Mexico.

This role is focused on ensuring the reliability, scalability, and operational excellence of mission-critical production systems in a high-performance, distributed environment.
You will take ownership of system stability by defining and evolving SLOs, SLIs, and error budgets that directly guide engineering and operational decisions.
The position plays a central role in shaping observability strategies, improving incident response maturity, and strengthening on-call practices across engineering teams.
You will lead the response to production incidents, coordinating cross-functional teams to restore services quickly and effectively.
A key part of the role involves building automation and reliability tooling to reduce operational overhead and improve system resilience.
You will work closely with software engineering teams to ensure production readiness, scalability, and disaster recovery standards are consistently met.
This is a high-impact SRE role where your work directly influences uptime, performance, and user experience at scale.

Accountabilities:

Define, implement, and continuously improve SLIs, SLOs, and error budgets to measure and enhance system reliability across production environments.
Own and evolve observability practices, including monitoring, logging, tracing, and alerting strategies to ensure full system visibility.
Lead incident response efforts as Incident Commander during production outages, coordinating resolution across engineering teams.
Design, maintain, and optimize on-call systems, including escalation policies, runbooks, alert tuning, and operational workflows.
Drive blameless postmortems and ensure follow-through on corrective actions to prevent recurrence of production issues.
Collaborate with engineering teams on production readiness, capacity planning, scalability, and disaster recovery initiatives.
Automate operational tasks and reliability processes using software engineering practices to improve system efficiency and resilience.

Requirements:

5+ years of experience in Site Reliability Engineering, Production Engineering, or similar roles supporting high-availability systems.
Strong hands-on experience defining and managing SLOs, SLIs, and error budgets in production environments.
Proven experience leading incident response and acting as Incident Commander during critical production outages.
Deep expertise in observability tools and practices, including monitoring, logging, alerting, and distributed tracing.
Strong software engineering skills in Python, Go, or TypeScript, with a focus on automation and reliability engineering.
Experience working with cloud environments (AWS or similar) and supporting mission-critical systems at scale.
Demonstrated ability to improve on-call processes, reduce alert noise, and build effective operational frameworks.
Experience conducting blameless postmortems and driving long-term reliability improvements.
Nice to have: experience with Kubernetes, Datadog, Heroku, PostgreSQL or SQL Server, regulated industries, or SRE practice maturity initiatives.

Benefits:

Fully remote work model (home office)
Competitive compensation aligned with experience
International projects with global teams and clients
Structured career growth and development opportunities
English learning program (technical and conversational)
Fitness and wellness support program (TotalPass)
Gamification initiatives, games, and internal competitions
Exposure to modern SRE practices and high-scale systems
Collaborative and growth-oriented engineering culture

DevOps pay context

Based on 1,258 disclosed DevOps salaries on RoleSuite, DevOps roles pay a median of $140K/year, with most offers between $115K and $173K (10th–90th percentile: $99K–$210K).
