Senior Site Reliability Engineer (SRE)
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Site Reliability Engineer (SRE) based in Brazil.
This role is centered on ensuring the reliability, scalability, and performance of large-scale distributed systems that support millions of users across multiple products and markets. You will work at the intersection of software engineering and infrastructure, designing and operating platforms that enable fast, safe, and efficient product delivery. The position requires a strong problem-solving mindset, combining pragmatism and technical excellence to balance speed, stability, and cost efficiency. You will collaborate closely with software engineering teams to improve system resilience, automate operational processes, and elevate the overall developer experience. A key part of your mission will be to promote a strong DevOps and “you build it, you run it” culture across teams. This is a high-impact role in a fast-moving, product-driven environment where engineering decisions directly affect user experience at scale.
Accountabilities:
- Design, build, and operate scalable and reliable shared infrastructure platforms supporting product growth and international expansion.
- Analyze complex technical problems and deliver pragmatic, end-to-end engineering solutions.
- Collaborate with multiple engineering teams to ensure systems are resilient, efficient, and production-ready.
- Improve developer experience by treating the platform as a product, focusing on automation, simplification, and faster delivery cycles.
- Enable and coach software engineers in DevOps and SRE best practices to support autonomous “you build it, you run it” teams.
- Ensure platform security, compliance, reliability, and cost optimization across systems and services.
- Monitor system performance, proactively identify risks, and implement reliability improvements.
- Stay updated on emerging technologies and assess their applicability to platform evolution.
- Participate in code reviews, providing constructive feedback and supporting engineering quality standards.
- Mentor engineers and contribute to technical growth across teams.
Requirements:
- Strong experience as a Site Reliability Engineer or in similar infrastructure/platform engineering roles.
- Solid understanding of distributed systems, scalability, reliability, and performance engineering.
- Experience designing and operating cloud infrastructure (AWS preferred).
- Proficiency with infrastructure as code and container orchestration tools such as Terraform and Kubernetes.
- Experience with observability tools and practices (monitoring, logging, alerting, tracing).
- Strong software engineering skills in at least one programming language (e.g., Python, Go, Java, Node.js, or similar).
- Experience working in high-growth, fast-paced, product-oriented environments.
- Ability to design simple, scalable, and maintainable architectures.
- Strong communication skills and ability to collaborate across multidisciplinary teams.
- Nice to have: experience with CI/CD, cost optimization, platform security, or developer experience initiatives.
Benefits:
- Competitive salary and benefits package.
- Health and dental insurance.
- Flexible and remote-friendly work environment.
- Meal and mobility allowances.
- Mental health and well-being support programs.
- Learning and professional development opportunities.
- International and collaborative engineering environment.
- Exposure to large-scale, high-impact distributed systems.
DevOps pay context
Based on 1,094 disclosed DevOps salaries on RoleSuite, DevOps roles pay a median of $142K/year, with most offers between $115K and $176K (10th–90th percentile: $99K–$210K).