This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an SRE / Network Engineer (MAAS) based in Brazil.
This role is centered on designing, operating, and automating large-scale bare-metal and cloud-adjacent infrastructure in a highly distributed environment. You will work at the core of a decentralized compute platform focused on performance, efficiency, and sovereignty over cloud resources. The position combines deep systems engineering, networking expertise, and infrastructure automation, with a strong emphasis on Metal-as-a-Service (MAAS) environments. You will be responsible for ensuring reliability across hundreds of nodes spanning multiple sites while building the tooling needed to scale operations efficiently. The environment is fast-paced and highly autonomous, requiring strong ownership and problem-solving skills. Your work will directly influence the stability, scalability, and evolution of next-generation cloud infrastructure.
Accountabilities:
- Operate and maintain large-scale Linux-based infrastructure (Debian/Ubuntu), ensuring reliability and performance across distributed systems.
- Manage bare-metal systems at the hardware level, including BIOS configuration, IPMI, RAID setup, and diagnostic troubleshooting.
- Design, implement, and maintain scalable network architectures using VLANs, L2/L3 routing, VPNs, and enterprise-grade networking equipment.
- Automate infrastructure provisioning and operations using Ansible, Bash, Python, and Git-based workflows to support Infrastructure-as-Code practices.
- Build and maintain MAAS-based provisioning workflows, including PXE booting, preseed/cloud-init automation, and OS deployment pipelines.
- Implement and manage observability stacks using tools such as Prometheus, Grafana, ELK/Graylog, or Loki for metrics, logs, and system insights.
- Develop internal tooling and APIs for compute and GPU resource tracking, infrastructure monitoring, and system integrations.
- Deploy and support virtualization and containerization platforms such as OpenStack, Proxmox VE, VMware ESXi, and container orchestration systems.
Requirements:
- Strong expertise in Linux system administration, particularly Debian and Ubuntu environments.
- Hands-on experience with MAAS, Ironic, or other bare-metal provisioning and automation systems.
- Solid understanding of networking fundamentals, including VLANs, routing, VPNs, and multi-site infrastructure design.
- Proven experience with Infrastructure-as-Code tools such as Ansible and scripting languages like Bash and Python.
- Familiarity with observability and monitoring stacks including Prometheus, Grafana, ELK/Graylog, or Loki.
- Experience with automated deployment workflows (PXE, preseed, cloud-init) and infrastructure automation pipelines.
- Background in virtualization and orchestration technologies such as OpenStack, Proxmox, VMware, or Kubernetes-based environments.
- Ability to work autonomously in fast-paced, high-growth or startup-like environments with strong problem-solving skills.
Benefits:
- Competitive compensation aligned with experience and market benchmarks
- Fully remote role with flexibility across LATAM and globally distributed teams
- Opportunity to work on cutting-edge decentralized cloud infrastructure
- High-impact engineering role with ownership over production-scale systems
- Exposure to advanced bare-metal, networking, and cloud orchestration technologies
- Fast-paced, autonomy-driven environment with strong technical ownership
- Continuous learning opportunities in large-scale infrastructure and distributed systems
- Inclusive and collaborative engineering culture focused on innovation and reliability