Role Description
We're looking for a Distributed Systems Software Engineer to join our Big Data infrastructure team. You'll contribute to the design and development of reliable, efficient data infrastructure that supports scalable data processing and analytics for internal and external customers.
Your Impact
- Build Data Processing & Analytics Services — Develop scalable services using our big data stack (Spark, Trino, Airflow, Kafka) to support real-time and batch data workflows.
- Contribute to Distributed Systems — Participate in the design, development, and operation of resilient distributed systems managing thousands of compute nodes across multiple data centers.
- Troubleshoot and Innovate — Resolve technical challenges and drive innovations that enhance system resilience, availability, and performance.
- Service Ownership & Maintenance — Contribute to the full service lifecycle, balancing live-site reliability, feature development, and technical debt reduction.
- On-Call Support — Join the team's on-call rotation to keep critical services operational and highly available.
Required Qualifications
- Cloud Environments — Proficiency in AWS, GCP, or Azure; containerization (Docker, Kubernetes); and infrastructure-as-code (Terraform, Ansible).
- Big Data Technologies — Hands-on experience with Hadoop, Spark, Trino (or similar SQL query engines), Airflow, Kafka, and related ecosystems.
- Programming — Strong skills in Python, Java, Scala, or other languages relevant to distributed systems.
- Distributed Systems Knowledge — Solid understanding of distributed computing principles, data partitioning, fault tolerance, and performance tuning.
- Analytical & Problem-Solving Skills — Ability to troubleshoot system issues and optimize for efficiency and scale.