This position is listed on behalf of a partner company, who manages all applications and next steps. Our partner is looking for a ML Infrastructure Engineer based in United States.
This role focuses on building and operating the core platform that powers large-scale machine learning training and inference workloads.
You will work on GPU cluster infrastructure spanning cloud, on-prem, and hybrid environments.
The position plays a critical role in enabling efficient, reliable, and scalable AI development across multiple teams.
You will design systems for scheduling, distributed training, storage throughput, and high-performance networking.
The environment is highly technical, combining systems engineering, ML frameworks, and platform reliability at scale.
You will collaborate closely with ML researchers and engineers to optimize performance, cost, and developer experience.
This is a hands-on engineering role where impact is measured by infrastructure efficiency and production readiness of AI workloads.
Based on 1,222 disclosed DevOps salaries on RoleSuite, the role pays a median of $140K/year, with most offers between $115K and $173K (10th–90th percentile: $100K–$208K).
This posting lists $100K–$150K, below the $140K market median.
See the full DevOps salary breakdown →