Senior Platform/MLOps Engineer
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Platform/MLOps Engineer based in the United States.
This role sits at the core of building the infrastructure that powers next-generation AI-driven manufacturing systems. You will design and scale MLOps and platform capabilities that support computer vision, deep learning, and robotics workloads deployed in real-world factory environments. The work directly enables high-precision automation for defect detection, classification, and visual inspection at industrial scale. You will operate across platform engineering, machine learning infrastructure, and distributed systems, ensuring models move reliably from training to production. This is a hands-on, high-impact engineering role where you will collaborate closely with robotics, AI, and platform teams. The environment is fast-moving, deeply technical, and focused on building resilient systems that perform under demanding production constraints. You will help shape the foundation of a software-defined manufacturing platform.
Accountabilities
You will be responsible for building and evolving the infrastructure that powers scalable ML and platform systems in production environments, with a strong focus on reliability, performance, and developer productivity. You will design and maintain end-to-end MLOps pipelines that support training, deployment, and monitoring of AI models used in computer vision and robotics applications.
- Design, implement, and maintain scalable ML/AI infrastructure, including training pipelines, model deployment systems, and inference services
- Build and optimize GPU-enabled workloads running in Kubernetes environments for high-performance AI applications
- Develop robust CI/CD and GitOps workflows to support continuous delivery of machine learning and platform services
- Collaborate with cross-functional teams to define architecture, evaluate technical tradeoffs, and prototype new platform capabilities
- Improve system reliability through observability tooling, incident response practices, and performance optimization
- Work closely with applied AI and robotics teams to ensure infrastructure meets real-world production needs
- Produce high-quality documentation and contribute to engineering best practices across platform teams
Requirements
This role requires strong experience in platform engineering, DevOps, or SRE environments, combined with hands-on exposure to modern MLOps practices and production-grade ML systems. You should be comfortable working across infrastructure, application code, and distributed systems, with a strong focus on scalability and reliability.
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering
- Strong programming skills in Python, Go, JavaScript, C#, or similar languages
- Proven experience designing and operating MLOps pipelines in production environments
- Deep knowledge of Kubernetes (including CNCF ecosystem, managed and self-hosted environments)
- Experience running and optimizing GPU workloads in Kubernetes clusters
- Hands-on expertise with Infrastructure as Code tools such as Terraform and configuration management tools like Ansible
- Experience with CI/CD pipelines and GitOps-based delivery workflows
- Familiarity with observability tools such as Prometheus, Grafana, and OpenTelemetry
- Strong understanding of software engineering best practices across the SDLC
- Ability to collaborate across teams, translate requirements into system design, and communicate technical concepts clearly
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
Preferred experience includes working in highly secure or air-gapped environments, mentoring engineers, and contributing to architectural decisions in complex distributed systems.
Benefits
- Competitive base salary and performance-based compensation
- Comprehensive medical, dental, and vision insurance coverage
- Flexible work arrangements depending on role requirements
- Opportunity to work on cutting-edge AI, robotics, and industrial automation systems
- Career growth in a high-impact, innovation-driven engineering environment
- Exposure to large-scale GPU infrastructure and advanced ML systems
- Collaborative, cross-functional engineering culture focused on learning and ownership
AI Engineering pay context
Based on 643 disclosed AI Engineering salaries on RoleSuite, the role pays a median of $201K/year, with most offers between $162K and $246K (10th–90th percentile: $130K–$285K).