This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an MLOps Lead based in the Netherlands.
As an MLOps Lead, you will shape the strategy, architecture, and operational excellence of a cutting-edge machine learning infrastructure supporting large-scale AI systems. Leading a team of MLOps engineers, you will bridge the gap between research and production, ensuring that machine learning models are deployed, monitored, and scaled efficiently in high-performance environments. This role combines technical leadership with hands-on architectural decision-making, offering the opportunity to build robust infrastructure from the ground up while collaborating closely with engineering, research, and product teams. Working in a fully remote, international environment, you will help establish best practices and drive innovation across the entire machine learning lifecycle, enabling the delivery of reliable and scalable AI solutions.
Accountabilities
- Lead, mentor, and develop a high-performing team of MLOps engineers while fostering a culture of collaboration, technical excellence, and continuous improvement.
- Define and execute the MLOps roadmap, aligning infrastructure initiatives with research, engineering, and product objectives.
- Design, implement, and maintain scalable machine learning infrastructure, including automated training pipelines, CI/CD workflows, orchestration frameworks, and deployment processes.
- Drive architectural decisions for model serving platforms, ensuring low-latency, high-throughput inference using modern serving technologies.
- Build and optimize feature stores, data pipelines, and storage solutions that support large-scale model training and production inference.
- Collaborate closely with research teams to streamline the transition of machine learning models from experimentation to production environments.
- Establish monitoring, logging, alerting, and observability strategies to ensure model performance, system reliability, and early detection of drift or operational issues.
- Define engineering standards, operational best practices, and scalable infrastructure processes that support long-term platform growth.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Minimum of 7 years of experience in MLOps or machine learning infrastructure engineering, including at least 3 years in a technical leadership role.
- Strong software engineering expertise in Python, with working knowledge of Bash and/or Go.
- Proven experience building, scaling, and leading MLOps infrastructure from the ground up.
- Deep knowledge of machine learning platforms and frameworks such as MLflow, Weights & Biases (W&B), PyTorch, and TensorFlow.
- Extensive experience with model serving technologies including Triton Inference Server, TorchServe, TensorFlow Serving, or KServe.
- Hands-on expertise with Kubernetes, cloud platforms (AWS, GCP, or Azure), infrastructure-as-code and deployment tooling (Terraform, Helm, GitOps workflows), and production-grade data pipelines.
- Strong experience with monitoring and observability solutions such as Prometheus, Grafana, Datadog, and OpenTelemetry.
- Excellent communication skills with the ability to collaborate effectively across research and engineering teams.
- Experience with workflow orchestration tools, FastAPI, Databricks, Snowflake, LLM infrastructure, SRE practices, or AI startup environments is an advantage.
Benefits
- Competitive compensation package including salary and equity participation.
- Comprehensive healthcare coverage for employees and eligible dependents.
- Generous paid parental leave supporting biological, adoptive, and surrogate parenthood.
- Relocation assistance for employees joining one of the company's office locations, where applicable.
- Fully remote work environment with international collaboration opportunities.
- Opportunity to lead cutting-edge AI infrastructure initiatives with significant technical ownership.
- Inclusive, mission-driven culture that values innovation, collaboration, diversity of thought, and continuous learning.