Senior ML Operations (MLOps) Engineer

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior ML Operations (MLOps) Engineer based in the United States.

This role sits at the intersection of machine learning, distributed systems, and production-grade infrastructure, powering real-world ML models that directly influence user experience at scale. You will help design and operate the systems that deploy, monitor, and continuously improve ML models embedded in connected hardware products used by customers worldwide. Working in a fast-paced, product-driven environment, you will collaborate closely with research, firmware, backend, and data teams to bring advanced ML capabilities into production reliably and efficiently. The role requires strong ownership across the full ML lifecycle, from experimentation and training pipelines to deployment and observability in production. You will have a direct impact on systems running across a large fleet of connected devices, where performance, latency, and reliability are critical. This is a high-autonomy engineering role in which you will shape infrastructure decisions and influence how ML is delivered at scale. The environment is highly collaborative, technically ambitious, and centered on real-world impact through applied machine learning.

Accountabilities:

In this role, you will be responsible for building, scaling, and operating end-to-end MLOps infrastructure that enables reliable deployment and monitoring of machine learning models in production environments. You will ensure ML systems are efficient, observable, and continuously improving through strong engineering and automation practices.

Design, build, and maintain scalable ML infrastructure, including data pipelines, training workflows, and model deployment systems
Own end-to-end ML lifecycle operations, ensuring reliable delivery of models into production environments at scale
Develop and optimize CI/CD pipelines for machine learning workflows, enabling rapid and safe iteration
Implement monitoring, telemetry, and feedback loops for ML models running across large-scale device fleets
Collaborate with R&D, firmware, backend, and data teams to ensure seamless integration of ML inference systems
Build tooling, microservices, and frameworks to improve experimentation, data processing, and deployment efficiency
Optimize compute, storage, and infrastructure costs while maintaining high performance and reliability
Ensure strong observability and system health across all ML production services

Requirements

This role requires strong experience in ML infrastructure engineering, with a focus on building and scaling production-grade systems in cloud environments. You should be comfortable working across the full stack of ML systems, from data pipelines to deployment and monitoring.

5+ years of software engineering experience with a focus on ML infrastructure, distributed systems, or large-scale data processing
Strong proficiency in Python and ML frameworks such as PyTorch, TensorFlow, or equivalent
Hands-on experience with MLOps workflows, including model training pipelines, orchestration, and CI/CD deployment systems
Proven track record of deploying ML models into production at scale with monitoring and feedback systems
Strong experience with cloud platforms (AWS preferred), including services for compute, storage, and observability
Familiarity with distributed systems, streaming data, and large-scale data processing architectures
Strong understanding of system performance optimization, including latency, cost, and scalability trade-offs
Experience working in cross-functional teams in fast-paced, product-driven environments
Strong communication skills and ability to collaborate effectively in remote settings
Bonus: experience in IoT, wearable, or health-related ML systems

Benefits

Competitive compensation with meaningful equity participation
Equity refresh opportunities tied to performance and company growth
Comprehensive health, dental, and vision insurance for employees and dependents
Supplemental life insurance coverage
Flexible paid time off policy
Paid parental leave
Commuter benefits (where applicable)
Access to flagship company product as part of employee perks
Opportunity to work on high-impact ML systems deployed to real-world consumer hardware at scale

Software pay context

Based on 7,881 disclosed Software salaries on RoleSuite, Software roles pay a median of $158K/year, with most offers between $124K and $200K (10th–90th percentile: $102K–$235K).