Updated 2026-07-03 17:00 UTC · © 2025–2026 RoleSuite

MLOps Engineer

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an MLOps Engineer based in the United States.

This role is focused on building and operating high-performance machine learning inference platforms that support large-scale, production AI systems.
You will be responsible for ensuring that model serving infrastructure is reliable, scalable, and optimized for latency, throughput, and cost efficiency.
The position sits at the intersection of machine learning and distributed systems engineering, with a strong emphasis on production-grade performance.
You will design systems that handle complex workloads such as LLMs, vision models, and recommendation engines across cloud-native environments.
The environment is highly technical, fast-moving, and deeply focused on engineering excellence and observability.
You will work closely with ML researchers, product teams, and infrastructure engineers to bring cutting-edge models into production.
This is a hands-on role where your work directly affects the performance, scalability, and user experience of AI products.

Accountabilities:

  • Design, build, and operate scalable model serving platforms for LLMs, vision models, and recommendation systems.
  • Optimize inference performance using techniques such as batching, caching, speculative decoding, and request routing strategies.
  • Implement multi-tenant serving architectures with rate limiting, QoS policies, and traffic management controls.
  • Develop autoscaling and capacity planning systems to balance latency, cost, and throughput across workloads.
  • Improve GPU utilization and memory efficiency for high-performance inference workloads.
  • Integrate model serving systems with APIs, identity services, and observability platforms.
  • Build and enhance observability frameworks covering latency, GPU metrics, error tracking, and system health.
  • Support deployment pipelines including canary releases, shadow testing, and rollback mechanisms.
  • Participate in incident response for production AI services and drive long-term reliability improvements.
  • Collaborate with ML and product teams to support model releases and production rollouts.
  • Implement security and abuse prevention controls at the serving layer.
  • Document system behavior, operational procedures, and performance tuning best practices.

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in distributed systems, infrastructure engineering, or ML platform development.
  • Strong proficiency in Python and a systems programming language such as Go, Rust, or C++.
  • Experience building and operating high-throughput, low-latency production systems.
  • Hands-on experience with LLM inference frameworks such as vLLM, TensorRT-LLM, or similar.
  • Strong understanding of GPU architecture, memory management, and performance optimization.
  • Experience with Kubernetes, cloud platforms, and autoscaling infrastructure.
  • Strong knowledge of observability tools including metrics, logging, and distributed tracing systems.
  • Solid understanding of performance engineering, capacity planning, and distributed system design.
  • Strong communication and incident response skills in production environments.
  • Experience with AI model serving at scale, multi-region systems, or FinOps optimization is a plus.

Benefits:

  • Competitive salary range of $100,000 – $150,000 annually.
  • 100% remote position within the United States.
  • Full-time W2 employment with long-term stability.
  • Opportunity to work on cutting-edge AI inference and LLM serving systems.
  • Exposure to advanced GPU optimization and large-scale distributed AI infrastructure.
  • Career growth through ownership of production AI platforms and architecture decisions.
  • Inclusive and equal opportunity workplace culture.

AI Engineering pay context

Based on 579 disclosed AI Engineering salaries on RoleSuite, the role pays a median of $206K/year, with most offers between $167K and $245K (10th–90th percentile: $131K–$277K).

This posting lists $100K–$150K. The top of that range sits below both the $206K market median and the $167K low end of most offers.


Other roles at Jobgether

  • Senior ML Operations (MLOps) Engineer · US
  • Security Engineer, Product Security · US
  • Sr. Manager, Growth Marketing · US
  • Director of Data Engineering, Healthcare · US
  • VP, Payor Partnerships · US
  • Telecom Observability Engineer · US
  • Assistant Controller, People Leader · US
  • Director, Policy and Business Development · US
  • Technical Representative NSW · Australia
  • Sr. People Operations Manager · US

More AI Engineering roles

  • Machine Learning Engineer - IV (Fraud) · Jumio · Bangalore
  • Machine Learning Engineer · Hire Hangar · Colombia - Bogotá
  • AI Engineer · Hire Hangar · Colombia - Bogotá
  • Full-Stack AI Engineer · Hire Hangar · Ukraine - Kyiv
  • Senior AI Platform Engineer · Hire Hangar · Poland - Kraków
  • [Data - FR] Senior Machine Learning Engineer - Orchestration · Doctolib · Paris, Paris, France
  • Staff Machine Learning Engineer · PPRO · Sao Paulo
  • Senior Forward Deployed Engineer, GenAI, Google Cloud (Japanese, English) · Google · Tokyo, Japan
  • AI Engineer · Stripe · Chicago
  • [Job-30309] Ai Engineer, UK · Ciandt · London