Senior Machine Learning Systems Engineer, Ads ML Experience Platform

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Machine Learning Systems Engineer, Ads ML Experience Platform, based in the United States.

This role sits at the core of a large-scale machine learning ecosystem powering Ads ML development and experimentation. You will design and build next-generation infrastructure that accelerates the full ML lifecycle, from offline experimentation to production training, evaluation, and deployment. The environment is highly technical, fast-paced, and deeply collaborative: you will work closely with ML engineers, researchers, and platform teams. You will contribute to systems that enable reproducible research, scalable model iteration, and automated ML workflows. A key focus is advancing developer experience through robust tooling and intelligent automation. The role also explores emerging agentic AI systems that support autonomous and human-in-the-loop workflows. Your work will directly impact the speed, reliability, and scalability of ML innovation across a global platform such as Reddit.

Accountabilities:

In this role, you will lead the design and development of scalable ML infrastructure that powers experimentation, training, and deployment workflows across Ads ML systems.

  • Build and evolve large-scale offline ML experimentation platforms enabling reproducibility, evaluation, and model promotion workflows.
  • Develop distributed training orchestration systems supporting hyperparameter tuning, retraining, and evaluation pipelines.
  • Design infrastructure for experiment tracking, metadata management, lineage, artifact versioning, and model registries.
  • Create automated workflows for model promotion, rollback, compliance validation, and continuous monitoring.
  • Collaborate with ML engineers and researchers to improve experimentation velocity and platform efficiency.
  • Contribute to the design of agentic AI systems enabling multi-agent orchestration and intelligent workflow execution.
  • Ensure systems are reliable, scalable, and optimized for high-performance ML development at production scale.

Requirements:

This role requires strong expertise in large-scale distributed systems and hands-on experience building production-grade ML platforms and infrastructure.

  • 5+ years in platform engineering, distributed systems, or large-scale infrastructure development.
  • 2+ years building production ML infrastructure, developer platforms, or AI tooling.
  • Strong experience with ML workflow orchestration and distributed data processing frameworks (e.g., Spark, Ray, Flink).
  • Hands-on experience with orchestration tools such as Airflow, Kubeflow, Argo, or equivalent systems.
  • Proven ability to build and maintain ML experimentation platforms, model registries, or training pipelines.
  • Strong programming skills in Python and familiarity with scalable software engineering practices.
  • Experience with cloud-based ML systems and production deployment environments.
  • Exposure to agentic AI systems, multi-agent workflows, or autonomous orchestration frameworks is a strong plus.
  • Excellent communication skills with the ability to translate technical complexity into clear insights for diverse stakeholders.

Benefits:

  • Competitive base salary with additional equity (RSUs) and potential bonus eligibility
  • Comprehensive medical, dental, and vision insurance coverage
  • 401(k) retirement plan with employer matching
  • Generous paid time off, including vacation, holidays, and parental leave
  • Equity participation in a high-growth, impact-driven engineering environment
  • Flexible work arrangements with remote eligibility across supported regions
  • Professional development opportunities in advanced ML systems and AI infrastructure
  • Inclusive, collaborative engineering culture focused on innovation and impact

Data & ML pay context

Based on 1,583 disclosed Data & ML salaries on RoleSuite, roles in this category pay a median of $162K/year, with most offers between $127K and $204K (10th–90th percentile: $105K–$246K).
