Updated 2026-06-10 01:00 UTC · © 2025–2026 RoleSuite

Sr. AI Data Engineer

Jobgether · US

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr. AI Data Engineer based in the United States.

This role operates at the intersection of data engineering and machine learning systems, building the foundational pipelines that power next-generation generative AI models. You will design and scale complex, AI-augmented data workflows that process billions of images and integrate model-driven enrichment at every stage. The position requires deep expertise in distributed systems, data pipelines, and ML inference orchestration in high-scale environments. You will work on systems that combine traditional SQL-based transformations with real-time model invocations while ensuring quality, reliability, and performance.

A key focus of the role is producing high-quality training datasets for image generation models, which directly influence model performance across multiple dimensions. You will collaborate closely with ML researchers and engineers in a fast-paced, research-driven environment. This is a highly technical, high-impact role shaping generative AI infrastructure.

Accountabilities:

  • Design and maintain large-scale, AI-augmented data pipelines that combine SQL transformations with ML model invocations for data cleaning, labeling, and enrichment.
  • Own end-to-end remote inference orchestration, including batching, asynchronous execution, retry logic, failure handling, and performance optimization.
  • Build and manage scalable embedding pipelines, including vector generation, storage, indexing, and similarity search infrastructure.
  • Curate and govern large-scale training datasets for image generation models using model-driven signals such as classifiers, aesthetic scoring, and content filters.
  • Develop automated annotation systems using LLMs and vision models, including evaluation frameworks to measure annotation quality and model performance.
  • Contribute to shared engineering frameworks and reusable tooling for AI-driven data workflows and pipeline orchestration.
  • Ensure pipeline reliability, compliance, and data quality across billions of records in distributed production systems.
  • Collaborate with ML researchers and engineers to improve dataset quality, evaluation metrics, and generative model performance.

Requirements:

  • Bachelor’s degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
  • 5+ years of experience in data engineering, ML engineering, or hybrid roles involving data pipelines and model inference systems.
  • Strong expertise in SQL, data pipeline orchestration tools (e.g., Airflow, Dataswarm), and large-scale distributed systems.
  • Hands-on experience integrating ML models into production pipelines, including inference APIs, batching, and failure handling.
  • Experience with AI-assisted development tools (e.g., Copilot, Cursor, Codex) to accelerate engineering workflows.
  • Strong programming and debugging skills with a focus on scalable data systems and production reliability.
  • Experience with embeddings, vector databases, or similarity search systems (e.g., FAISS, Milvus) is highly desirable.
  • Familiarity with content understanding models such as classifiers, OCR, object detection, and NSFW filtering.
  • Exposure to LLM-based workflows for data annotation, cleaning, or evaluation is strongly preferred.
  • Knowledge of generative AI concepts such as diffusion models, CLIP scores, and image quality evaluation metrics is a plus.
  • Strong communication and collaboration skills in cross-functional technical environments.

Benefits:

  • Competitive annual compensation ranging from $105,000 to $110,000.
  • Opportunity to work on cutting-edge generative AI infrastructure at massive scale.
  • Exposure to advanced ML systems, embeddings, and large-scale model orchestration pipelines.
  • Collaborative environment working closely with research and engineering teams.
  • Onsite role (no remote option), with in-person collaboration in a high-performance engineering environment.
  • Eligibility for standard contractor or temporary employee benefits (medical, dental, vision, 401(k), holidays), depending on employment classification and hours worked.
  • Opportunity to contribute directly to the development of next-generation image generation models.

Other roles at Jobgether

  • Sr. Managed Care Network Manager · US
  • Senior Director, Public Affairs Oncology Advocacy · US
  • Manager, Pipeline Creation and Digital Sales Performance · US
  • Senior Campaign Manager · US
  • Oracle Applications Solutions Architect · US
  • Director, Payment Integrity · US
  • Executive Director, Pricing & Contracting · US
  • Executive Director, Site Payment Services · US
  • Chief Scientist · US
  • Customer Financial Specialist Associate · US

More Data & ML roles

  • Senior Manager, Vx Data Governance and Enablement · Pfizer · India - Mumbai
  • Finance Data Engineer · Apple · Cupertino
  • Data Engineer · Apple · Cupertino
  • Finance Data Engineer · Apple · Cupertino
  • Principal Applied Scientist · UiPath · Bellevue
  • Generative AI Applied Scientist, SIML - ISE · Apple · Cupertino
  • Systems Engineer - Evaluation Engineering · Apple · Cupertino
  • Senior Analytics Engineer, GTM · Pure Storage · Santa Clara, California
  • Senior Product Data Scientist · MaintainX · Montreal, Toronto, Vancouver, SF (Remote)
  • Applied Research Scientist, Proactive Intelligence — Personal and Agentic Search · Apple · Cupertino