Lead AI Engineer

Salesforce · Mexico City, Ciudad de Mexico, Mexico

Lead AI Engineer (Mexico City), Data Solutions Org

Hybrid

We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce.

This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel (a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance).

This role sits on the applied side but requires strong data and systems engineering depth: you will build not just models and agents, but also the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to improve continuously in production.

You will build production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.

What You’ll Do

1) Build the Agent Flywheel

  • Design and implement feedback loops that enable agents and ML models to self-improve over time

  • Develop systems for:

    • Outcome tracking (e.g., engagement, conversions, resolution quality)

    • Agent evaluation (LLM-based, deterministic, and human-in-the-loop signals)

    • Iterative optimization (prompting, policies, model selection, fine-tuning)

  • Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets

  • Close the loop from production signals → evaluation → model/prompt improvements

2) Develop Production ML & Agent Systems

  • Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)

  • Design and implement AI agents that combine:

    • LLM reasoning

    • Tool/API usage

    • ML-based decisioning layers

  • Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows

  • Integrate ML and agent capabilities into decisioning systems that drive business outcomes

3) Data & Pipeline Engineering

  • Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows

  • Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets

  • Pair model pipelines with data pipelines to enable continuous retraining and evaluation loops

  • Ensure data quality, consistency, and availability across systems

  • Work with large-scale structured and unstructured data to support both ML and LLM systems

4) Evaluation, Experimentation & Optimization

  • Build offline and online evaluation frameworks for agent and ML model performance

  • Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior

  • Design and run A/B experiments to measure impact on business outcomes

  • Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)

  • Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)

5) Architecture & Applied Systems Design

  • Develop hybrid systems that blend:

    • Deterministic logic

    • Model-based scoring

    • LLM-driven generation

  • Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top

  • Design systems that scale with increasing agent complexity and data volume

6) Platform & API Development

  • Build scalable Python services and APIs powering agent workflows

  • Contribute to shared infrastructure for model serving, evaluation, and experimentation

  • Ensure reliability, observability, and performance of deployed systems

Qualifications

Core Requirements

  • 6+ years of experience in AI/ML engineering, applied data science, or closely related roles

  • Strong hands-on experience in Python for production systems

  • Proven track record building and deploying production-grade ML models

  • Strong experience with data pipeline development (ETL/ELT, batch or streaming)

  • Experience designing and building AI agents or agent-like systems

  • Strong experience with API development and backend services

  • Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)

Data & Systems Expertise

  • Experience building reliable data pipelines that support ML or AI systems in production

  • Familiarity with:

    • Data processing frameworks (e.g., Spark or equivalent)

    • Data orchestration tools (e.g., Airflow, Dagster)

    • Data warehousing solutions (e.g., Snowflake, BigQuery)

  • Understanding of data quality, lineage, and reproducibility in ML systems

Agent & LLM Experience

  • Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)

  • Familiarity with agent frameworks and tool-using agents

  • Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred

Modeling & Systems Thinking

  • Strong understanding of:

    • Supervised learning (classification, regression, ranking)

    • Evaluation methodologies (offline and online)

    • Experimentation (A/B testing, causal inference basics)

  • Ability to design systems that combine:

    • ML models

    • LLMs

    • Business logic

Engineering & Production Skills

  • Experience deploying models/services in production environments

  • Familiarity with:

    • Model serving architectures

    • Data pipelines

    • Monitoring and observability

  • Ability to write clean, scalable, maintainable code

Preferred Qualifications

  • Experience building model-driven agent improvement systems (e.g., scoring, gating, auto-optimization)

  • Experience with reinforcement learning, bandits, or iterative optimization systems

  • Exposure to agent evaluation tools (e.g., LangSmith, Braintrust, or similar)

  • Experience with large-scale experimentation platforms

  • Familiarity with enterprise SaaS or CRM domains

What Success Looks Like

  • Agents and production-grade ML models measurably improve over time via automated feedback loops

  • Well-structured data and evaluation pipelines that continuously feed the agent flywheel

  • Clear lift in key business metrics (e.g., engagement, conversion, revenue impact)

  • Robust evaluation systems that enable rapid iteration and safe deployment
