Lead AI Engineer

Salesforce · Mexico City, Ciudad de Mexico, Mexico

Lead AI Engineer (Mexico City), Data Solutions Org

Hybrid

We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce.

This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel (a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance).

This role sits on the applied side but requires strong data and systems engineering depth — you will build not just models and agents, but the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to continuously improve in production.

You will build production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.

What You’ll Do

1) Build the Agent Flywheel

Design and implement feedback loops that enable agents and ML models to self-improve over time
Develop systems for:
- Outcome tracking (e.g., engagement, conversions, resolution quality)
- Agent evaluation (LLM + deterministic + human-in-the-loop signals)
- Iterative optimization (prompting, policies, model selection, fine-tuning)
Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets
Close the loop from production signals → evaluation → model/prompt improvements

2) Develop Production ML & Agent Systems

Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)
Design and implement AI agents that combine:
- LLM reasoning
- Tool/API usage
- ML-based decisioning layers
Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows
Integrate ML and agent capabilities into decisioning systems that drive business outcomes

3) Data & Pipeline Engineering

Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows
Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets
Integrate model pipelines with data pipelines to enable continuous retraining and evaluation loops
Ensure data quality, consistency, and availability across systems
Work with large-scale structured and unstructured data to support both ML and LLM systems

4) Evaluation, Experimentation & Optimization

Build offline and online evaluation frameworks for agent and ML model performance
Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior
Design and run A/B experiments to measure impact on business outcomes
Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)
Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)

5) Architecture & Applied Systems Design

Develop hybrid systems that blend:
- Deterministic logic
- Model-based scoring
- LLM-driven generation
Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top
Design systems that scale with increasing agent complexity and data volume

6) Platform & API Development

Build scalable Python services and APIs powering agent workflows
Contribute to shared infrastructure for model serving, evaluation, and experimentation
Ensure reliability, observability, and performance of deployed systems

Qualifications

Core Requirements

6+ years of experience in AI/ML engineering, applied data science, or closely related roles
Strong hands-on experience in Python for production systems
Proven track record building and deploying production-grade ML models
Strong experience with data pipeline development (ETL/ELT, batch or streaming)
Experience designing and building AI agents or agent-like systems
Strong experience with API development and backend services
Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)

Data & Systems Expertise

Experience building reliable data pipelines that support ML or AI systems in production
Familiarity with:
- Data processing frameworks (e.g., Spark or equivalent)
- Data orchestration tools (e.g., Airflow, Dagster)
- Data warehousing solutions (e.g., Snowflake, BigQuery)
Understanding of data quality, lineage, and reproducibility in ML systems

Agent & LLM Experience

Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)
Familiarity with agent frameworks and tool-using agents
Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred

Modeling & Systems Thinking

Strong understanding of:
- Supervised learning (classification, regression, ranking)
- Evaluation methodologies (offline + online)
- Experimentation (A/B testing, causal inference basics)
Ability to design systems that combine:
- ML models
- LLMs
- Business logic

Engineering & Production Skills

Experience deploying models/services in production environments
Familiarity with:
- Model serving architectures
- Data pipelines
- Monitoring and observability
Ability to write clean, scalable, maintainable code

Preferred Qualifications

Experience building model-driven agent improvement systems (e.g., scoring, gating, auto-optimization)
Experience with reinforcement learning, bandits, or iterative optimization systems
Exposure to agent evaluation tools (e.g., LangSmith, Braintrust, or similar)
Experience with large-scale experimentation platforms
Familiarity with enterprise SaaS or CRM domains

What Success Looks Like

Agents and production-grade ML models measurably improve over time via automated feedback loops
Well-structured data and evaluation pipelines that continuously feed the agent flywheel
Clear lift in key business metrics (e.g., engagement, conversion, revenue impact)
Robust evaluation systems that enable rapid iteration and safe deployment
