Sr. Research Manager, Evaluation Science
AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function. It is a foundational science. As these systems grow in complexity , the quality of our products is increasingly constrained by the quality of our evaluation methods. Our team is building the scientific foundation and self-service tools for how AI evaluation is done at scale, spanning LLMs, agentic systems, and human-AI interaction. We don’t just publish methods; we productionize them. We are looking for a Sr. Research Manager to lead an ML research team that advances the state-of-the-art in evaluation methods that can be shipped as production tools for Apple developers and published in top venues.