Staff Software Engineer, Gemini App, Horizontal Quality, DeepMind
Our mission is to elevate the Gemini experience by perfectly aligning foundational model behaviors with high-quality data. We drive conversational excellence through thoughtful persona shaping, robust safety enforcement, and clear information architecture. By combining these efforts with rich online and offline signals, we deliver a product that is highly performant and effortlessly intuitive.
Artificial intelligence will be one of humanity’s most transformative inventions. At Google DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority.
Minimum qualifications:
- Bachelor’s degree or equivalent practical experience.
- 8 years of experience in software development.
- 5 years of experience leading technical strategy and architecting large-scale ML infrastructure (e.g., designing serving layers, model evaluation frameworks, or data processing pipelines).
- 5 years of experience testing and launching software products.
- 3 years of experience with Generative AI, Large Language Models (LLMs), Machine Learning, and related frameworks.
Preferred qualifications:
- Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
- 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
- Experience in building and scaling evaluation pipelines (e.g., RLHF, auto-evals, or side-by-side human evaluations) to measure helpfulness and accuracy.
- Proficiency in advanced prompting techniques and an understanding of how model fine-tuning, RL, or RAG impacts final response quality.
- Ability to use SQL, Python, or internal data tools to analyze user behavioral data and identify pain points where the model is failing.