Updated 2026-06-10 03:00 UTC · © 2025–2026 RoleSuite

Research Scientist / Engineer - Video Generation Modeling

Rhoda AI · Palo Alto

At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We’ve raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pre-training for our video action model. Our approach formulates robot control as video prediction — we pre-train causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels — from senior to staff — and welcome both research-track and engineering-track candidates.

What You'll Do

  • Design and train large-scale causal video generation models on web-scale video data

  • Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale

  • Research scaling laws and data efficiency for web-scale video pretraining

  • Investigate what properties of web video transfer most effectively to robotic control and action prediction

  • Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance

  • Run rigorous ablations and benchmarking to understand what drives model quality at scale

  • Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems

  • Publish and present work at top-tier ML and robotics venues (especially valued for the research track)

What We're Looking For

  • Strong background in large-scale generative modeling — either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)

  • Hands-on experience training large generative models from scratch at scale

  • Deep understanding of autoregressive modeling, causal architectures, and scaling behavior

  • Fluency with modern ML frameworks (PyTorch required; JAX a plus)

  • Ability to design experiments, interpret results, and iterate quickly

  • Strong research taste: ability to identify high-leverage questions and cut through noise

  • Comfort operating in a fast-moving, ambiguous startup environment

  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS candidates execute complex projects with strong fundamentals and growing scope

Nice to Have (But Not Required)

  • PhD in ML, CS, Robotics, or a related field — or equivalent research/industry experience

  • Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for the research track)

  • Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)

  • Experience with large-scale autoregressive language model pretraining and scaling

  • Familiarity with web-scale video datasets and video data curation pipelines

  • Prior work connecting video generation to control, action prediction, or robotic learning

  • Familiarity with distributed training and multi-node infrastructure

Why This Role

  • Work on a fundamentally different approach to robot learning — web-scale video pretraining rather than robot-data-only VLA models

  • Your models give our robots the ability to understand and predict the visual world from internet-scale supervision

  • Direct collaboration with data, post-training, and deployment teams with no silos

  • High ownership and fast iteration in a small, elite team


Other roles at Rhoda AI

  • HR Manager · Mountain View
  • Research Scientist / Engineer - Pre-training Data & Evaluation · Palo Alto
  • Staff Electrical Engineer (Sensors & Imaging Systems) · Mountain View
  • Robot Software Engineer · Mountain View
  • Senior Actuator Modeling Engineer · Palo Alto
  • Senior Electrical Engineer - Power Management · Palo Alto
  • Robotics Software Test Engineer · Palo Alto
  • Robot System QA · Mountain View
  • Research Scientist / Engineer - Efficient Modeling · Mountain View
  • Applied Research Scientist / Engineer - Deployment · Palo Alto