AI Resident - Learning From Videos (LFV)
Company: Toyota Research Institute
Location: Los Altos
Posted on: April 1, 2026
Job Description:
At Toyota Research Institute (TRI), we’re on a mission to
improve the quality of human life. We’re developing new tools and
capabilities to amplify the human experience. To lead this
transformative shift in mobility, we’ve built a world-class team
advancing the state of the art in AI, robotics, driving, and
material sciences.

The Team

The Learning From Videos (LFV) team in
the Robotics division focuses on the development of foundation
models capable of leveraging large-scale multi-modal (RGB, depth,
flow, semantics, bounding boxes, tactile, audio, etc.) data from
multiple domains (driving, robotics, indoors, outdoors, etc.) to
improve downstream task performance. Our approach emphasizes
training scalability: by learning from multiple modalities, models
can develop useful data-driven priors about 3D geometry, physics,
and dynamics for world understanding. Our research interests
include, but are not limited to:
- Video Generation
- World Models
- 4D Reconstruction
- Multi-Modal Models
- Multi-View Geometry
- Data Augmentation
- Video-Language-Action Models

We focus primarily on
embodied applications and aim to tackle some of the hardest
scientific challenges in spatio-temporal reasoning, enabling
autonomous agents to operate in real-world, unstructured
environments.

The AI Resident

This year-long AI Residency is a
research-focused position designed for early-career researchers and
engineers who are excited to work on ambitious problems in embodied
AI. The resident will be deeply integrated into the LFV team,
contributing to both ongoing and new research efforts in areas
including:
- 4D World Models
- Physical and Embodied Intelligence
- Multi-Modal Learning

As an AI Resident, you will collaborate closely with researchers and
engineers at TRI on high-risk research, pushing forward our
understanding of spatio-temporal reasoning and
zero-shot generalization. This is a research-focused position,
targeting the development of methods and techniques that can solve
real-world problems. We welcome you to join a positive, friendly,
and enthusiastic team of researchers, where you will contribute to
helping people gain and maintain independence, access, and
mobility. We work closely with other Toyota affiliates, and
actively collaborate towards research publications and the
productization of our developed technologies.

Responsibilities
- Develop, integrate, and deploy algorithms for Multi-Modal and 4D
  reasoning targeting physical applications.
- Handle the ingestion of large-scale datasets for training, including
  streaming, online, and continual learning.
- Contribute innovative solutions at the intersection of machine
  learning, computer vision, and robotics to improve real-world task
  performance.
- Work closely with robotics and machine learning researchers and
  engineers to understand theoretical and practical needs.
- Follow best practices for producing maintainable code, both for
  internal use and for open-sourcing to the scientific community.
- Contribute to research publications and technical reports.

Qualifications
- Bachelor’s or Master’s degree in Computer Science, Electrical
  Engineering, Robotics, or a related technical field. Exceptional
  candidates with equivalent research experience (e.g., strong
  publication record, open-source contributions, or industry research
  experience) are encouraged to apply.
- Strong background in computer vision and its applications to
  robotics and embodied systems.
- Demonstrated research experience through publications, technical
  projects, or open-source contributions.
- Strong communication skills and a collaborative mindset, with the
  ability to learn quickly and contribute to team research efforts.
- Passionate about assisting and amplifying older adults and those in
  need through dexterous manipulation, human-robot collaboration, and
  physical assistance innovation.

Bonus Qualifications
- Spatio-temporal (4D) computer vision, including multi-view geometry,
  3D/4D reconstruction, video generation, self-supervised learning,
  occlusion reasoning, etc.
- Large-scale training of multi-modal deep learning methods, in terms
  of both dataset size and model complexity, including context length
  extension, efficient attention, distributed computing, etc.
- Application of machine learning and computer vision to embodied
  applications.

The pay range for this position at commencement of
employment is expected to be between $45 and $60/hour for
California-based roles. Base pay offered will depend on multiple
individualized factors, including, but not limited to, a
candidate's experience, skills, job-related knowledge, and market
location. TRI offers a generous benefits package including medical,
dental, and vision insurance, and paid time off benefits (including
holiday pay and sick time). Additional details regarding these
benefit plans will be provided if an employee receives an offer of
employment.

Please refer to our Candidate Privacy Notice to learn about
the categories of personal information that we
collect from individuals who inquire about and/or apply to work for
Toyota Research Institute, Inc. or its subsidiaries, including
Toyota A.I. Ventures GP, L.P., and the purposes for which we use
such personal information.

TRI is fueled by a diverse and inclusive
community of people with unique backgrounds, education and life
experiences. We are dedicated to fostering an innovative and
collaborative environment by living the values that are an
essential part of our culture. We believe diversity makes us
stronger and are proud to provide Equal Employment Opportunity for
all, without regard to an applicant’s race, color, creed, gender,
gender identity or expression, sexual orientation, national origin,
age, physical or mental disability, medical condition, religion,
marital status, genetic information, veteran status, or any other
status protected under federal, state or local laws.

It is unlawful
in Massachusetts to require or administer a lie detector test as a
condition of employment or continued employment. An employer who
violates this law shall be subject to criminal penalties and civil
liability.

Pursuant to the San Francisco Fair Chance Ordinance, we
will consider qualified applicants with arrest and conviction
records for employment.

We may use artificial intelligence (AI)
tools to support parts of the hiring process, such as reviewing
applications, analyzing resumes, or assessing responses. These
tools assist our recruitment team but do not replace human
judgment. Final hiring decisions are ultimately made by humans. If
you would like more information about how your data is processed,
please contact us.