The First Representational Alignment Workshop

Workshop Date: TBD

In person location: TBD

In person poster session: TBD

Fill out this Google Form to help us gauge interest!

As AI systems become increasingly embedded in our lives, it has become critically important to understand whether these systems are aligned with humans. In this workshop, we focus on the problem of representational alignment between humans and machines: do today’s large-scale deep learning models rely on the same internal representations and strategies to solve tasks as humans do? Answering this question will give the field of artificial intelligence guidance on how to build safer, more interpretable, and more reliable models of behavior, and will give the biological sciences new tools for generating hypotheses about the underpinnings of perception and cognition.

The study of the kinds of representations that humans and machines construct about the world has a long history spanning cognitive science, neuroscience, and machine learning. The alignment of these representations has gone by many names (including latent space alignment, concept(ual) alignment, system alignment, representational similarity analysis, model alignment, and representational alignment) and has implicitly or explicitly been an objective in many subareas of machine learning, including knowledge distillation, disentanglement, and concept-based models.

The exploration of human-model alignment has classically focused on value alignment: the goal of building models that broadly benefit humanity. Value alignment is notoriously difficult to define and measure, so researchers often instead evaluate the alignment of model and human behavioral outputs or task performance. However, monitoring output alignment is insufficient to tell whether a model is actually aligned with humans or is just acting that way. For instance, a model can produce the same behavior as humans while relying on different visual features.

Representational alignment is intrinsically linked to value alignment and to behavioral alignment: a deeper understanding of representational alignment will help us determine whether representational guarantees will lead to general value alignment and, conversely, under what circumstances behavioral alignment is sufficient for value alignment. Knowing that ML systems share our representations of the world may increase our trust in them and enable us to communicate with them more efficiently (e.g., knowing that the model will understand decompositions of the world in the same way as we do). To the extent that humans have useful representations of the world, representational alignment is also an effective source of inductive bias that may improve generalization and make it possible to learn from limited human supervision. Further, studying representational alignment can reveal domains where models learn better domain-specific representations than humans, which could be leveraged to complement and empower humans when designing hybrid systems.

At this workshop, we will welcome perspectives from cognitive science, neuroscience, machine learning, and related fields with the goal of posing and exploring questions about representational alignment including:

  • What is representational alignment, and how should it be measured?
  • How well do measures of representational alignment generalize to new data?
  • What are the consequences (positive or negative) of representational alignment?
  • How does representational alignment connect to value alignment and output alignment?
  • How can we increase (or decrease) representational alignment?

While the focus of the workshop will generally be on the representational alignment of models with humans, we also welcome submissions regarding representational alignment in other settings (e.g., alignment of models with other models).

Speakers and Panelists

Mariya Toneva

Max Planck Institute for Software Systems

Erin Grant

University College London

Bradley Love

University College London

Been Kim

Google Brain

Tom Griffiths

Princeton University

Simon Kornblith

Google Brain

Julie Shah

Massachusetts Institute of Technology

Michael Mozer

Google Brain


Ilia Sucholutsky

Princeton University

Drew Linsley

Brown University

Jascha Achterberg

University of Cambridge

Sherol Chen

Google Research

Andi Peng

Massachusetts Institute of Technology

Richard Zhang

Google Brain

Robert Geirhos

Google Brain

Katie Collins

University of Cambridge

Andreea Bobu

University of California Berkeley

Sunayana Rane

Princeton University

Reach out to for any questions.