smn¹

Simon Schug
Postdoctoral Researcher
Princeton University

Email: sschug [ät] princeton.edu
Github · Google Scholar · Bluesky


Research

In my research, I seek to understand the principles that allow large-scale neural networks (including the human brain) to rapidly adapt and systematically generalize.

In some of my recent work, we identify conditions under which modular neural networks will compositionally generalize, find that simply scaling neural networks can lead to compositional generalization, and discover how transformers' multi-head attention can capture compositional structure in abstract reasoning tasks.

About

I am a postdoctoral researcher in the Department of Computer Science at Princeton University, where I work with Brenden Lake at the intersection of machine learning and cognitive science.

I completed my Ph.D. in Computer Science at ETH Zurich in February 2025, where I was supervised by João Sacramento and Angelika Steger and worked on meta-learning and compositional generalization in neural networks. During the final year of my Ph.D., I spent six months in the Foundational Research Team at Google DeepMind, researching efficient inference in large-scale mixture-of-experts architectures.

Prior to my doctoral studies, I completed my Master's thesis with Máté Lengyel at the University of Cambridge and received an M.Sc. in Neural Systems & Computation from ETH Zurich and UZH in 2020. Before that, I simultaneously pursued B.Sc. degrees in Electrical Engineering and Psychology at RWTH Aachen, completing both in 2017.

Publications

*equal contributions

  1. Scale leads to compositional generalization

    F Redhardt, Y Akram, S Schug
    arXiv 2025
    arxiv
  2. Meta-learning & compositional generalization in neural networks

    S Schug
    Doctoral Thesis, ETH Zurich
    thesis
  3. Attention as a Hypernetwork

    S Schug, S Kobayashi, Y Akram, J Sacramento, R Pascanu
    ICLR 2025 (oral)
    paper · code · arxiv · tweet
  4. When can transformers compositionally generalize in-context?

    S Kobayashi*, S Schug*, Y Akram*, F Redhardt, J von Oswald, R Pascanu, G Lajoie, J Sacramento
    NGSM workshop at ICML 2024
    arxiv · workshop
  5. Discovering modular solutions that generalize compositionally

    S Schug*, S Kobayashi*, Y Akram, M Wołczyk, A Proca, J von Oswald, R Pascanu, J Sacramento, A Steger
    ICLR 2024
    paper · arxiv · code
  6. Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

    A Meulemans*, S Schug*, S Kobayashi*, N Daw, G Wayne
    NeurIPS 2023 (spotlight)
    paper · arxiv · code · tweet
  7. Online learning of long-range dependencies

    N Zucchet*, R Meier*, S Schug*, A Mujika, J Sacramento
    NeurIPS 2023
    paper · arxiv · code
  8. A complementary systems theory of meta-learning

    S Schug, N Zucchet, J von Oswald, J Sacramento
    Cosyne 2023
    poster
  9. A contrastive rule for meta-learning

    N Zucchet*, S Schug*, J von Oswald*, D Zhao, J Sacramento
    NeurIPS 2022
    paper · arxiv · code · tweet
  10. Random initialisations performing above chance and how to find them

    F Benzing, S Schug, R Meier, J von Oswald, Y Akram, N Zucchet, L Aitchison, A Steger
    OPT2022 workshop at NeurIPS 2022
    paper · arxiv · code · tweet
  11. Presynaptic stochasticity improves energy efficiency and helps alleviate the stability-plasticity dilemma

    S Schug*, F Benzing*, A Steger
    eLife 10: e69884, 2021
    paper · biorxiv · code · tweet
  12. Learning where to learn: Gradient sparsity in meta and continual learning

    J von Oswald*, D Zhao*, S Kobayashi, S Schug, M Caccia, N Zucchet, J Sacramento
    NeurIPS 2021
    paper · arxiv · code
  13. Task-Agnostic Continual Learning via Stochastic Synapses

    S Schug, F Benzing, A Steger
    Workshop on Continual Learning at ICML 2020
    paper · workshop
  14. Evolving instinctive behaviour in resource-constrained autonomous agents using grammatical evolution

    A Hallawa, S Schug, G Iacca, G Ascheid
    EvoStar 2020
    paper

Resources

2025

code

autofsdp

A small utility for adding Fully-Sharded Data Parallelism (FSDP) to jax code with minimal changes.
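
I have not reproduced autofsdp's interface here; the following is only a hedged sketch of the standard jax.sharding machinery such a utility builds on: parameters and the batch are sharded along a single mesh axis, and jit/XLA inserts the collectives needed to gather full weights during the forward pass. The toy MLP, mesh layout, and array shapes are made up for illustration.

```python
# Minimal FSDP-style sharding sketch in plain jax (illustrative only; this is
# not the autofsdp API). Parameters and the batch are sharded along one 'data'
# mesh axis; the compiler inserts the all-gathers needed for the forward pass.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# One mesh axis spanning all available devices.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=('data',))

def shard(x, spec):
    return jax.device_put(x, NamedSharding(mesh, spec))

# Toy MLP parameters, each sharded along its first axis (which must be
# divisible by the device count); the batch is sharded along its batch axis.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    'w1': shard(jax.random.normal(k1, (512, 512)), P('data', None)),
    'w2': shard(jax.random.normal(k2, (512, 10)), P('data', None)),
}
batch = shard(jax.random.normal(k3, (64, 512)), P('data', None))

@jax.jit
def forward(params, x):
    h = jax.nn.relu(x @ params['w1'])
    return h @ params['w2']

logits = forward(params, batch)
print(logits.shape, logits.sharding)
```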

2024

code

minimal-hypernetwork

A minimal but highly flexible hypernetwork implementation in jax using flax.
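
The repository's actual interface may differ, but the core idea can be sketched in a few lines of flax.linen: a small generator network maps a (hypothetical) task embedding to the flattened weights of a target linear layer, which is then applied to the input.

```python
# Minimal hypernetwork sketch in flax.linen (illustrative only; not the
# minimal-hypernetwork API): a generator MLP produces the parameters of a
# target linear layer from an embedding, and the generated layer is applied.
import jax
import jax.numpy as jnp
import flax.linen as nn

class LinearHypernetwork(nn.Module):
    in_features: int
    out_features: int
    hidden: int = 64

    @nn.compact
    def __call__(self, x, embedding):
        # Generator: embedding -> flattened weights and bias of the target layer.
        n_weights = self.in_features * self.out_features + self.out_features
        h = nn.relu(nn.Dense(self.hidden)(embedding))
        flat = nn.Dense(n_weights)(h)
        w = flat[: self.in_features * self.out_features].reshape(
            self.in_features, self.out_features)
        b = flat[self.in_features * self.out_features:]
        # Target layer: apply the generated parameters to the input.
        return x @ w + b

model = LinearHypernetwork(in_features=16, out_features=4)
x = jnp.ones((8, 16))
embedding = jnp.ones((32,))
params = model.init(jax.random.PRNGKey(0), x, embedding)
y = model.apply(params, x, embedding)  # shape (8, 4)
```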

2023

code

metax

An extensible meta-learning library in jax for research. It bundles various meta-learning algorithms and architectures that can be flexibly combined.
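
As a hedged illustration of the kind of algorithm such a library bundles (this is not the metax API), here is a MAML-style meta-gradient step in plain jax: a shared initialization is adapted to a task with a few inner gradient steps, and the query loss after adaptation is differentiated with respect to that initialization.

```python
# Plain-jax sketch of a MAML-style meta-learning step (illustrative only).
import jax
import jax.numpy as jnp

def loss(params, x, y):
    pred = x @ params['w'] + params['b']
    return jnp.mean((pred - y) ** 2)

def inner_adapt(params, x, y, lr=0.1, steps=3):
    # Task-specific adaptation: a few gradient steps from the shared init.
    for _ in range(steps):
        grads = jax.grad(loss)(params, x, y)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params

def outer_loss(params, support, query):
    # Meta-objective: query loss after adapting on the support set.
    adapted = inner_adapt(params, *support)
    return loss(adapted, *query)

# Meta-gradient through the unrolled inner loop, on a toy regression task.
params = {'w': jnp.zeros((5, 1)), 'b': jnp.zeros((1,))}
ks, kq = jax.random.split(jax.random.PRNGKey(0))
xs, xq = jax.random.normal(ks, (10, 5)), jax.random.normal(kq, (10, 5))
ys, yq = xs @ jnp.ones((5, 1)), xq @ jnp.ones((5, 1))
meta_grads = jax.grad(outer_loss)(params, (xs, ys), (xq, yq))
```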

code

shrink-perturb

An optax implementation of the shrink-and-perturb algorithm proposed by Ash & Adams to address the pathologies of warm starting.
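
The Ash & Adams update itself is simple: before warm-starting on new data, every parameter is shrunk toward zero and perturbed with Gaussian noise. Below is a plain-jax sketch of that update (not the repository's optax interface); the shrink factor and noise scale are hyperparameters chosen for illustration.

```python
# Plain-jax sketch of the shrink-and-perturb update (illustrative only).
import jax
import jax.numpy as jnp

def shrink_perturb(params, key, shrink=0.4, sigma=0.01):
    # Scale every parameter toward zero and add Gaussian noise.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    new_leaves = [
        shrink * p + sigma * jax.random.normal(k, p.shape, p.dtype)
        for p, k in zip(leaves, keys)
    ]
    return jax.tree_util.tree_unflatten(treedef, new_leaves)

params = {'w': jnp.ones((3, 3)), 'b': jnp.zeros((3,))}
params = shrink_perturb(params, jax.random.PRNGKey(0))
```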

2022

code

jax-hypernetwork

A simple & flexible hypernetwork library in jax using haiku.

video

Lecture on bilevel optimization

A lecture on bilevel optimization problems in neuroscience and machine learning, given at the MLSS^N 2022 summer school in Krakow.

code

flaxify

A tiny utility to simplify the instantiation of haiku models.

2020

slides

Models of decision making

A one-hour workshop on models of decision making in neuroscience taught for the Swiss Study Foundation.

slides

Cryptography workshop

A two-hour introductory workshop on cryptography, taught online for hebbian and adaptable for students aged 10 to 18.

code

Equilibrium Propagation pytorch

A pytorch implementation of the Equilibrium Propagation algorithm.
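
For readers unfamiliar with the algorithm, here is a hedged jax sketch (the repository itself is in pytorch) of Equilibrium Propagation's two-phase structure on a toy one-hidden-layer Hopfield-style energy: the state relaxes freely, then relaxes again while weakly nudged toward the target, and the weights are updated with the scaled difference of the energy gradients at the two fixed points. Network sizes, the nudging strength, and step sizes are arbitrary.

```python
# Minimal Equilibrium Propagation sketch in jax (illustrative only).
import jax
import jax.numpy as jnp

rho = lambda s: jnp.clip(s, 0.0, 1.0)  # hard-sigmoid activation

def energy(params, x, state):
    # Hopfield-style energy with clamped input x and free states (h, y).
    h, y = state
    return (0.5 * (h @ h + y @ y)
            - rho(x) @ params['w1'] @ rho(h)
            - rho(h) @ params['w2'] @ rho(y))

def total_energy(params, x, state, target, beta):
    h, y = state
    return energy(params, x, state) + beta * 0.5 * jnp.sum((y - target) ** 2)

def relax(params, x, state, target, beta, steps=50, step_size=0.1):
    # Gradient descent of the (possibly nudged) energy with respect to the state.
    for _ in range(steps):
        g = jax.grad(total_energy, argnums=2)(params, x, state, target, beta)
        state = jax.tree_util.tree_map(lambda s, gs: s - step_size * gs, state, g)
    return state

def ep_update(params, x, target, beta=0.5, lr=0.05):
    state0 = (jnp.zeros(32), jnp.zeros(10))          # hidden and output states
    s_free = relax(params, x, state0, target, beta=0.0)
    s_nudged = relax(params, x, s_free, target, beta=beta)
    g_free = jax.grad(energy)(params, x, s_free)
    g_nudged = jax.grad(energy)(params, x, s_nudged)
    # Weight update from the difference of energy gradients at the two phases.
    return jax.tree_util.tree_map(
        lambda p, gb, g0: p - lr * (gb - g0) / beta, params, g_nudged, g_free)

key = jax.random.PRNGKey(0)
params = {'w1': 0.1 * jax.random.normal(key, (784, 32)),
          'w2': 0.1 * jax.random.normal(key, (32, 10))}
x, target = jnp.ones(784), jax.nn.one_hot(3, 10)
params = ep_update(params, x, target)
```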

2019

slides

git for open science

A four-hour workshop on git, taught at Goethe University Frankfurt for the Frankfurt Open Science Initiative.


¹ Survival of motor neuron 1, also known as component of gems 1, is a gene that encodes the SMN protein in humans.