smn¹

Simon Schug
Postdoctoral Researcher
Princeton University

Email: sschug [ät] princeton.edu
Github · Google Scholar · Bluesky


Research

In my research, I seek to understand the principles that allow large-scale neural networks (including the human brain) to rapidly adapt and systematically generalize.

In some of my recent work, we identify conditions under which modular neural networks will compositionally generalize, find that simply scaling neural networks can lead to compositional generalization, and discover how transformers' multi-head attention can capture compositional structure in abstract reasoning tasks.

About

I am a postdoctoral researcher in the Department of Computer Science at Princeton University, where I work with Brenden Lake at the intersection of machine learning and cognitive science.

I completed my Ph.D. in Computer Science at ETH Zurich in February 2025, where I was supervised by João Sacramento and Angelika Steger and worked on meta-learning and compositional generalization in neural networks. During the final year of my Ph.D., I spent six months in the Foundational Research Team at Google DeepMind, researching efficient inference in large-scale mixture-of-experts architectures.

Prior to my doctoral studies, I completed my Master's thesis with Máté Lengyel at the University of Cambridge and received an M.Sc. in Neural Systems & Computation from ETH Zurich and UZH in 2020. Before that, I simultaneously pursued B.Sc. degrees in Electrical Engineering and Psychology at RWTH Aachen, completing both in 2017.

Publications

*equal contributions

  1. Scale leads to compositional generalization

    F Redhardt, Y Akram, S Schug
    arXiv 2025
    arxiv
  2. Meta-learning & compositional generalization in neural networks

    S Schug
    Doctoral Thesis, ETH Zurich
    thesis
  3. Attention as a Hypernetwork

    S Schug, S Kobayashi, Y Akram, J Sacramento, R Pascanu
    ICLR 2025 (oral)
    paper · code · arxiv · tweet
  4. When can transformers compositionally generalize in-context?

    S Kobayashi*, S Schug*, Y Akram*, F Redhardt, J von Oswald, R Pascanu, G Lajoie, J Sacramento
    NGSM workshop at ICML 2024
    arxiv · workshop
  5. Discovering modular solutions that generalize compositionally

    S Schug*, S Kobayashi*, Y Akram, M Wołczyk, A Proca, J von Oswald, R Pascanu, J Sacramento, A Steger
    ICLR 2024
    paper · arxiv · code
  6. Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

    A Meulemans*, S Schug*, S Kobayashi*, N Daw, G Wayne
    NeurIPS 2023 (spotlight)
    paper · arxiv · code · tweet
  7. Online learning of long-range dependencies

    N Zucchet*, R Meier*, S Schug*, A Mujika, J Sacramento
    NeurIPS 2023
    paper · arxiv · code
  8. A complementary systems theory of meta-learning

    S Schug, N Zucchet, J von Oswald, J Sacramento
    Cosyne 2023
    poster
  9. A contrastive rule for meta-learning

    N Zucchet*, S Schug*, J von Oswald*, D Zhao, J Sacramento
    NeurIPS 2022
    paper · arxiv · code · tweet
  10. Random initialisations performing above chance and how to find them

    F Benzing, S Schug, R Meier, J von Oswald, Y Akram, N Zucchet, L Aitchison, A Steger
    OPT2022 workshop at NeurIPS 2022
    paper · arxiv · code · tweet
  11. Presynaptic stochasticity improves energy efficiency and helps alleviate the stability-plasticity dilemma

    S Schug*, F Benzing*, A Steger
    eLife 10: e69884, 2021
    paper · biorxiv · code · tweet
  12. Learning where to learn: Gradient sparsity in meta and continual learning

    J von Oswald*, D Zhao*, S Kobayashi, S Schug, M Caccia, N Zucchet, J Sacramento
    NeurIPS 2021
    paper · arxiv · code
  13. Task-Agnostic Continual Learning via Stochastic Synapses

    S Schug, F Benzing, A Steger
    Workshop on Continual Learning at ICML 2020
    paper · workshop
  14. Evolving instinctive behaviour in resource-constrained autonomous agents using grammatical evolution

    A Hallawa, S Schug, G Iacca, G Ascheid
    EvoStar 2020
    paper

Resources

2025

code

autofsdp

A small utility for adding Fully-Sharded Data Parallelism (FSDP) to jax code with minimal changes.
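
I have not reproduced autofsdp's interface here; the following is only a hedged sketch of the standard jax.sharding machinery such a utility builds on: parameters and the batch are sharded along a single mesh axis, and jit/XLA inserts the collectives needed to gather full weights during the forward pass. The toy MLP, mesh layout, and array shapes are made up for illustration.

```python
# Minimal FSDP-style sharding sketch in plain jax (illustrative only; this is
# not the autofsdp API). Parameters and the batch are sharded along one 'data'
# mesh axis; the compiler inserts the all-gathers needed for the forward pass.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# One mesh axis spanning all available devices.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=('data',))

def shard(x, spec):
    return jax.device_put(x, NamedSharding(mesh, spec))

# Toy MLP parameters, each sharded along its first axis (which must be
# divisible by the device count); the batch is sharded along its batch axis.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    'w1': shard(jax.random.normal(k1, (512, 512)), P('data', None)),
    'w2': shard(jax.random.normal(k2, (512, 10)), P('data', None)),
}
batch = shard(jax.random.normal(k3, (64, 512)), P('data', None))

@jax.jit
def forward(params, x):
    h = jax.nn.relu(x @ params['w1'])
    return h @ params['w2']

logits = forward(params, batch)
print(logits.shape, logits.sharding)
```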

2024

code

minimal-hypernetwork

A minimal but highly flexible hypernetwork implementation in jax using flax.
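
The repository's actual interface may differ, but the core idea can be sketched in a few lines of flax.linen: a small generator network maps a (hypothetical) task embedding to the flattened weights of a target linear layer, which is then applied to the input.

```python
# Minimal hypernetwork sketch in flax.linen (illustrative only; not the
# minimal-hypernetwork API): a generator MLP produces the parameters of a
# target linear layer from an embedding, and the generated layer is applied.
import jax
import jax.numpy as jnp
import flax.linen as nn

class LinearHypernetwork(nn.Module):
    in_features: int
    out_features: int
    hidden: int = 64

    @nn.compact
    def __call__(self, x, embedding):
        # Generator: embedding -> flattened weights and bias of the target layer.
        n_weights = self.in_features * self.out_features + self.out_features
        h = nn.relu(nn.Dense(self.hidden)(embedding))
        flat = nn.Dense(n_weights)(h)
        w = flat[: self.in_features * self.out_features].reshape(
            self.in_features, self.out_features)
        b = flat[self.in_features * self.out_features:]
        # Target layer: apply the generated parameters to the input.
        return x @ w + b

model = LinearHypernetwork(in_features=16, out_features=4)
x = jnp.ones((8, 16))
embedding = jnp.ones((32,))
params = model.init(jax.random.PRNGKey(0), x, embedding)
y = model.apply(params, x, embedding)  # shape (8, 4)
```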

2023

code

metax

An extensible meta-learning library in jax for research. It bundles various meta-learning algorithms and architectures that can be flexibly combined.
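
As a hedged illustration of the kind of algorithm such a library bundles (this is not the metax API), here is a MAML-style meta-gradient step in plain jax: a shared initialization is adapted to a task with a few inner gradient steps, and the query loss after adaptation is differentiated with respect to that initialization.

```python
# Plain-jax sketch of a MAML-style meta-learning step (illustrative only).
import jax
import jax.numpy as jnp

def loss(params, x, y):
    pred = x @ params['w'] + params['b']
    return jnp.mean((pred - y) ** 2)

def inner_adapt(params, x, y, lr=0.1, steps=3):
    # Task-specific adaptation: a few gradient steps from the shared init.
    for _ in range(steps):
        grads = jax.grad(loss)(params, x, y)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params

def outer_loss(params, support, query):
    # Meta-objective: query loss after adapting on the support set.
    adapted = inner_adapt(params, *support)
    return loss(adapted, *query)

# Meta-gradient through the unrolled inner loop, on a toy regression task.
params = {'w': jnp.zeros((5, 1)), 'b': jnp.zeros((1,))}
ks, kq = jax.random.split(jax.random.PRNGKey(0))
xs, xq = jax.random.normal(ks, (10, 5)), jax.random.normal(kq, (10, 5))
ys, yq = xs @ jnp.ones((5, 1)), xq @ jnp.ones((5, 1))
meta_grads = jax.grad(outer_loss)(params, (xs, ys), (xq, yq))
```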

code

shrink-perturb

An optax implementation of the shrink-and-perturb algorithm proposed by Ash & Adams to address the pathologies of warm starting.
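
The Ash & Adams update itself is simple: before warm-starting on new data, every parameter is shrunk toward zero and perturbed with Gaussian noise. Below is a plain-jax sketch of that update (not the repository's optax interface); the shrink factor and noise scale are hyperparameters chosen for illustration.

```python
# Plain-jax sketch of the shrink-and-perturb update (illustrative only).
import jax
import jax.numpy as jnp

def shrink_perturb(params, key, shrink=0.4, sigma=0.01):
    # Scale every parameter toward zero and add Gaussian noise.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    new_leaves = [
        shrink * p + sigma * jax.random.normal(k, p.shape, p.dtype)
        for p, k in zip(leaves, keys)
    ]
    return jax.tree_util.tree_unflatten(treedef, new_leaves)

params = {'w': jnp.ones((3, 3)), 'b': jnp.zeros((3,))}
params = shrink_perturb(params, jax.random.PRNGKey(0))
```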

2022

code

jax-hypernetwork

A simple & flexible hypernetwork library in jax using haiku.

video

Lecture on bilevel optimization

A lecture on bilevel optimization problems in neuroscience and machine learning, given at the MLSS^N 2022 summer school in Krakow.

code

flaxify

A tiny utility to simplify the instantiation of haiku models.

2020

slides

Models of decision making

A one-hour workshop on models of decision making in neuroscience taught for the Swiss Study Foundation.

slides

Cryptography workshop

A two-hour introductory workshop on cryptography, taught online for hebbian and adaptable for students aged 10 to 18.

code

Equilibrium Propagation pytorch

A pytorch implementation of the Equilibrium Propagation algorithm.
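
For readers unfamiliar with the algorithm, here is a hedged jax sketch (the repository itself is in pytorch) of Equilibrium Propagation's two-phase structure on a toy one-hidden-layer Hopfield-style energy: the state relaxes freely, then relaxes again while weakly nudged toward the target, and the weights are updated with the scaled difference of the energy gradients at the two fixed points. Network sizes, the nudging strength, and step sizes are arbitrary.

```python
# Minimal Equilibrium Propagation sketch in jax (illustrative only).
import jax
import jax.numpy as jnp

rho = lambda s: jnp.clip(s, 0.0, 1.0)  # hard-sigmoid activation

def energy(params, x, state):
    # Hopfield-style energy with clamped input x and free states (h, y).
    h, y = state
    return (0.5 * (h @ h + y @ y)
            - rho(x) @ params['w1'] @ rho(h)
            - rho(h) @ params['w2'] @ rho(y))

def total_energy(params, x, state, target, beta):
    h, y = state
    return energy(params, x, state) + beta * 0.5 * jnp.sum((y - target) ** 2)

def relax(params, x, state, target, beta, steps=50, step_size=0.1):
    # Gradient descent of the (possibly nudged) energy with respect to the state.
    for _ in range(steps):
        g = jax.grad(total_energy, argnums=2)(params, x, state, target, beta)
        state = jax.tree_util.tree_map(lambda s, gs: s - step_size * gs, state, g)
    return state

def ep_update(params, x, target, beta=0.5, lr=0.05):
    state0 = (jnp.zeros(32), jnp.zeros(10))          # hidden and output states
    s_free = relax(params, x, state0, target, beta=0.0)
    s_nudged = relax(params, x, s_free, target, beta=beta)
    g_free = jax.grad(energy)(params, x, s_free)
    g_nudged = jax.grad(energy)(params, x, s_nudged)
    # Weight update from the difference of energy gradients at the two phases.
    return jax.tree_util.tree_map(
        lambda p, gb, g0: p - lr * (gb - g0) / beta, params, g_nudged, g_free)

key = jax.random.PRNGKey(0)
params = {'w1': 0.1 * jax.random.normal(key, (784, 32)),
          'w2': 0.1 * jax.random.normal(key, (32, 10))}
x, target = jnp.ones(784), jax.nn.one_hot(3, 10)
params = ep_update(params, x, target)
```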

2019

slides

git for open science

A four-hour workshop on git, taught at Goethe University Frankfurt for the Frankfurt Open Science Initiative.


¹ Survival of motor neuron 1, also known as component of gems 1, is a gene that encodes the SMN protein in humans.