I am currently exploring interpretability. I believe understanding the internals of AI models is crucial, both for building societal trust in AI systems and for designing models that are more controllable, ultimately reducing catastrophic risks. I want to go deeper into this area, so if you have any interesting projects or ideas, feel free to reach out!

I am also actively looking for a thesis project in this area to finish my Master’s in Computer Science at ETH Zürich.


This page was last updated on 20th May, 2026.