On a quest to understand intelligence and ensure that advanced AGI is safe and beneficial.

Satvik Golechha

Hi! I’m a Research Scientist at the UK AI Security Institute (AISI), where I work on frontier alignment research with a focus on training and interpreting model organisms of misalignment, such as reward hacking, evaluation awareness, and sandbagging.

Previously, as an independent researcher, I worked on RL for efficient multi-turn exploration at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I was also a scholar in the ML Alignment & Theory Scholars (MATS) program, working with Adrià Garriga-Alonso on frontier deception and with Nandi Schoots on feature geometry and modularity, and I completed Neel Nanda’s MATS training program on mechanistic interpretability.

Before moving full-time to work on AI safety, I worked at Microsoft Research on language models. Prior to that, I was an Associate Research Scientist at Wadhwani AI working on AI for Social Good and Healthcare.

Writing fiction and poetry along the way!

If you’d like to discuss research, collaborate, or chat about anything, drop me an email at zsatvik@gmail.com!

Research

I study intelligence (via its emergence and expression in neural networks) to ensure that advanced AGI is safe, beneficial, and useful. This involves working on alignment, security, interpretability, and reinforcement learning for frontier AI systems and agents. Here is some of my recent work:

Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Z-M., Oliver M., Connor K., Kola A., Jacob M., Sam Marks, Chris Cundy, Joseph Bloom

Satvik Golechha, Adrià Garriga-Alonso

David Chanin, James W.S., Tomáš D., Hardik B., Satvik Golechha, Joseph Bloom

Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr

Samuel Marks, Johannes Treutlein, . . ., Satvik Golechha, . . ., Evan Hubinger

Ishwar B., Hasith V., Greta K., Ronan A., Satvik Golechha

Satvik Golechha, Lucius Bushnaq, Euan Ong, Neeraj Kayal, Nandi Schoots

Satvik Golechha, Maheep C., Joan V., Alessandro Abate, Nandi Schoots

Satvik Golechha

Satvik Golechha

Satvik Golechha, James Dao

Pragya Srivastava*, Satvik Golechha*, Amit Deshpande, Amit Sharma

Pragnya R.*, Bhuvan S.*, Satvik Golechha*, Mohit Jain, and others

Mihir Kulkarni*, Satvik Golechha*, Rishi R.*, Jithin S.*, Alpan Raval

Poetry

Writing metaphorical poetry opens a channel into emotions that could not be expressed any other way. Check out my poetry page!

Almost done with my first poetry book, Anuswaad!

Fiction

A beautiful thing happens when fiction is written. A good story reflects back to us aspects of ourselves that we’re not aware of. Really, it is the story that’s writing us.

Algebra to Zombies

A 29-week curriculum that covers foundational math required to do AI research. This accompanies a study group I used to run at Microsoft Research in India.

Research Blog

Some notes on AI research. For my research, please see my research statement and Scholar profile.

PS: For a more general (and hopefully fun) introduction to the less-taught parts of AI, check out Alice!

Other Stuff

Intelligence: I write about intelligence and a number of interesting ideas in my fiction and research. I plan to bundle them into a blog series someday.

School: I’m writing a book (or a series of posts) on my version of an ideal school — I believe good schooling is highly impactful, undervalued, and achievable.

Like Winds & Dystop.ai: Slowly working on finishing these novels but aah so little time!

Infinite Jest: Reading this epic book; it will take more than a couple of months.

Exploring London: I’ve moved to London for the first time, HMU!

All life is bound together by mutual support and interdependence.

Acharya Umaswati