Lexin Zhou


Hi! I’m a CS PhD student at Princeton University, where I’m fortunate to be advised by Peter Henderson and to collaborate closely with Tom Griffiths. I am a computer scientist by training but also regularly draw insights from cognitive science. I design systematic evaluation frameworks for understanding the capabilities and generalization patterns of LLMs, as well as their associated risks (e.g., unreliability). I also search for ways to disprove Goodhart’s Law through generalizable optimization targets and evaluations, with the goal of building more reliable and robust LLMs.

Prior to Princeton, I was at MSR for a year, working with Dr. Xing Xie. I did my MPhil in CS at Cambridge University, supervised by Andreas Vlachos, and my BSc in DS at TU Valencia, under Jose Hernandez-Orallo.

I’ve spent time in research/consultancy roles at MSR, OpenAI, Meta AI, and the European Commission. My work has been featured in Nature, Financial Times, Microsoft Research, MIT Tech Review, Forbes, IEEE Spectrum, El País, New Scientist, QbitAI, and IBM, among others.

If you wanna talk about something I do, feel free to reach out via email or on Twitter!

news

Sep 11, 2025 💡 Invited talk about General Scales Unlock AI Evaluation with Explanatory and Predictive Power at Future of Life Institute.
Sep 02, 2025 🔥 Starting my PhD studies at the CS department of Princeton University!
Mar 09, 2025 📜 New preprint introducing conceptual and technological innovations for a science of AI Evaluation: General Scales Unlock AI Evaluation with Explanatory and Predictive Power! Takeaways on X. An open platform calling for collaborations and extensions of our methodology. An accessible Microsoft Research Blog post summarizing our work for a general audience. This is the work I personally feel most excited about to date.
Sep 09, 2022 👨‍💻 Participated in the Red Team of GPT-4 at OpenAI, focusing on capability assessment, reliability evaluation, and adversarial testing.

selected publications

  1. General Scales Unlock AI Evaluation with Explanatory and Predictive Power
    Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, and 20 more authors
    Nature (In Press), 2025
  2. Larger and More Instructable Language Models Become Less Reliable
    Lexin Zhou, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo
    Nature, 2024