Lexin Zhou

I am a research assistant at Microsoft, advised by Dr. Xing Xie and Prof. Jose Hernandez-Orallo, and an incoming PhD candidate at Princeton University, to be advised by Prof. Peter Henderson at the POLARIS Lab. I did my master’s in CS at the University of Cambridge, supervised by Prof. Andreas Vlachos. Prior to that, I did my BSc in Data Science at the Universitat Politècnica de València, where I got into research by working with Prof. Jose Hernandez-Orallo.

I am interested in the science of AI evaluation and in social computing, regularly taking inspiration from psychometrics and cognitive science. My work has been featured in Nature, Forbes, Microsoft Research, MIT Tech Review, IEEE Spectrum, El País, New Scientist, and IBM, among others.

If you wanna talk about something I do, feel free to reach out via email or on Twitter.

news

Mar 20, 2025 💡 Invited talk about General Scales Unlock AI Evaluation with Explanatory and Predictive Power at Princeton University.
Mar 09, 2025 📜 New preprint introducing conceptual and technological innovations for a science of AI Evaluation: General Scales Unlock AI Evaluation with Explanatory and Predictive Power! Takeaways on X. An open platform calling for collaborations and extensions of our methodology. An accessible Microsoft Research Blog summarizing our work for the general audience. This is the work I personally feel most excited about to date.
Oct 30, 2024 💡 Invited talk on Larger and More Instructable Language Models Become Less Reliable at Microsoft Research!
Sep 25, 2024 📜 Larger and More Instructable Language Models Become Less Reliable is finally out in Nature! Takeaways on X. This reminds me of Goodhart’s law.
Sep 20, 2024 📜 An LLM Feature-based Framework for Dialogue Constructiveness Assessment has been accepted to EMNLP 2024, receiving high review scores that placed it in the top 0.5% of all submissions!

selected publications

  1. General Scales Unlock AI Evaluation with Explanatory and Predictive Power
    Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, and 20 more authors
    2025
  2. Larger and More Instructable Language Models Become Less Reliable
    Lexin Zhou, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo
    Nature, 2024
  3. An LLM Feature-based Framework for Dialogue Constructiveness Assessment
    Lexin Zhou, Youmna Farag, and Andreas Vlachos
    EMNLP, 2024