Lexin Zhou

I am a research assistant at Microsoft, advised by Dr. Xing Xie and Prof. Jose Hernandez-Orallo, and an incoming PhD candidate at Princeton University, to be advised by Prof. Peter Henderson at the POLARIS Lab. I did my master’s in CS at the University of Cambridge, supervised by Prof. Andreas Vlachos. Prior to that, I did my BSc in Data Science at the Universitat Politècnica de València, where I got into research by working with Prof. Jose Hernandez-Orallo.
I am interested in research on the science of AI evaluation and social computing, regularly drawing inspiration from psychometrics and cognitive science. My work has been featured in Nature, Forbes, Microsoft Research, MIT Technology Review, IEEE Spectrum, El País, New Scientist, and IBM, among others.
If you want to talk about anything I work on, feel free to reach out via email or on Twitter.
news
| Date | News |
|---|---|
| Mar 20, 2025 | 💡 Invited talk on General Scales Unlock AI Evaluation with Explanatory and Predictive Power at Princeton University. |
| Mar 09, 2025 | 📜 New preprint introducing conceptual and technological innovations toward a science of AI evaluation: General Scales Unlock AI Evaluation with Explanatory and Predictive Power! Takeaways on X. An open platform calling for collaborations and extensions of our methodology. An accessible Microsoft Research Blog post summarizing our work for a general audience. This is the work I am personally most excited about to date. |
| Oct 30, 2024 | 💡 Invited talk on Larger and More Instructable Language Models Become Less Reliable at Microsoft Research! |
| Sep 25, 2024 | 📜 Larger and More Instructable Language Models Become Less Reliable is finally out in Nature! Takeaways on X. This reminds me of Goodhart's law. |
| Sep 20, 2024 | 📜 An LLM Feature-based Framework for Dialogue Constructiveness Assessment was accepted at EMNLP 2024, receiving review scores that placed it in the top 0.5% of all submissions! |
selected publications
- General Scales Unlock AI Evaluation with Explanatory and Predictive Power, 2025
- Larger and More Instructable Language Models Become Less Reliable, Nature, 2024
- An LLM Feature-based Framework for Dialogue Constructiveness Assessment, EMNLP, 2024