Hi~ I am Zhenhong Zhou, a first-year PhD student at Nanyang Technological University, advised by Prof. Yang Liu. My research interests lie in AI Safety, specifically LLM Safety, Jailbreak, Interpretability, and Privacy.

I obtained my bachelor's degree in 2022 and my master's degree in 2025 from Beijing University of Posts and Telecommunications, advised by Prof. Sen Su.

If you are interested in my research, please contact me. 😊😊😊

🔥 News

  • 2025.09: 🎉🎉 One paper is accepted by NeurIPS 2025
  • 2025.09: 🎉🎉 Two papers are accepted by EMNLP 2025
  • 2025.05: 🎉🎉 One paper is accepted by ACL 2025
  • 2025.04: 🎉🎉 One paper is accepted by ICML 2025
  • 2025.01: 🎉🎉 One paper is accepted by ICLR 2025 (Oral)
  • 2024.09: 🎉🎉 Three papers are accepted by EMNLP 2024
  • 2023.12: 🎉🎉 One paper is accepted by AAAI 2024

📝 Selected Publications

ICLR 2025 (Oral Top 1.8%)

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024

Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions

Quan Liu, Zhenhong Zhou (Co-First), Longzhu He, Yi Liu, Wei Zhang, Sen Su

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024 Findings

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

AAAI 2024

Quantifying and Analyzing Entity-Level Memorization in Large Language Models

Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

📄 [Paper]: Link

arXiv

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su

📄 [Paper]: Link

🎈 Projects

  • Awesome-LLM-Safety
  • LifeReloaded
  • Bolaris

📖 Education

  • 2025.08 - now, PhD Student @ College of Computing and Data Science, Nanyang Technological University.
  • 2022.09 - 2025.06, Master of Computer Science and Technology, Beijing University of Posts and Telecommunications.
  • 2018.09 - 2022.06, Bachelor of Data Science and Big Data Technology, Beijing University of Posts and Telecommunications.

💻 Internships

  • 2024.12 - 2025.06, A*Star, Singapore.
  • 2024.02 - 2024.12, Tongyi, Alibaba, Beijing, China.
  • 2023.10 - 2024.02, Baichuan Inc., Beijing, China.

🖊️ Blog

🎖 Honors and Awards

  • 2024.10, National Scholarship for Master's Students, Beijing University of Posts and Telecommunications.
  • 2023.10, First-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.
  • 2022.10, Second-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.