Hi~ I am Zhenhong Zhou, a third-year master's student at Beijing University of Posts and Telecommunications, advised by Prof. Sen Su. My research interests include AI safety and LLM safety, specifically jailbreaks, interpretability, and privacy.

I will get my master's degree in June 2025. Then, I will start my PhD at Nanyang Technological University under the supervision of Prof. Yang Liu.

If you are interested in my research, please contact me. 😊😊😊

🔥 News

  • 2025.01: 🎉🎉 One paper is accepted by ICLR 2025 (Oral)
  • 2024.09: 🎉🎉 Two papers are accepted by EMNLP 2024
  • 2023.12: 🎉🎉 One paper is accepted by AAAI 2024

📝 Selected Publications

ICLR 2025 (Oral Top 1.8%)

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024

Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions

Quan Liu, Zhenhong Zhou (Co-First), Longzhu He, Yi Liu, Wei Zhang, Sen Su

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024 Findings

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

AAAI 2024

Quantifying and Analyzing Entity-Level Memorization in Large Language Models

Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

📄 [Paper]: Link

arXiv

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su

📄 [Paper]: Link

🎈 Projects

  • Awesome-LLM-Safety
  • LifeReloaded
  • Bolaris

📖 Education

  • 2022.09 - now, Master of Computer Science and Technology, Beijing University of Posts and Telecommunications.
  • 2018.09 - 2022.06, Bachelor of Data Science and Big Data Technology, Beijing University of Posts and Telecommunications.

💻 Internships

🖊️ Blog

🎖 Honors and Awards

  • 2023.10 First-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.
  • 2022.10 Second-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.