Hi~ I am Zhenhong Zhou, a first-year PhD student at Nanyang Technological University, advised by Prof. Yang Liu. My research interests include AI Safety and LLM Safety, specifically Jailbreak, Interpretability, and Privacy.

I obtained my bachelor’s degree in 2022 and my master’s degree in 2025, both from Beijing University of Posts and Telecommunications, where I was advised by Prof. Sen Su.

If you are interested in my research, please contact me.😊😊😊

🔥 News

2025.09: 🎉🎉 One paper is accepted by NeurIPS 2025
2025.09: 🎉🎉 Two papers are accepted by EMNLP 2025
2025.05: 🎉🎉 One paper is accepted by ACL 2025
2025.04: 🎉🎉 One paper is accepted by ICML 2025
2025.01: 🎉🎉 One paper is accepted by ICLR 2025 (Oral)
2024.09: 🎉🎉 Three papers are accepted by EMNLP 2024
2023.12: 🎉🎉 One paper is accepted by AAAI 2024

📝 Selected Publications

ICLR 2025 (Oral, Top 1.8%)

On the Role of Attention Heads in Large Language Model Safety

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024

Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions

Quan Liu, Zhenhong Zhou (Co-First), Longzhu He, Yi Liu, Wei Zhang, Sen Su

💻 [Code]: Link

📄 [Paper]: Link

EMNLP 2024 Findings

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

💻 [Code]: Link

📄 [Paper]: Link

AAAI 2024

Quantifying and Analyzing Entity-Level Memorization in Large Language Models

Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

📄 [Paper]: Link

arXiv

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su

📄 [Paper]: Link

🎈 Project

📖 Education

2025.08 - now, PhD Student @ College of Computing and Data Science, Nanyang Technological University
2022.09 - 2025.06, Master of Computer Science and Technology, Beijing University of Posts and Telecommunications.
2018.09 - 2022.06, Bachelor of Data Science and Big Data Technology, Beijing University of Posts and Telecommunications.

💻 Internships

2024.12 - 2025.06, A*Star, Singapore.
2024.02 - 2024.12, Tongyi, Alibaba, Beijing, China.
2023.10 - 2024.02, Baichuan Inc., Beijing, China.

🖊️ Blog

🎖 Honors and Awards

2024.10 National master scholarship, Beijing University of Posts and Telecommunications.
2023.10 First-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.
2022.10 Second-class postgraduate academic scholarship, Beijing University of Posts and Telecommunications.

Zhenhong Zhou (ydyjya)
