IIIS
Tsinghua University
Beijing, 100084, P.R. China
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu*, Yishuo Cai*, Zhenhong Zhou, Renjie Gu, Haiqin Wang, Yan Liu, Tianwei Zhang, Wei Xu, Han Qiu
arXiv Preprints
[Paper][Code]
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models
Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia
arXiv Preprints
[Paper][Code][Project Page]
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
arXiv Preprints
[Paper][Code]
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu*, Zehan Qi*, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
arXiv Preprints
[Paper][Code][机器之心][Talk (Chinese)][Slide]
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Rongwu Xu*, Zehan Qi*, Wei Xu
ACL 2024 (Findings) Bangkok, Thailand
[Paper][Code]
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu
ACL 2024 (Oral, Main) Bangkok, Thailand
[Paper][Code][Project Page][Video] ARR Meta: 5
Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu
arXiv Preprints
[Paper]
Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings
Rongwu Xu
IJCNN 2024 Yokohama, Japan
[Paper]
Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training
Rongwu Xu and Zhixuan Fang
IJCNN 2024 Yokohama, Japan
[Paper]
LSync: A Universal Timeline-synchronizing Solution for Live Streaming
Fan Dang*, Yifan Xu*, Rongwu Xu, Xinlei Chen, Yunhao Liu
IEEE/ACM ToN
[Paper]
MISO: Legacy-compatible Privacy-preserving Single Sign-on using Trusted Execution Environments
Rongwu Xu, Sen Yang, Fan Zhang, Zhixuan Fang
IEEE EuroS&P 2023 Delft, The Netherlands
[Paper][Project Page] Bachelor Thesis
LSync: A Universal Event-synchronizing Solution for Live Streaming
Yifan Xu, Fan Dang, Rongwu Xu, Xinlei Chen, Yunhao Liu
IEEE INFOCOM 2022 Virtual
[Paper]
* denotes equal contribution.
Knowledge Conflicts for LLMs [2-Hour Talk (Chinese)][Talk Slide]
July 2024
LLMs Safety Vulnerabilities via Contextual Misinformation [3-Min Clip (English)]
May 2024
Lead Teaching Assistant: Introduction to Large Language Model Applications (Lectured by Prof. Xu Wei, Spring 2024)
Check out the [Code] we developed for the course. (10+ contributors, involving LLM-based app design, inference, efficient fine-tuning, etc.)
Teaching Assistant: Operating Systems and Distributed Systems (Lectured by Prof. Xu Wei, Fall 2023)
Teaching Assistant: Distributed Systems (Lectured by Prof. Xu Wei and Prof. Fang Zhixuan, Spring 2022)
President@IIIS Graduate Student Union (June 2024 --- Current)
Graduate Freshman Counselor@IIIS (May 2024 --- Current)
Social Practice Captain@IIIS (Apr 2024 --- Jul 2024)
Member@IIIS Graduate Student Union (Sept 2023 --- Jun 2024)
Outstanding Individual, 2023-2024
Duke University, Durham, USA (Jun 2021 --- Aug 2021)
Intern at TongYi Vision Intelligence Lab, Alibaba Inc., Beijing, China (Apr 2024 --- Current)
Intern at Shanghai Qi Zhi Institute, Shanghai, China (Dec 2022 --- Jan 2023)