publications

2025

  1. NAACL
    Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
    Wenjie Jacky Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Chaowei Xiao, and Muhao Chen
    NAACL 2025
  2. ICLR
    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
    Fei Wang, Xingyu Fu, James Y Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, and Muhao Chen
    ICLR 2025
  3. In submission to ACL
    ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
    Xiaofei Wen, Wenxuan Zhou, Wenjie Jacky Mo, and Muhao Chen
    arXiv preprint arXiv:2502.13458, 2025

2024

  1. AdvML-Frontiers
    Rethinking Backdoor Detection Evaluation for Language Models
    Jun Yan, Wenjie Jacky Mo, Xiang Ren, and Robin Jia
    AdvML-Frontiers 2024
  2. Allerton
    Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
    Qin Liu, Wenjie Jacky Mo, Terry Tong, Jiashu Xu, Fei Wang, Chaowei Xiao, and Muhao Chen
    The 60th Annual Allerton Conference, 2024

2023

  1. EMNLP
    A Causal View of Entity Bias in (Large) Language Models
    Fei Wang, Wenjie Jacky Mo, Yiwei Wang, Wenxuan Zhou, and Muhao Chen
    In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023