publications

2025

  1. In submission to ACL
    RedCoder: Automated Multi-Turn Red Teaming for Code LLMs
    Wenjie Jacky Mo, Qin Liu, Xiaofei Wen, Dongwon Jung, Hadi Askari, Wenxuan Zhou, Zhe Zhao, and Muhao Chen
    In submission to ACL, 2025
  2. NAACL
    Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
    Wenjie Jacky Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Chaowei Xiao, and Muhao Chen
    NAACL 2025
  3. ICLR
    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
    Fei Wang, Xingyu Fu, James Y Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, and Muhao Chen
    ICLR 2025
  4. In submission to ACL
    ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
    Xiaofei Wen, Wenxuan Zhou, Wenjie Jacky Mo, and Muhao Chen
    arXiv preprint arXiv:2502.13458, 2025

2024

  1. AdvML-Frontiers
    Rethinking Backdoor Detection Evaluation for Language Models
    Jun Yan, Wenjie Jacky Mo, Xiang Ren, and Robin Jia
    AdvML-Frontiers 2024
  2. Allerton
    Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
    Qin Liu, Wenjie Jacky Mo, Terry Tong, Jiashu Xu, Fei Wang, Chaowei Xiao, and Muhao Chen
    The 60th Annual Allerton Conference, 2024

2023

  1. EMNLP Findings
    A Causal View of Entity Bias in (Large) Language Models
    Fei Wang, Wenjie Jacky Mo, Yiwei Wang, Wenxuan Zhou, and Muhao Chen
    In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023