Jiaqi Xue

I am a 2nd-year Ph.D. student in the Department of Computer Science at the University of Central Florida, advised by Prof. Qian Lou. Before that, I obtained my Bachelor's degree from Chongqing University in 2022.

My research interests lie in machine learning security, particularly Trojan attacks and defenses for AI models and AI privacy protection. Reach out to me by email: jiaqi.xue@ucf.edu

News

  • [Jul. 2024] One paper accepted to ECCV 2024.
  • [Jun. 2024] One paper accepted to PACT 2024.
  • [May 2024] I joined Samsung Research America as a research intern.
  • [May 2024] One paper accepted to ACL 2024.
  • [Mar. 2024] One paper accepted to NAACL 2024 (5.3% oral presentation acceptance rate).
  • [Oct. 2023] Received the NeurIPS 2023 Scholar Award.
  • [Sep. 2023] One paper accepted to NeurIPS 2023.
  • [Jan. 2023] I joined UCF as a Ph.D. student.
  • [Jun. 2022] I received my B.S. from the College of Computer Science, Chongqing University. GPA: 3.82/4.0 (top 4%).

Publications

(*: Equal contribution)

SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
Mengxin Zheng*, Jiaqi Xue*, Zihao Wang, Xun Chen, Qian Lou, Lei Jiang, Xiaofeng Wang
ECCV, 2024

SSL-Cleanse detects and mitigates Trojan attacks in self-supervised learning (SSL) encoders without access to any downstream labels. We evaluated SSL-Cleanse on various datasets using 1,200 models, achieving an average detection success rate of 82.2% on ImageNet-100. After mitigation, backdoored encoders reach an average attack success rate of 0.3% without significant accuracy loss.

CR-UTP: Certified Robustness against Universal Text Perturbations
Qian Lou, Xin Liang*, Jiaqi Xue*, Yancheng Zhang, Rui Xie, Mengxin Zheng
ACL Findings, 2024  

CR-UTP addresses the challenge of certifying language model robustness against universal text perturbations (UTPs) and input-specific text perturbations (ISTPs). We introduce a superior prompt search method and a superior prompt ensembling technique to improve certified accuracy against both UTPs and ISTPs.

TrojFSP: Trojan Insertion in Few-shot Prompt Tuning
Mengxin Zheng, Jiaqi Xue, Xun Chen, YanShan Wang, Qian Lou, Lei Jiang
NAACL, 2024   (Oral Presentation)

TrojFSP addresses few-shot backdoor attacks on prompt tuning, in which a limited number of prompt tokens are injected to achieve the backdoor objective while the parameters of the pre-trained language model (PLM) remain frozen.

TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models
Jiaqi Xue, Mengxin Zheng, Ting Hua, Yilin Shen, Yepeng Liu, Ladislau Bölöni, Qian Lou
NeurIPS, 2023

TrojLLM is a framework for exploring the security vulnerabilities of LLMs, which are increasingly deployed in a wide range of applications. TrojLLM automatically generates stealthy, universal triggers that corrupt LLM outputs, using a trigger discovery algorithm that manipulates LLM-based APIs with minimal data.

Teaching Experience

  • [Jan. 2024 - May 2024] CAP6614 - Current Topics In Machine Learning
  • [Sep. 2023 - Dec. 2023] CDA5106 - Advanced Computer Architecture
  • [May 2023 - Aug. 2023] CDA3103 - Computer Logic and Organization

Work Experience

  • [May 2024 - Present] AI Research Intern, Samsung Research America
  • [Mar. 2022 - Jun. 2022] Machine Learning Intern, Kuaishou Y-tech Lab

Academic Service

  • International Joint Conference on Artificial Intelligence (IJCAI)
  • Neural Information Processing Systems (NeurIPS)

Preprints

(*: Equal contribution)

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou
Under Review

This paper introduces BadRAG, a novel framework targeting security vulnerabilities in both the retrieval and generation phases of RAG. Using contrastive optimization, BadRAG generates adversarial passages that are activated only by specific triggers. We also explore leveraging LLM alignment to conduct denial-of-service and sentiment-steering attacks.

TrojFair: Trojan Fairness Attacks
Mengxin Zheng*, Jiaqi Xue*, Yi Sheng, Lei Yang, Qian Lou, Lei Jiang
Under Review

TrojFair crafts a Trojaned model that behaves accurately and fairly on clean inputs. However, when inputs from specific groups contain a trigger, the model behaves discriminatorily, producing results that are both incorrect and unfair. Because the model is fair on clean inputs, TrojFair is a stealthy fairness attack that is resilient to existing model fairness auditing detectors.

Audit and Improve Robustness of Private Neural Networks on Encrypted Data
Jiaqi Xue, Lei Xu, Lin Chen, Weidong Shi, Kaidi Xu, Qian Lou
Under Review

Performing neural network inference on encrypted data without decryption is a popular way to offer privacy-preserving neural networks (PNet) as a service. Compared with regular neural networks deployed for machine learning as a service, PNet requires additional encoding, e.g., quantized-precision numbers and polynomial activations. Encrypted input also introduces new challenges, such as adversarial robustness and security. To the best of our knowledge, we are the first to study questions including (i) whether PNet is more robust against adversarial inputs than regular neural networks, and (ii) how to design a robust PNet given encrypted input without decryption.