Framework Overview
Overview of SafeGuider. In Step I, SafeGuider processes input prompts through a text encoder to obtain [EOS] token embeddings for safety assessment. Prompts with safety scores > 0.5 are considered safe and directly forwarded to image generation, while unsafe ones (safety scores ≤ 0.5) are processed by Step II. In Step II, SAFE beam search with beam width K strategically modifies unsafe prompts to obtain safe yet semantically meaningful embeddings for image generation.
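The two-step flow described above can be sketched in a few lines. This is only an illustrative toy, not the authors' implementation: the 0.5 threshold and the beam width K come from the overview, while `route_prompt`, `beam_search_erase`, `toy_score`, the choice of token deletion as the modification, and the `max_depth` limit are all assumptions made for the example. The real method scores [EOS] embeddings and also aims to keep the result semantically meaningful, which this sketch does not model.

```python
from typing import Callable, List, Sequence, Tuple

SAFE_THRESHOLD = 0.5  # from the overview: scores > 0.5 are treated as safe


def route_prompt(score: float) -> str:
    """Step I routing: safe prompts go to generation, the rest to Step II."""
    return "generate" if score > SAFE_THRESHOLD else "step_ii"


def beam_search_erase(
    tokens: Sequence[str],
    safety_score: Callable[[Sequence[str]], float],
    beam_width: int = 2,   # the "K" in the overview
    max_depth: int = 3,    # hypothetical cap on the number of search rounds
) -> Tuple[List[str], float]:
    """Toy Step II: delete one token per round, keep the K best candidates."""
    beams = [(safety_score(tokens), list(tokens))]
    for _ in range(max_depth):
        best_score, best_tokens = max(beams, key=lambda b: b[0])
        if best_score > SAFE_THRESHOLD:
            return best_tokens, best_score
        candidates = []
        seen = set()
        for _, cand in beams:
            for i in range(len(cand)):
                reduced = cand[:i] + cand[i + 1:]
                key = tuple(reduced)
                # Skip empty prompts and candidates already reached via another beam.
                if reduced and key not in seen:
                    seen.add(key)
                    candidates.append((safety_score(reduced), reduced))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_tokens = max(beams, key=lambda b: b[0])
    return best_tokens, best_score


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for the embedding-level recognition model."""
    flagged = {"blood", "gore"}  # made-up word list, for illustration only
    hits = sum(t in flagged for t in tokens)
    return 1.0 / (1.0 + hits)


prompt = ["a", "knight", "covered", "in", "blood", "and", "gore"]
if route_prompt(toy_score(prompt)) == "step_ii":
    result_tokens, result_score = beam_search_erase(prompt, toy_score)
    print(result_tokens, result_score)
```

In this toy run the prompt starts with a score of 1/3, so it is routed to Step II, and the search stops as soon as one beam scores above the threshold.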
Abstract
Text-to-image models have shown remarkable capabilities in generating high-quality images from natural language descriptions. However, these models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. Despite various defensive strategies, achieving robustness against attacks while maintaining practical utility in real-world applications remains a significant challenge.
To address this issue, we first conduct an empirical study of the text encoder in the Stable Diffusion (SD) model, which is a widely used and representative text-to-image model. Our findings reveal that the [EOS] token acts as a semantic aggregator, exhibiting distinct distributional patterns between benign and adversarial prompts in its embedding space. Building on this insight, we introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality.
SafeGuider combines an embedding-level recognition model with a safety-aware feature erasure beam search algorithm. This integration enables the framework to maintain high-quality image generation for benign prompts while ensuring robust defense against both in-domain and out-of-domain attacks. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
Moreover, instead of refusing to generate or producing black images for unsafe prompts, SafeGuider generates safe and meaningful images, enhancing its practical utility. In addition, SafeGuider is not limited to the SD model and can be effectively applied to other text-to-image models, such as the Flux model, demonstrating its versatility and adaptability across different architectures.
Empirical Study
Identifying the Text Condition Feature Aggregation Token
The [EOS] token serves as a text condition feature aggregator in CLIP's text encoder.
The condition feature aggregation process follows a hierarchical pattern from shallow to deep layers.
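As a minimal sketch of what "taking the [EOS] token embedding" means in practice, the snippet below picks the hidden state at the first end-of-text position of a tokenized prompt. It assumes the CLIP BPE vocabulary, where the start-of-text id is 49406 and the end-of-text id is 49407; the helper name `eos_embedding` and the tiny one-dimensional hidden states are hypothetical and stand in for real encoder outputs.

```python
from typing import List, Sequence

EOS_ID = 49407  # end-of-text id in the CLIP BPE vocabulary (assumed tokenizer)


def eos_embedding(
    token_ids: Sequence[int],
    hidden_states: Sequence[Sequence[float]],
) -> List[float]:
    """Return the hidden state at the first EOS position.

    The first occurrence is used because SD-V1.4's tokenizer also pads with
    the end-of-text id, so later 49407 entries are padding, not the real EOS.
    """
    if len(token_ids) != len(hidden_states):
        raise ValueError("token_ids and hidden_states must have equal length")
    eos_pos = list(token_ids).index(EOS_ID)
    return list(hidden_states[eos_pos])


# Toy example: [BOS, tok, tok, EOS, PAD] with made-up 1-d hidden states.
ids = [49406, 320, 1125, 49407, 49407]
states = [[0.0], [1.0], [2.0], [3.0], [4.0]]
eos_vec = eos_embedding(ids, states)
print(eos_vec)
```

With a real encoder, `hidden_states` would be the per-token outputs of the text encoder for one prompt, and the returned vector is what a safety classifier would consume.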
Analyzing Embedding Representations in [EOS] Aggregation Token
Visualization
Maximum Mean Discrepancy (MMD)
| | Benign | VS Attacks | SJ Attacks |
|---|---|---|---|
| Benign | 0 | 0.496 | 0.993 |
| VS Attacks | 0.496 | 0 | 1.000 |
| SJ Attacks | 0.993 | 1.000 | 0 |
Prompts within the same category exhibit clear clustering patterns in [EOS] token embedding space.
Prompts across different categories demonstrate significant distributional gaps in [EOS] token embedding space.
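For readers unfamiliar with MMD, the following is a minimal sketch of how a discrepancy between two sets of embeddings can be estimated. The page does not state which kernel, bandwidth, or normalization was used, so the Gaussian (RBF) kernel, the `gamma` value, and the biased squared-MMD estimator here are assumptions; the numbers it produces are not directly comparable to the table above.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


# Made-up 2-d "embeddings": two tight clusters that are far apart.
cluster_a = [[0.0, 0.0], [0.1, 0.0]]
cluster_b = [[5.0, 5.0], [5.1, 5.0]]
same = mmd_squared(cluster_a, cluster_a)
different = mmd_squared(cluster_a, cluster_b)
print(same, different)
```

A set compared with itself gives zero, matching the diagonal of the table, while well-separated clusters give a value close to this estimator's maximum of 2.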
Generalization Across Different Text Encoders
To investigate the generality of our findings, we extend our analysis to T2I models with different architectures and text encoders. Beyond the CLIP ViT-L/14 encoder in SD-V1.4, we examine models like SD-V2.1, which uses OpenCLIP ViT-H/14, and Flux.1, which employs both CLIP ViT-L/14 and T5-XXL encoders.
The discovered aggregation token patterns generalize across different text encoders and model architectures.
Visual Comparisons
Comparison of Sexually Explicit Content Mitigation
Comparison of Other Unsafe Content Mitigation
Comparison of Generation Quality on Benign Prompts
Cross-Architecture Performance (SD-V2.1 and Flux.1)
BibTeX
@inproceedings{qi2025safeguider,
title = {SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models},
author = {Peigui Qi and Kunsheng Tang and Wenbo Zhou and Weiming Zhang and
Nenghai Yu and Tianwei Zhang and Qing Guo and Jie Zhang},
booktitle = {Proceedings of the 2025 ACM SIGSAC Conference on Computer and
Communications Security, {CCS} 2025, Taipei, October 13-17, 2025},
publisher = {ACM},
year = {2025},
url = {https://doi.org/10.1145/3719027.3744835},
doi = {10.1145/3719027.3744835}
}