References
- Abadi et al., 2016
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. ACM SIGSAC Conference on Computer and Communications Security.
- Abnar & Zuidema, 2020
Abnar, S., & Zuidema, W. (2020). Quantifying attention flow in transformers. Annual Meeting of the Association for Computational Linguistics.
- Achiam et al., 2023
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., … others. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Adi et al., 2018
Adi, Y., Baum, C., Cisse, M., Pinkas, B., & Keshet, J. (2018). Turning your weakness into a strength: watermarking deep neural networks by backdooring. USENIX Security Symposium.
- Alayrac et al., 2019
Alayrac, J.-B., Uesato, J., Huang, P.-S., Fawzi, A., Stanforth, R., & Kohli, P. (2019). Are labels required for improving adversarial robustness? Advances in Neural Information Processing Systems.
- An et al., 2024
An, S., Chou, S.-Y., Zhang, K., Xu, Q., Tao, G., Shen, G., … others. (2024). Elijah: eliminating backdoors injected in diffusion models via distribution shift. AAAI Conference on Artificial Intelligence.
- Anderberg et al., 2024
Anderberg, A., Bailey, J., Campello, R. J., Houle, M. E., Marques, H. O., Radovanović, M., & Zimek, A. (2024). Dimensionality-aware outlier detection: theoretical and experimental analysis. SIAM International Conference on Data Mining.
- Andreina et al., 2021
Andreina, S., Marson, G. A., Möllering, H., & Karame, G. (2021). Baffle: backdoor detection via feedback-based federated learning. International Conference on Distributed Computing Systems.
- Andriushchenko et al., 2020
Andriushchenko, M., Croce, F., Flammarion, N., & Hein, M. (2020). Square attack: a query-efficient black-box adversarial attack via random search. European Conference on Computer Vision (pp. 484–501).
- Athalye et al., 2018a
Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. International Conference on Machine Learning (pp. 274–283).
- Athalye et al., 2018b
Athalye, A., Engstrom, L., Ilyas, A., & Kwok, K. (2018). Synthesizing robust adversarial examples. International Conference on Machine Learning (pp. 284–293).
- Bagdasaryan et al., 2020
Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V. (2020). How to backdoor federated learning. International Conference on Artificial Intelligence and Statistics (pp. 2938–2948).
- Bai et al., 2024
Bai, Y., Pei, G., Gu, J., Yang, Y., & Ma, X. (2024). Special characters attack: toward scalable training data extraction from large language models. arXiv preprint arXiv:2405.05990.
- Bai et al., 2020
Bai, Y., Zeng, Y., Jiang, Y., Xia, S.-T., Ma, X., & Wang, Y. (2020). Improving adversarial robustness via channel-wise activation suppressing. International Conference on Learning Representations.
- Bai et al., 2022a
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … others. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
- Bai et al., 2022b
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., … others. (2022). Constitutional ai: harmlessness from ai feedback. arXiv preprint arXiv:2212.08073.
- Ban & Dong, 2022
Ban, Y., & Dong, Y. (2022). Pre-trained adversarial perturbations. Advances in Neural Information Processing Systems.
- Bansal et al., 2023
Bansal, H., Singhi, N., Yang, Y., Yin, F., Grover, A., & Chang, K.-W. (2023). Cleanclip: mitigating data poisoning attacks in multimodal contrastive learning. IEEE/CVF International Conference on Computer Vision.
- Bao et al., 2023
Bao, F., Nie, S., Xue, K., Li, C., Pu, S., Wang, Y., … Zhu, J. (2023). One transformer fits all distributions in multi-modal diffusion at scale. International Conference on Machine Learning (pp. 1692–1717).
- Barreno et al., 2006
Barreno, M., Nelson, B., Sears, R., Joseph, A. D., & Tygar, J. D. (2006). Can machine learning be secure? ACM Symposium on Information, Computer and Communications Security (pp. 16–25).
- Bendale & Boult, 2016
Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1563–1572).
- Biggio et al., 2013
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., … Roli, F. (2013). Evasion attacks against machine learning at test time. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 387–402).
- Biggio et al., 2012
Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. International Conference on Machine Learning (pp. 1467–1474). Madison, WI, USA: Omnipress.
- Blanchard et al., 2017
Blanchard, P., El Mhamdi, E. M., Guerraoui, R., & Stainer, J. (2017). Machine learning with adversaries: byzantine tolerant gradient descent. Advances in Neural Information Processing Systems.
- Brendel et al., 2018
Brendel, W., Rauber, J., & Bethge, M. (2018). Decision-based adversarial attacks: reliable attacks against black-box machine learning models. International Conference on Learning Representations.
- Brown et al., 2020
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … others. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems.
- Brown et al., 2017
Brown, T. B., Mané, D., Roy, A., Abadi, M., & Gilmer, J. (2017). Adversarial patch.
- Cai et al., 2018
Cai, Q.-Z., Liu, C., & Song, D. (2018). Curriculum adversarial training. International Joint Conference on Artificial Intelligence (pp. 3740–3747).
- Cao et al., 2021a
Cao, X., Jia, J., & Gong, N. Z. (2021). Ipguard: protecting intellectual property of deep neural networks via fingerprinting the classification boundary. ACM Asia Conference on Computer and Communications Security.
- Cao et al., 2021b
Cao, Y., Wang, N., Xiao, C., Yang, D., Fang, J., Yang, R., … Li, B. (2021). Invisible for both camera and lidar: security of multi-sensor fusion based perception in autonomous driving under physical-world attacks. IEEE Symposium on Security and Privacy.
- Carlini et al., 2022
Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., & Tramer, F. (2022). Membership inference attacks from first principles. IEEE Symposium on Security and Privacy.
- Carlini et al., 2023a
Carlini, N., Ippolito, D., Jagielski, M., Lee, K., Tramer, F., & Zhang, C. (2023). Quantifying memorization across neural language models. International Conference on Learning Representations.
- Carlini et al., 2023b
Carlini, N., Nasr, M., Choquette-Choo, C. A., Jagielski, M., Gao, I., Awadalla, A., … others. (2023). Are aligned neural networks adversarially aligned? arXiv preprint arXiv:2306.15447.
- Carlini & Terzis, 2021
Carlini, N., & Terzis, A. (2021). Poisoning and backdooring contrastive learning. arXiv preprint arXiv:2106.09667.
- Carlini et al., 2021
Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., … others. (2021). Extracting training data from large language models. USENIX Security Symposium (pp. 2633–2650).
- Carlini & Wagner, 2017a
Carlini, N., & Wagner, D. (2017). Adversarial examples are not easily detected: bypassing ten detection methods. ACM Workshop on Artificial Intelligence and Security (pp. 3–14).
- Carlini & Wagner, 2017b
Carlini, N., & Wagner, D. (2017). Magnet and "efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478.
- Carlini & Wagner, 2017c
Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy (pp. 39–57).
- Carlini et al., 2023c
Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramer, F., … Wallace, E. (2023). Extracting training data from diffusion models. USENIX Security Symposium.
- Carmon et al., 2019
Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., & Liang, P. S. (2019). Unlabeled data improves adversarial robustness. Advances in Neural Information Processing Systems.
- Chan et al., 2022
Chan, S.-H., Dong, Y., Zhu, J., Zhang, X., & Zhou, J. (2022). BadDet: Backdoor Attacks on Object Detection.
- Chang et al., 2000
Chang, S. G., Yu, B., & Vetterli, M. (2000). Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing, 9(9), 1532–1546.
- Chao et al., 2023
Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking black box large language models in twenty queries. arXiv preprint arXiv:2310.08419.
- Chaudhuri & Monteleoni, 2008
Chaudhuri, K., & Monteleoni, C. (2008). Privacy-preserving logistic regression. Advances in Neural Information Processing Systems.
- Chen et al., 2018
Chen, B., Carvalho, W., Baracaldo, N., Ludwig, H., Edwards, B., Lee, T., … Srivastava, B. (2018). Detecting backdoor attacks on deep neural networks by activation clustering.
- Chen et al., 2019
Chen, H., Fu, C., Zhao, J., & Koushanfar, F. (2019). Deepinspect: a black-box trojan detection and mitigation framework for deep neural networks. International Joint Conference on Artificial Intelligence (pp. 4658–4664).
- Chen et al., 2022a
Chen, J., Wang, J., Peng, T., Sun, Y., Cheng, P., Ji, S., … Song, D. (2022). Copy, right? a testing framework for copyright protection of deep learning models. IEEE Symposium on Security and Privacy.
- Chen et al., 2017a
Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., & Hsieh, C.-J. (2017). Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. ACM Workshop on Artificial Intelligence and Security (pp. 15–26).
- Chen et al., 2022b
Chen, S., Liu, C., Haque, M., Song, Z., & Yang, W. (2022). Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1148–1160).
- Chen et al., 2021
Chen, T., Zhang, Z., Liu, S., Chang, S., & Wang, Z. (2021). Robust overfitting may be mitigated by properly learned smoothening. International Conference on Learning Representations.
- Chen et al., 2020
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. International Conference on Machine Learning.
- Chen et al., 2023
Chen, W., Song, D., & Li, B. (2023). Trojdiff: trojan attacks on diffusion models with diverse targets. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Chen et al., 2017b
Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning.
- Chen et al., 2024
Chen, Y., Ma, X., Zou, D., & Jiang, Y.-G. (2024). Extracting training data from unconditional diffusion models. arXiv preprint arXiv:2406.12752.
- Cheng et al., 2019
Cheng, M., Le, T., Chen, P.-Y., Zhang, H., Yi, J., & Hsieh, C.-J. (2019). Query-efficient hard-label black-box attack: an optimization-based approach. International Conference on Learning Representations.
- Chiang et al., 2023
Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., … others. (2023). Vicuna: an open-source chatbot impressing gpt-4 with 90%* chatgpt quality. See https://vicuna.lmsys.org (accessed 14 April 2023).
- Chou et al., 2023
Chou, S.-Y., Chen, P.-Y., & Ho, T.-Y. (2023). How to backdoor diffusion models? IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Chowdhery et al., 2023
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., … others. (2023). Palm: scaling language modeling with pathways. Journal of Machine Learning Research, 24(240), 1–113.
- Christiano et al., 2017
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems.
- Clevert et al., 2016
Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (elus). International Conference on Learning Representations.
- Cordts et al., 2016
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., … Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Croce & Hein, 2020a
Croce, F., & Hein, M. (2020). Minimally distorted adversarial examples with a fast adaptive boundary attack. International Conference on Machine Learning (pp. 2196–2205).
- Croce & Hein, 2020b
Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning (pp. 2206–2216).
- Crowson, 2022
Crowson, K. (2022). K-Diffusion.
- Dai et al., 2023
Dai, W., Li, J., Li, D., Tiong, A. M. H., Zhao, J., Wang, W., … Hoi, S. (2023). InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.
- DarvishRouhani et al., 2019
Darvish Rouhani, B., Chen, H., & Koushanfar, F. (2019). Deepsigns: an end-to-end watermarking framework for ownership protection of deep neural networks. International Conference on Architectural Support for Programming Languages and Operating Systems.
- Das et al., 2017
Das, N., Shanbhogue, M., Chen, S.-T., Hohman, F., Chen, L., Kounavis, M. E., & Chau, D. H. (2017). Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression.
- Deng et al., 2009
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition.
- Devlin et al., 2018
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Ding et al., 2019
Ding, G. W., Sharma, Y., Lui, K. Y. C., & Huang, R. (2019). Mma training: direct input space margin maximization through adversarial training. International Conference on Learning Representations.
- Doan et al., 2023
Doan, K. D., Lao, Y., Yang, P., & Li, P. (2023). Defending backdoor attacks on vision transformer via patch processing. AAAI Conference on Artificial Intelligence.
- Dong et al., 2020
Dong, Y., Deng, Z., Pang, T., Zhu, J., & Su, H. (2020). Adversarial distributional training for robust deep learning. Advances in Neural Information Processing Systems (pp. 8270–8283).
- Dong et al., 2018
Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., & Li, J. (2018). Boosting adversarial attacks with momentum. IEEE Conference on Computer Vision and Pattern Recognition (pp. 9185–9193).
- Dosovitskiy et al., 2021
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … others. (2021). An image is worth 16x16 words: transformers for image recognition at scale. International Conference on Learning Representations.
- Duan et al., 2023
Duan, J., Kong, F., Wang, S., Shi, X., & Xu, K. (2023). Are diffusion models vulnerable to membership inference attacks? International Conference on Machine Learning.
- Duan et al., 2020
Duan, R., Ma, X., Wang, Y., Bailey, J., Qin, A. K., & Yang, Y. (2020). Adversarial camouflage: hiding physical-world attacks with natural styles. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1000–1008).
- Dwork et al., 2006
Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference.
- Ebrahimi et al., 2018
Ebrahimi, J., Rao, A., Lowd, D., & Dou, D. (2018). Hotflip: white-box adversarial examples for text classification. Annual Meeting of the Association for Computational Linguistics (pp. 31–36).
- Elman, 1990
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
- Eykholt et al., 2018
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., … Song, D. (2018). Robust physical-world attacks on deep learning visual classification. IEEE Conference on Computer Vision and Pattern Recognition.
- Feinman et al., 2017
Feinman, R., Curtin, R. R., Shintre, S., & Gardner, A. B. (2017). Detecting adversarial samples from artifacts.
- Feng et al., 2019
Feng, J., Cai, Q.-Z., & Zhou, Z.-H. (2019). Learning to confuse: generating training time adversarial data with auto-encoder. Advances in Neural Information Processing Systems.
- Fredrikson et al., 2015
Fredrikson, M., Jha, S., & Ristenpart, T. (2015). Model inversion attacks that exploit confidence information and basic countermeasures. ACM SIGSAC Conference on Computer and Communications Security (pp. 1322–1333).
- Frosst et al., 2019
Frosst, N., Papernot, N., & Hinton, G. (2019). Analyzing and improving representations with the soft nearest neighbor loss. International Conference on Machine Learning.
- Fu et al., 2022
Fu, Y., Zhang, S., Wu, S., Wan, C., & Lin, Y. (2022). Patch-fool: are vision transformers always robust against adversarial perturbations? International Conference on Learning Representations.
- Fung et al., 2018
Fung, C., Yoon, C. J., & Beschastnikh, I. (2018). Mitigating sybils in federated learning poisoning.
- Gailly & Adler, 2004
Gailly, J.-l., & Adler, M. (2004). Zlib compression library.
- Gal & Ghahramani, 2016
Gal, Y., & Ghahramani, Z. (2016). A theoretically grounded application of dropout in recurrent neural networks. Advances in Neural Information Processing Systems.
- Gan et al., 2020
Gan, Z., Chen, Y.-C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). Large-scale adversarial training for vision-and-language representation learning. Advances in Neural Information Processing Systems (pp. 6616–6628).
- Geiping et al., 2021
Geiping, J., Fowl, L. H., Huang, W. R., Czaja, W., Taylor, G., Moeller, M., & Goldstein, T. (2021). Witches' brew: industrial scale data poisoning via gradient matching. International Conference on Learning Representations.
- Goldblum et al., 2020
Goldblum, M., Fowl, L., Feizi, S., & Goldstein, T. (2020). Adversarially robust distillation. AAAI Conference on Artificial Intelligence (pp. 3996–4003).
- Gong et al., 2023
Gong, Y., Ran, D., Liu, J., Wang, C., Cong, T., Wang, A., … Wang, X. (2023). Figstep: jailbreaking large vision-language models via typographic visual prompts. arXiv preprint arXiv:2311.05608.
- Gong et al., 2017
Gong, Z., Wang, W., & Ku, W.-S. (2017). Adversarial and clean data are not twins.
- Goodfellow et al., 2015
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. International Conference on Learning Representations.
- Gowal et al., 2021
Gowal, S., Rebuffi, S.-A., Wiles, O., Stimberg, F., Calian, D. A., & Mann, T. A. (2021). Improving robustness using generated data. Advances in Neural Information Processing Systems.
- Goyal et al., 2020
Goyal, S., Choudhury, A. R., Raje, S. M., Chakaravarthy, V. T., Sabharwal, Y., & Verma, A. (2020). Power-bert: accelerating bert inference via progressive word-vector elimination. International Conference on Machine Learning.
- Greshake et al., 2023
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). More than you've asked for: a comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv e-prints, pp. arXiv–2302.
- Gretton et al., 2012
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13(1), 723–773.
- Grosse et al., 2017
Grosse, K., Manoharan, P., Papernot, N., Backes, M., & McDaniel, P. (2017). On the (statistical) detection of adversarial examples.
- Gu et al., 2022
Gu, J., Tresp, V., & Qin, Y. (2022). Are vision transformers robust to patch perturbations? European Conference on Computer Vision.
- Gu et al., 2017
Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain.
- Gu et al., 2023
Gu, X., Du, C., Pang, T., Li, C., Lin, M., & Wang, Y. (2023). On memorization in diffusion models. arXiv preprint arXiv:2310.02664.
- Guan et al., 2022
Guan, Y., Li, Z., Leng, J., Lin, Z., & Guo, M. (2022). Transkimmer: transformer learns to layer-wise skim. Annual Meeting of the Association for Computational Linguistics (pp. 7275–7286).
- Guan et al., 2024
Guan, Z., Hu, M., Li, S., & Vullikanti, A. (2024). Ufid: a unified framework for input-level backdoor detection on diffusion models. arXiv preprint arXiv:2404.01101.
- Guo et al., 2017
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. International Conference on Machine Learning.
- Guo et al., 2023
Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. (2023). From images to textual prompts: zero-shot visual question answering with frozen large language models. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10867–10877).
- Guo et al., 2019
Guo, W., Wang, L., Xing, X., Du, M., & Song, D. (2019). Tabor: A highly accurate approach to inspecting and restoring trojan backdoors in ai systems.
- Gupta & Rahtu, 2019
Gupta, P., & Rahtu, E. (2019). Ciidefence: defeating adversarial attacks by fusing class-specific image inpainting and image denoising. IEEE International Conference on Computer Vision (pp. 6708–6717).
- Hampel, 1974
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383–393.
- Hao et al., 2024
Hao, Y., Yang, W., & Lin, Y. (2024). Exploring backdoor vulnerabilities of chat models. arXiv preprint arXiv:2404.02406.
- He et al., 2022a
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- He et al., 2016
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- He et al., 2022b
He, X., Xu, Q., Lyu, L., Wu, F., & Wang, C. (2022). Protecting intellectual property of language generation apis with lexical watermark. AAAI Conference on Artificial Intelligence.
- He et al., 2019
He, Z., Zhang, T., & Lee, R. B. (2019). Model inversion attacks against collaborative inference. Annual Computer Security Applications Conference.
- Hendrycks & Gimpel, 2016a
Hendrycks, D., & Gimpel, K. (2016). Early methods for detecting adversarial images.
- Hendrycks & Gimpel, 2016b
Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (gelus).
- Hintersdorf et al., 2024
Hintersdorf, D., Struppek, L., Kersting, K., Dziedzic, A., & Boenisch, F. (2024). Finding nemo: localizing neurons responsible for memorization in diffusion models. arXiv preprint arXiv:2406.02366.
- Hinton et al., 2015
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- Ho et al., 2020
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
- Houle, 2017
Houle, M. E. (2017). Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications. International Conference on Similarity Search and Applications.
- Hu et al., 2022
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … Chen, W. (2022). LoRA: low-rank adaptation of large language models. International Conference on Learning Representations.
- Hu et al., 2019
Hu, S., Yu, T., Guo, C., Chao, W.-L., & Weinberger, K. Q. (2019). A new defense against adversarial images: turning a weakness into a strength. Advances in Neural Information Processing Systems.
- Hua et al., 2024
Hua, A., Gu, J., Xue, Z., Carlini, N., Wong, E., & Qin, Y. (2024). Initialization matters for adversarial transfer learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24831–24840).
- Huang et al., 2023a
Huang, B., Wang, Z., Yang, J., Ai, J., Zou, Q., Wang, Q., & Ye, D. (2023). Implicit identity driven deepfake face swapping detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Huang et al., 2023b
Huang, H., Ma, X., Erfani, S., & Bailey, J. (2023). Distilling cognitive backdoor patterns within an image. International Conference on Learning Representations.
- Huang et al., 2020
Huang, W. R., Geiping, J., Fowl, L., Taylor, G., & Goldstein, T. (2020). Metapoison: practical general-purpose clean-label data poisoning. Advances in Neural Information Processing Systems (pp. 12080–12091).
- Ilyas et al., 2018
Ilyas, A., Engstrom, L., Athalye, A., & Lin, J. (2018). Black-box adversarial attacks with limited queries and information. International Conference on Machine Learning (pp. 2137–2146).
- Ishihara, 2023
Ishihara, S. (2023). Training data extraction from pre-trained language models: a survey. arXiv preprint arXiv:2305.16157.
- Izmailov et al., 2018
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. Conference on Uncertainty in Artificial Intelligence.
- Jia et al., 2021
Jia, H., Choquette-Choo, C. A., Chandrasekaran, V., & Papernot, N. (2021). Entangled watermarks as a defense against model extraction. USENIX Security Symposium.
- Jia et al., 2022a
Jia, J., Liu, Y., & Gong, N. Z. (2022). Badencoder: backdoor attacks to pre-trained encoders in self-supervised learning. IEEE Symposium on Security and Privacy.
- Jia et al., 2022b
Jia, M., Tang, L., Chen, B.-C., Cardie, C., Belongie, S., Hariharan, B., & Lim, S.-N. (2022). Visual prompt tuning. European Conference on Computer Vision.
- Jia et al., 2019
Jia, X., Wei, X., Cao, X., & Foroosh, H. (2019). Comdefend: an efficient image compression model to defend adversarial examples. IEEE Conference on Computer Vision and Pattern Recognition (pp. 6084–6092).
- Jiang et al., 2023
Jiang, Y., Chan, C., Chen, M., & Wang, W. (2023). Lion: adversarial distillation of proprietary large language models. Conference on Empirical Methods in Natural Language Processing (pp. 3134–3154).
- Jin et al., 2019
Jin, G., Shen, S., Zhang, D., Dai, F., & Zhang, Y. (2019). Ape-gan: adversarial perturbation elimination with gan. IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3842–3846).
- Kang et al., 2023
Kang, M., Zhu, J.-Y., Zhang, R., Park, J., Shechtman, E., Paris, S., & Park, T. (2023). Scaling up gans for text-to-image synthesis. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Kearns & Li, 1993
Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807–837.
- Kim & Cho, 2021
Kim, G., & Cho, K. (2021). Length-adaptive transformer: train once with length drop, use anytime with search. Joint Conference of Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing.
- Kim et al., 2022
Kim, S., Shen, S., Thorsley, D., Gholami, A., Kwon, W., Hassoun, J., & Keutzer, K. (2022). Learned token pruning for transformers. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 784–794).
- Kirchenbauer et al., 2023
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. International Conference on Machine Learning.
- Koh et al., 2022
Koh, P. W., Steinhardt, J., & Liang, P. (2022). Stronger data poisoning attacks break data sanitization defenses. Machine Learning, 111(1), 1–47.
- Kruger et al., 2004
Kruger, L. E., Wohler, C., Wurz-Wessel, A., & Stein, F. (2004). In-factory calibration of multiocular camera systems. Optical Metrology in Production Engineering.
- Kumar et al., 2020
Kumar, R. S. S., Nyström, M., Lambert, J., Marshall, A., Goertzel, M., Comissoneru, A., … Xia, S. (2020). Adversarial machine learning-industry perspectives. IEEE Security and Privacy Workshops (pp. 69–75).
- Kurakin et al., 2016
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale.
- Kurakin et al., 2018
Kurakin, A., Goodfellow, I. J., & Bengio, S. (2018). Adversarial examples in the physical world. Artificial Intelligence Safety and Security (pp. 99–112). Chapman and Hall/CRC.
- LeMerrer et al., 2020
Le Merrer, E., Perez, P., & Trédan, G. (2020). Adversarial frontier stitching for remote neural network watermarking. Neural Computing and Applications, 32(13), 9233–9244.
- Lee et al., 2022
Lee, K., Ippolito, D., Nystrom, A., Zhang, C., Eck, D., Callison-Burch, C., & Carlini, N. (2022). Deduplicating training data makes language models better. Annual Meeting of the Association for Computational Linguistics.
- Lee et al., 2018
Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems.
- Li et al., 2024a
Li, H., Chen, Y., Zheng, Z., Hu, Q., Chan, C., Liu, H., & Song, Y. (2024). Backdoor removal for generative large language models. arXiv preprint arXiv:2405.07667.
- Li et al., 2023a
Li, J., Li, D., Savarese, S., & Hoi, S. (2023). Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models. International Conference on Machine Learning (pp. 19730–19742).
- Li et al., 2022
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). Blip: bootstrapping language-image pre-training for unified vision-language understanding and generation. International Conference on Machine Learning (pp. 12888–12900).
- Li et al., 2021a
Li, J., Selvaraju, R., Gotmare, A., Joty, S., Xiong, C., & Hoi, S. C. H. (2021). Align before fuse: vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems, 34, 9694–9705.
- Li et al., 2020a
Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., & Guo, B. (2020). Face x-ray for more general face forgery detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Li et al., 2020b
Li, L., Ma, R., Guo, Q., Xue, X., & Qiu, X. (2020). Bert-attack: adversarial attack against bert using bert. Conference on Empirical Methods in Natural Language Processing (pp. 6193–6202).
- Li et al., 2024b
Li, Q., Wang, W., Xu, C., Sun, Z., & Yang, M.-H. (2024). Learning disentangled representation for one-shot progressive face swapping. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Li et al., 2020c
Li, S., Cheng, Y., Wang, W., Liu, Y., & Chen, T. (2020). Learning to detect malicious clients for robust federated learning. arXiv preprint arXiv:2002.00211.
- Li et al., 2024c
Li, W., Chen, P.-Y., Liu, S., & Wang, R. (2024). Psbd: prediction shift uncertainty unlocks backdoor detection. arXiv preprint arXiv:2406.05826.
- Li et al., 2021b
Li, Y., Yang, Z., Wang, Y., & Xu, C. (2021). Neural architecture dilation for adversarial robustness. Advances in Neural Information Processing Systems (pp. 29578–29589).
- Li et al., 2021c
Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., & Ma, X. (2021). Anti-backdoor learning: training clean models on poisoned data. Advances in Neural Information Processing Systems (pp. 14900–14912).
- Li et al., 2021d
Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., & Ma, X. (2021). Anti-backdoor learning: training clean models on poisoned data. Advances in Neural Information Processing Systems.
- Li et al., 2023b
Li, Y., Lyu, X., Ma, X., Koren, N., Lyu, L., Li, B., & Jiang, Y.-G. (2023). Reconstructive neuron pruning for backdoor defense. International Conference on Machine Learning.
- Li et al., 2024d
Li, Y., Ma, X., He, J., Huang, H., & Jiang, Y.-G. (2024). Multi-trigger backdoor attacks: more triggers, more threats. arXiv preprint arXiv:2401.15295.
- Li et al., 2021e
Li, Y., Li, Y., Wu, B., Li, L., He, R., & Lyu, S. (2021). Invisible backdoor attack with sample-specific triggers. IEEE International Conference on Computer Vision (pp. 16463–16472).
- Li et al., 2024e
Li, Z., Wang, C., Ma, P., Liu, C., Wang, S., Wu, D., … Liu, Y. (2024). On extracting specialized code abilities from large language models: a feasibility study. IEEE/ACM International Conference on Software Engineering.
- Liang et al., 2024
Liang, S., Zhu, M., Liu, A., Wu, B., Cao, X., & Chang, E.-C. (2024). Badclip: dual-embedding guided backdoor attack on multimodal contrastive learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Liao et al., 2018
Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., & Zhu, J. (2018). Defense against adversarial attacks using high-level representation guided denoiser. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1778–1787).
- Lin et al., 2014
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … Zitnick, C. L. (2014). Microsoft coco: common objects in context. European Conference on Computer Vision.
- Liu et al., 2023
Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in Neural Information Processing Systems.
- Liu et al., 2024
Liu, H., Reiter, M. K., & Gong, N. Z. (2024). Mudjacking: patching backdoor vulnerabilities in foundation models. arXiv preprint arXiv:2402.14977.
- Liu et al., 2018a
Liu, K., Dolan-Gavitt, B., & Garg, S. (2018). Fine-pruning: defending against backdooring attacks on deep neural networks. International Symposium on Research in Attacks, Intrusions, and Defenses (pp. 273–294).
- Liu et al., 2020
Liu, X., Cheng, H., He, P., Chen, W., Wang, Y., Poon, H., & Gao, J. (2020). Adversarial training for large neural language models. arXiv preprint arXiv:2004.08994.
- Liu et al., 2017
Liu, Y., Chen, X., Liu, C., & Song, D. (2017). Delving into transferable adversarial examples and black-box attacks.
- Liu et al., 2018b
Liu, Y., Ma, S., Aafer, Y., Lee, W.-C., Zhai, J., Wang, W., & Zhang, X. (2018). Trojaning attack on neural networks. Network and Distributed Systems Security Symposium.
- Lorenz et al., 2022
Lorenz, P., Keuper, M., & Keuper, J. (2022). Unfolding local growth rate estimates for (almost) perfect adversarial detection. International Conference on Computer Vision Theory and Applications.
- Lu et al., 2023
Lu, D., Wang, Z., Wang, T., Guan, W., Gao, H., & Zheng, F. (2023). Set-level guidance attack: boosting adversarial transferability of vision-language pre-training models. IEEE/CVF International Conference on Computer Vision (pp. 102–111).
- Lu et al., 2022
Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., … Kalyan, A. (2022). Learn to explain: multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems.
- Lukas et al., 2021
Lukas, N., Zhang, Y., & Kerschbaum, F. (2021). Deep neural network fingerprinting by conferrable adversarial examples.
- Luo et al., 2024
Luo, H., Gu, J., Liu, F., & Torr, P. (2024). An image is worth 1000 lies: adversarial transferability across prompts on vision-language models. arXiv preprint arXiv:2403.09766.
- Lv et al., 2021
Lv, P., Ma, H., Zhou, J., Liang, R., Chen, K., Zhang, S., & Yang, Y. (2021). Dbia: data-free backdoor injection attack against transformer networks. arXiv preprint arXiv:2111.11870.
- Ma et al., 2023
Ma, H., Qiu, H., Gao, Y., Zhang, Z., Abuadbba, A., Xue, M., … Abbott, D. (2023). Quantization backdoors to deep learning commercial frameworks. IEEE Transactions on Dependable and Secure Computing.
- Ma et al., 2024
Ma, J., Cao, A., Xiao, Z., Zhang, J., Ye, C., & Zhao, J. (2024). Jailbreaking prompt attack: a controllable adversarial attack against diffusion models. arXiv preprint arXiv:2404.02928.
- Ma et al., 2018
Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Schoenebeck, G., … Bailey, J. (2018). Characterizing adversarial subspaces using local intrinsic dimensionality. International Conference on Learning Representations.
- Madry et al., 2018
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations.
- Mahalanobis, 1936
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, 2, 49–55.
- Mahendran & Vedaldi, 2015
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. IEEE Conference on Computer Vision and Pattern Recognition.
- Mahendran & Vedaldi, 2016
Mahendran, A., & Vedaldi, A. (2016). Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 120, 233–255.
- Mahloujifar & Mahmoody, 2017
Mahloujifar, S., & Mahmoody, M. (2017). Blockwise p-tampering attacks on cryptographic primitives, extractors, and learners. Theory of Cryptography Conference (pp. 245–279).
- Mahloujifar et al., 2019
Mahloujifar, S., Mahmoody, M., & Mohammed, A. (2019). Universal multi-party poisoning attacks. International Conference on Machine Learning (pp. 4274–4283).
- Mahmood et al., 2021
Mahmood, K., Mahmood, R., & Van Dijk, M. (2021). On the robustness of vision transformers to adversarial examples. IEEE International Conference on Computer Vision.
- Mao et al., 2023
Mao, C., Geng, S., Yang, J., Wang, X., & Vondrick, C. (2023). Understanding zero-shot adversarial robustness for large-scale models. International Conference on Learning Representations.
- Masood et al., 2023
Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza, A., & Malik, H. (2023). Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, 53(4), 3974–4026.
- Mattern et al., 2023
Mattern, J., Mireshghallah, F., Jin, Z., Schoelkopf, B., Sachan, M., & Berg-Kirkpatrick, T. (2023). Membership inference attacks against language models via neighbourhood comparison. Annual Meeting of The Association For Computational Linguistics.
- McMahan et al., 2017
McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence and Statistics.
- Mei & Zhu, 2015
Mei, S., & Zhu, X. (2015). Using machine teaching to identify optimal training-set attacks on machine learners. AAAI Conference on Artificial Intelligence.
- Meng & Chen, 2017
Meng, D., & Chen, H. (2017). Magnet: a two-pronged defense against adversarial examples. ACM SIGSAC Conference on Computer and Communications Security (pp. 135–147).
- Metzen et al., 2017
Metzen, J. H., Genewein, T., Fischer, V., & Bischoff, B. (2017). On detecting adversarial perturbations. International Conference on Learning Representations.
- Micikevicius et al., 2018
Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., … others. (2018). Mixed precision training. International Conference on Learning Representations.
- Miyato et al., 2018
Miyato, T., Maeda, S.-i., Koyama, M., & Ishii, S. (2018). Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1979–1993.
- Mo et al., 2024
Mo, Y., Huang, H., Li, M., Li, A., & Wang, Y. (2024). Terd: a unified framework for safeguarding diffusion models against backdoors. International Conference on Machine Learning.
- Moosavi-Dezfooli et al., 2016
Moosavi-Dezfooli, S.-M., Fawzi, A., & Frossard, P. (2016). Deepfool: a simple and accurate method to fool deep neural networks. IEEE Conference on Computer Vision and Pattern Recognition (pp. 2574–2582).
- Mordvintsev et al., 2015
Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: going deeper into neural networks.
- Munoz-Gonzalez et al., 2019
Muñoz-González, L., Pfitzner, B., Russo, M., Carnerero-Cano, J., & Lupu, E. C. (2019). Poisoning attacks with generative adversarial nets.
- Nair & Hinton, 2010
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. International Conference on Machine Learning.
- Naseer et al., 2021
Naseer, M., Ranasinghe, K., Khan, S., Khan, F. S., & Porikli, F. (2021). On improving adversarial transferability of vision transformers. arXiv preprint arXiv:2106.04169.
- Naseh et al., 2023
Naseh, A., Roh, J., & Houmansadr, A. (2023). Memory triggers: unveiling memorization in text-to-image generative models through word-level duplication. arXiv preprint arXiv:2312.03692.
- Nasr et al., 2023
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., … Lee, K. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035.
- Nelson et al., 2008
Nelson, B., Barreno, M., Chi, F. J., Joseph, A. D., Rubinstein, B. I., Saini, U., … Xia, K. (2008). Exploiting machine learning to subvert your spam filter. LEET, 8(1), 9.
- Nguyen et al., 2017
Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., & Yosinski, J. (2017). Plug & play generative networks: conditional iterative generation of images in latent space. IEEE Conference on Computer Vision and Pattern Recognition.
- Nguyen et al., 2016
Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., & Clune, J. (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing systems, 29.
- Nguyen & Tran, 2020
Nguyen, T. A., & Tran, A. (2020). Input-aware dynamic backdoor attack. Advances in Neural Information Processing Systems (pp. 3454–3464).
- Nie et al., 2022
Nie, W., Guo, B., Huang, Y., Xiao, C., Vahdat, A., & Anandkumar, A. (2022). Diffusion models for adversarial purification. International Conference on Machine Learning (pp. 16805–16827).
- Nirkin et al., 2019
Nirkin, Y., Keller, Y., & Hassner, T. (2019). FSGAN: subject agnostic face swapping and reenactment. IEEE International Conference on Computer Vision.
- Noever & Noever, 2021
Noever, D. A., & Noever, S. E. M. (2021). Reading isn't believing: adversarial attacks on multi-modal neurons. arXiv preprint arXiv:2103.10480.
- Oh et al., 2019
Oh, S. J., Schiele, B., & Fritz, M. (2019). Towards reverse-engineering black-box neural networks. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (pp. 121–144). Springer.
- Ooms, 2024
Ooms, J. (2024). cld3: Google's Compact Language Detector 3. R package version 1.6.0. URL: https://docs.ropensci.org/cld3/, https://github.com/ropensci/cld3, https://ropensci.r-universe.dev/cld3
- Oord et al., 2018
Oord, A. v. d., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding.
- OpenAI, 2024
OpenAI (2024). ChatGPT. Accessed: 2024-07-23.
- Paperno et al., 2016
Paperno, D., Kruszewski, G., Lazaridou, A., Pham, Q. N., Bernardi, R., Pezzelle, S., … Fernández, R. (2016). The LAMBADA dataset: Word prediction requiring a broad discourse context.
- Papernot et al., 2017
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. ACM on Asia Conference on Computer and Communications Security (pp. 506–519).
- Papernot et al., 2016
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016). The limitations of deep learning in adversarial settings. IEEE European Symposium on Security and Privacy (pp. 372–387).
- Papineni et al., 2002
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. Annual Meeting of the Association for Computational Linguistics.
- Peters et al., 2018
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Pinto et al., 2024
Pinto, F., Rauschmayr, N., Tramèr, F., Torr, P., & Tombari, F. (2024). Extracting training data from document-based vqa models. arXiv preprint arXiv:2407.08707.
- Prakash et al., 2018
Prakash, A., Moran, N., Garber, S., DiLillo, A., & Storer, J. (2018). Deflecting adversarial attacks with pixel deflection. IEEE Conference on Computer Vision and Pattern Recognition (pp. 8571–8580).
- Pruthi et al., 2019
Pruthi, D., Dhingra, B., & Lipton, Z. C. (2019). Combating adversarial misspellings with robust word recognition. Annual Meeting of the Association for Computational Linguistics (pp. 5582–5591).
- Qi et al., 2023
Qi, X., Huang, K., Panda, A., Wang, M., & Mittal, P. (2023). Visual adversarial examples jailbreak large language models. arXiv preprint arXiv:2306.13213.
- Qian et al., 2020
Qian, Y., Yin, G., Sheng, L., Chen, Z., & Shao, J. (2020). Thinking in frequency: face forgery detection by mining frequency-aware clues. European Conference on Computer Vision.
- Qin et al., 2019
Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., Fawzi, A., … Kohli, P. (2019). Adversarial robustness through local linearization. Advances in Neural Information Processing Systems.
- Radford et al., 2021
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … others. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning.
- Radford et al., 2018
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., & others. (2018). Improving language understanding by generative pre-training.
- Rafailov et al., 2024
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2024). Direct preference optimization: your language model is secretly a reward model. Advances in Neural Information Processing Systems.
- Ramachandran et al., 2017
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions.
- Ramesh et al., 2022
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125.
- Rebuffi et al., 2021a
Rebuffi, S.-A., Gowal, S., Calian, D. A., Stimberg, F., Wiles, O., & Mann, T. (2021). Fixing data augmentation to improve adversarial robustness.
- Rebuffi et al., 2021b
Rebuffi, S.-A., Gowal, S., Calian, D. A., Stimberg, F., Wiles, O., & Mann, T. A. (2021). Data augmentation can improve robustness. Advances in Neural Information Processing Systems.
- Rice et al., 2020
Rice, L., Wong, E., & Kolter, Z. (2020). Overfitting in adversarially robust deep learning. International Conference on Machine Learning.
- Robey et al., 2023
Robey, A., Wong, E., Hassani, H., & Pappas, G. J. (2023). Smoothllm: defending large language models against jailbreaking attacks. arXiv preprint arXiv:2310.03684.
- Rombach et al., 2022
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Ronneberger et al., 2015
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer Assisted Intervention (pp. 234–241).
- Roth et al., 2019
Roth, K., Kilcher, Y., & Hofmann, T. (2019). The odds are odd: a statistical test for detecting adversarial examples. International Conference on Machine Learning (pp. 5498–5507).
- Saha et al., 2020
Saha, A., Subramanya, A., & Pirsiavash, H. (2020). Hidden trigger backdoor attacks. AAAI Conference on Artificial Intelligence.
- Sakaguchi et al., 2017
Sakaguchi, K., Duh, K., Post, M., & Van Durme, B. (2017). Robsut wrod reocginiton via semi-character recurrent neural network. AAAI Conference on Artificial Intelligence.
- Samangouei et al., 2018
Samangouei, P., Kabkab, M., & Chellappa, R. (2018). Defense-gan: protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations.
- Schlarmann et al., 2024
Schlarmann, C., Singh, N. D., Croce, F., & Hein, M. (2024). Robust clip: unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models. International Conference on Machine Learning.
- Schubert et al., 2014
Schubert, E., Zimek, A., & Kriegel, H.-P. (2014). Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Mining and Knowledge Discovery, 28, 190–237.
- Schuhmann et al., 2022
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C. W., Wightman, R., Cherti, M., … Jitsev, J. (2022). LAION-5b: an open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems.
- Schulman et al., 2017
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Selvaraju et al., 2017
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: visual explanations from deep networks via gradient-based localization. IEEE International Conference on Computer Vision.
- Sennrich et al., 2016
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural machine translation of rare words with subword units. Annual Meeting of the Association for Computational Linguistics.
- Sha et al., 2023
Sha, Z., He, X., Yu, N., Backes, M., & Zhang, Y. (2023). Can't steal? cont-steal! contrastive stealing attacks against image encoders. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Shafahi et al., 2018
Shafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., & Goldstein, T. (2018). Poison frogs! targeted clean-label poisoning attacks on neural networks. Advances in Neural Information Processing Systems.
- Shafahi et al., 2019
Shafahi, A., Najibi, M., Ghiasi, M. A., Xu, Z., Dickerson, J., Studer, C., … Goldstein, T. (2019). Adversarial training for free! Advances in Neural Information Processing Systems.
- Shao et al., 2022
Shao, R., Shi, Z., Yi, J., Chen, P.-Y., & Hsieh, C.-J. (2022). On the adversarial robustness of vision transformers. Transactions on Machine Learning Research.
- Sharif et al., 2016
Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016). Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. ACM SIGSAC Conference on Computer and Communications Security.
- Sharma et al., 2018
Sharma, P., Ding, N., Goodman, S., & Soricut, R. (2018). Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning. Annual Meeting of the Association for Computational Linguistics.
- Shayegani et al., 2023
Shayegani, E., Dong, Y., & Abu-Ghazaleh, N. (2023). Plug and pray: exploiting off-the-shelf components of multi-modal models. arXiv preprint arXiv:2307.14539.
- Shen et al., 2016
Shen, S., Tople, S., & Saxena, P. (2016). Auror: defending against poisoning attacks in collaborative deep learning systems. Conference on Computer Security Applications.
- Shen & Sanghavi, 2019
Shen, Y., & Sanghavi, S. (2019). Learning with bad training data via iterative trimmed loss minimization. International Conference on Machine Learning (pp. 5739–5748).
- Shi et al., 2022
Shi, Y., Han, Y., Tan, Y.-a., & Kuang, X. (2022). Decision-based black-box attack against vision transformers via patch-wise adversarial removal. Advances in Neural Information Processing Systems.
- Shin et al., 2020
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: eliciting knowledge from language models with automatically generated prompts. Conference on Empirical Methods in Natural Language Processing.
- Shokri et al., 2017
Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. IEEE Symposium on Security and Privacy.
- Smith & Topin, 2019
Smith, L. N., & Topin, N. (2019). Super-convergence: very fast training of residual networks using large learning rates. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications (pp. 369–386).
- Smith, 2007
Smith, R. (2007). An overview of the tesseract ocr engine. International Conference on Document Analysis and Recognition.
- Somepalli et al., 2022
Somepalli, G., Singla, V., Goldblum, M., Geiping, J., & Goldstein, T. (2022). Diffusion art or digital forgery? Investigating data replication in diffusion models. arXiv preprint arXiv:2212.03860.
- Somepalli et al., 2023
Somepalli, G., Singla, V., Goldblum, M., Geiping, J., & Goldstein, T. (2023). Understanding data replication in diffusion models. International Conference on Machine Learning Workshop.
- Song et al., 2020
Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
- Song et al., 2013
Song, S., Chaudhuri, K., & Sarwate, A. D. (2013). Stochastic gradient descent with differentially private updates. IEEE Global Conference on Signal and Information Processing.
- Sorokin & Forsyth, 2008
Sorokin, A., & Forsyth, D. (2008). Utility data annotation with amazon mechanical turk. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Srivastava et al., 2014
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
- Subramanya et al., 2024
Subramanya, A., Koohpayegani, S. A., Saha, A., Tejankar, A., & Pirsiavash, H. (2024). A closer look at robustness of vision transformers to backdoor attacks. IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 3874–3883).
- Subramanya et al., 2022
Subramanya, A., Saha, A., Koohpayegani, S. A., Tejankar, A., & Pirsiavash, H. (2022). Backdoor attacks on vision transformers. arXiv preprint arXiv:2206.08477.
- Sun et al., 2023
Sun, X., Li, X., Meng, Y., Ao, X., Lyu, L., Li, J., & Zhang, T. (2023). Defending against backdoor attacks in natural language generation. AAAI Conference on Artificial Intelligence.
- Sun et al., 2019
Sun, Z., Kairouz, P., Suresh, A. T., & McMahan, H. B. (2019). Can you really backdoor federated learning?
- Sur et al., 2023
Sur, I., Sikka, K., Walmer, M., Koneripalli, K., Roy, A., Lin, X., … Jha, S. (2023). Tijo: trigger inversion with joint optimization for defending multimodal backdoored models. IEEE/CVF International Conference on Computer Vision.
- Szegedy et al., 2014
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. International Conference on Learning Representations.
- Szyller et al., 2021
Szyller, S., Atli, B. G., Marchal, S., & Asokan, N. (2021). Dawn: dynamic adversarial watermarking of neural networks. ACM International Conference on Multimedia.
- Tan & Le, 2019
Tan, M., & Le, Q. (2019). Efficientnet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (pp. 6105–6114).
- Tang et al., 2020
Tang, R., Du, M., Liu, N., Yang, F., & Hu, X. (2020). An embarrassingly simple approach for trojan attack in deep neural networks. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 218–228).
- Taori et al., 2023
Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., … Hashimoto, T. B. (2023). Alpaca: a strong, replicable instruction-following model. Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html.
- Tejankar et al., 2023
Tejankar, A., Sanjabi, M., Wang, Q., Wang, S., Firooz, H., Pirsiavash, H., & Tan, L. (2023). Defending against patch-based backdoor attacks on self-supervised learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Thies et al., 2016
Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2face: real-time face capture and reenactment of rgb videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Tian et al., 2018
Tian, S., Yang, G., & Cai, Y. (2018). Detecting adversarial examples through image transformation. AAAI Conference on Artificial Intelligence.
- Touvron et al., 2023
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., … others. (2023). Llama: open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Tramer et al., 2020
Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems (pp. 1633–1645).
- Tramer et al., 2018
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2018). Ensemble adversarial training: attacks and defenses. International Conference on Learning Representations.
- Tramer et al., 2016
Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., & Ristenpart, T. (2016). Stealing machine learning models via prediction APIs. USENIX Security Symposium (pp. 601–618).
- Tran et al., 2018
Tran, B., Li, J., & Madry, A. (2018). Spectral signatures in backdoor attacks. Advances in Neural Information Processing Systems.
- Tu et al., 2019
Tu, C.-C., Ting, P., Chen, P.-Y., Liu, S., Zhang, H., Yi, J., … Cheng, S.-M. (2019). Autozoom: autoencoder-based zeroth order optimization method for attacking black-box neural networks. AAAI Conference on Artificial Intelligence (pp. 742–749).
- Turner et al., 2018
Turner, A., Tsipras, D., & Madry, A. (2018). Clean-label backdoor attacks.
- Uchida et al., 2017
Uchida, Y., Nagai, Y., Sakazawa, S., & Satoh, S. (2017). Embedding watermarks into deep neural networks. ACM on International Conference on Multimedia Retrieval.
- Vaswani et al., 2017
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
- Wang & Gong, 2018
Wang, B., & Gong, N. Z. (2018). Stealing hyperparameters in machine learning. IEEE Symposium on Security and Privacy (pp. 36–52).
- Wang et al., 2019a
Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., & Zhao, B. Y. (2019). Neural cleanse: identifying and mitigating backdoor attacks in neural networks. IEEE Symposium on Security and Privacy (pp. 707–723).
- Wang et al., 2017
Wang, D., Ye, M., & Xu, J. (2017). Differentially private empirical risk minimization revisited: faster and more general. Advances in Neural Information Processing Systems.
- Wang et al., 2020a
Wang, H., Sreenivasan, K., Rajput, S., Vishwakarma, H., Agarwal, S., Sohn, J.-y., … Papailiopoulos, D. (2020). Attack of the tails: yes, you really can backdoor federated learning. Advances in Neural Information Processing Systems (pp. 16070–16084).
- Wang et al., 2024a
Wang, R., Ma, X., Zhou, H., Ji, C., Ye, G., & Jiang, Y.-G. (2024). White-box multimodal jailbreaks against large vision-language models. arXiv preprint arXiv:2405.17894.
- Wang et al., 2022
Wang, S., Nepal, S., Abuadbba, A., Rudolph, C., & Grobler, M. (2022). Adversarial detection by latent style transformations. IEEE Transactions on Information Forensics and Security, 17, 1099–1114.
- Wang et al., 2020b
Wang, S., Nepal, S., Rudolph, C., Grobler, M., Chen, S., & Chen, T. (2020). Backdoor attacks against transfer learning with pre-trained deep learning models. IEEE Transactions on Services Computing.
- Wang et al., 2023a
Wang, X., Ji, Z., Ma, P., Li, Z., & Wang, S. (2023). Instructta: instruction-tuned targeted attack for large vision-language models. arXiv preprint arXiv:2312.01886.
- Wang et al., 2019b
Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., & Gu, Q. (2019). On the convergence and robustness of adversarial training. International Conference on Machine Learning (pp. 6586–6595).
- Wang et al., 2019c
Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., & Gu, Q. (2019). Improving adversarial robustness requires revisiting misclassified examples. International Conference on Learning Representations.
- Wang et al., 2023b
Wang, Z., Pang, T., Du, C., Lin, M., Liu, W., & Yan, S. (2023). Better diffusion models further improve adversarial training. International Conference on Machine Learning.
- Wang et al., 2024b
Wang, Z., Li, X., Zhu, H., & Xie, C. (2024). Revisiting adversarial training at scale. arXiv preprint arXiv:2401.04727.
- Wang et al., 2004
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
- Webster, 2023
Webster, R. (2023). A reproducible extraction of training images from diffusion models. arXiv preprint arXiv:2305.08694.
- Wei et al., 2021
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., … Le, Q. V. (2021). Finetuned language models are zero-shot learners. International Conference on Machine Learning.
- Wei et al., 2022a
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … others. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
- Wei & Zou, 2019
Wei, J., & Zou, K. (2019). Eda: easy data augmentation techniques for boosting performance on text classification tasks. Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing.
- Wei et al., 2022b
Wei, Z., Chen, J., Goldblum, M., Wu, Z., Goldstein, T., & Jiang, Y.-G. (2022). Towards transferable adversarial attacks on vision transformers. AAAI Conference on Artificial Intelligence (pp. 2668–2676).
- Wen et al., 2024
Wen, Y., Liu, Y., Chen, C., & Lyu, L. (2024). Detecting, explaining, and mitigating memorization in diffusion models. International Conference on Learning Representations.
- Williams & Peng, 1990
Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2, 490–501.
- Williams & Zipser, 2013
Williams, R. J., & Zipser, D. (2013). Gradient-based learning algorithms for recurrent networks and their computational complexity. Backpropagation (pp. 433–486). Psychology Press.
- Wong et al., 2020
Wong, E., Rice, L., & Kolter, J. Z. (2020). Fast is better than free: revisiting adversarial training. International Conference on Learning Representations.
- Wu et al., 2023a
Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., & Duan, N. (2023). Visual chatgpt: talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671.
- Wu & Wang, 2021
Wu, D., & Wang, Y. (2021). Adversarial neuron pruning purifies backdoored deep models. Advances in Neural Information Processing Systems (pp. 16913–16925).
- Wu et al., 2020a
Wu, D., Wang, Y., Xia, S.-T., Bailey, J., & Ma, X. (2020). Skip connections matter: on the transferability of adversarial examples generated with resnets. International Conference on Learning Representations.
- Wu et al., 2020b
Wu, D., Xia, S.-T., & Wang, Y. (2020). Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems (pp. 2958–2969).
- Wu et al., 2023b
Wu, S., Ma, C., Wei, K., Xu, X., Ding, M., Qian, Y., & Xiang, T. (2023). Refine, discriminate and align: stealing encoders via sample-wise prototypes and multi-relational extraction. arXiv preprint arXiv:2312.00855.
- Xi et al., 2024
Xi, Z., Du, T., Li, C., Pang, R., Ji, S., Chen, J., … Wang, T. (2024). Defending pre-trained language models as few-shot learners against backdoor attacks. Advances in Neural Information Processing Systems.
- Xiang et al., 2024
Xiang, Z., Jiang, F., Xiong, Z., Ramasubramanian, B., Poovendran, R., & Li, B. (2024). Badchain: backdoor chain-of-thought prompting for large language models. arXiv preprint arXiv:2401.12242.
- Xiao et al., 2018
Xiao, C., Li, B., Zhu, J. Y., He, W., Liu, M., & Song, D. (2018). Generating adversarial examples with adversarial networks. International Joint Conference on Artificial Intelligence (pp. 3905–3911).
- Xie et al., 2019a
Xie, C., Huang, K., Chen, P.-Y., & Li, B. (2019). Dba: distributed backdoor attacks against federated learning. International Conference on Learning Representations.
- Xie et al., 2020
Xie, C., Tan, M., Gong, B., Yuille, A., & Le, Q. V. (2020). Smooth adversarial training.
- Xie et al., 2018
Xie, C., Wang, J., Zhang, Z., Ren, Z., & Yuille, A. (2018). Mitigating adversarial effects through randomization. International Conference on Learning Representations.
- Xie et al., 2019b
Xie, C., Wu, Y., Maaten, L. v. d., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 501–509).
- Xie et al., 2019c
Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., & Yuille, A. L. (2019). Improving transferability of adversarial examples with input diversity. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2730–2739).
- Xu et al., 2020
Xu, K., Zhang, G., Liu, S., Fan, Q., Sun, M., Chen, H., … Lin, X. (2020). Adversarial t-shirt! evading person detectors in a physical world. European Conference on Computer Vision (pp. 665–681).
- Xu et al., 2018
Xu, W., Evans, D., & Qi, Y. (2018). Feature squeezing: detecting adversarial examples in deep neural networks. Network and Distributed System Security Symposium.
- Xu et al., 2023
Xu, X., Zhang, J., & Kankanhalli, M. (2023). Autolora: a parameter-free automated robust fine-tuning framework. arXiv preprint arXiv:2310.01818.
- Yan et al., 2024
Yan, J., Yadav, V., Li, S., Chen, L., Tang, Z., Wang, H., … Jin, H. (2024). Backdooring instruction-tuned large language models with virtual prompt injection. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Yang et al., 2017
Yang, C., Wu, Q., Li, H., & Chen, Y. (2017). Generative poisoning attack method against neural networks.
- Yang et al., 2020
Yang, H., Zhang, J., Dong, H., Inkawhich, N., Gardner, A., Touchet, A., … Li, H. (2020). Dverge: diversifying vulnerabilities for enhanced robust generation of ensembles. Advances in Neural Information Processing Systems (pp. 5505–5515).
- Yang et al., 2019a
Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology, 10, 1–19.
- Yang et al., 2023a
Yang, W., Gao, J., & Mirzasoleiman, B. (2023). Better safe than sorry: pre-training clip against targeted data poisoning and backdoor attacks. arXiv preprint arXiv:2310.05862.
- Yang et al., 2023b
Yang, W., Gao, J., & Mirzasoleiman, B. (2023). Robust contrastive language-image pretraining against data poisoning and backdoor attacks. Advances in Neural Information Processing Systems.
- Yang et al., 2023c
Yang, Y., Gao, R., Wang, X., Xu, N., & Xu, Q. (2023). Mma-diffusion: multimodal attack on diffusion models. arXiv preprint arXiv:2311.17516.
- Yang et al., 2022
Yang, Y., Liu, T. Y., & Mirzasoleiman, B. (2022). Not all poisons are created equal: robust training against data poisoning. International Conference on Machine Learning (pp. 25154–25165).
- Yang et al., 2019b
Yang, Z., Chang, E.-C., & Liang, Z. (2019). Adversarial neural network inversion via auxiliary knowledge alignment. arXiv preprint arXiv:1902.08552.
- Yang et al., 2023d
Yang, Z., He, X., Li, Z., Backes, M., Humbert, M., Berrang, P., & Zhang, Y. (2023). Data poisoning attacks against multimodal encoders. International Conference on Machine Learning.
- Yao et al., 2019
Yao, Y., Li, H., Zheng, H., & Zhao, B. Y. (2019). Latent backdoor attacks on deep neural networks. ACM SIGSAC Conference on Computer and Communications Security (pp. 2041–2055).
- Ye et al., 2021
Ye, D., Lin, Y., Huang, Y., & Sun, M. (2021). Tr-bert: dynamic token reduction for accelerating bert inference. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Yeom et al., 2018
Yeom, S., Giacomelli, I., Fredrikson, M., & Jha, S. (2018). Privacy risk in machine learning: analyzing the connection to overfitting. IEEE Computer Security Foundations Symposium.
- Yin et al., 2018
Yin, D., Chen, Y., Kannan, R., & Bartlett, P. (2018). Byzantine-robust distributed learning: towards optimal statistical rates. International Conference on Machine Learning.
- Yin et al., 2020
Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., … Kautz, J. (2020). Dreaming to distill: data-free knowledge transfer via deepinversion. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Yu et al., 2018
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., & Sang, N. (2018). Bisenet: bilateral segmentation network for real-time semantic segmentation. European Conference on Computer Vision.
- Yu et al., 2020
Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., & Finn, C. (2020). Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems.
- Yuan et al., 2023
Yuan, Z., Zhou, P., Zou, K., & Cheng, Y. (2023). You are catching my attention: are vision transformers bad learners under backdoor attacks? IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24605–24615).
- Zhai et al., 2023
Zhai, S., Dong, Y., Shen, Q., Pu, S., Fang, Y., & Su, H. (2023). Text-to-image diffusion models can be easily backdoored through multimodal data poisoning. ACM International Conference on Multimedia.
- Zhang et al., 2019a
Zhang, D., Zhang, T., Lu, Y., Zhu, Z., & Dong, B. (2019). You only propagate once: accelerating adversarial training via maximal principle. Advances in Neural Information Processing Systems.
- Zhang et al., 2019b
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019). Theoretically principled trade-off between robustness and accuracy. International Conference on Machine Learning (pp. 7472–7482).
- Zhang et al., 2024a
Zhang, J., Wang, Z., Wang, R., Ma, X., & Jiang, Y.-G. (2024). Enja: ensemble jailbreak on large language models. arXiv preprint arXiv:2408.03603.
- Zhang et al., 2018
Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., & Molloy, I. (2018). Protecting intellectual property of deep neural networks with watermarking. ACM Asia Conference on Computer and Communications Security.
- Zhang et al., 2024b
Zhang, J., Ma, X., Wang, X., Qiu, L., Wang, J., Jiang, Y.-G., & Sang, J. (2024). Adversarial prompt tuning for vision-language models. European Conference on Computer Vision.
- Zhang et al., 2022a
Zhang, J., Yi, Q., & Sang, J. (2022). Towards adversarial attack on vision-language pre-training models. ACM International Conference on Multimedia (pp. 5005–5013).
- Zhang et al., 2017
Zhang, J., Zheng, K., Mou, W., & Wang, L. (2017). Efficient private ERM for smooth objectives.
- Zhang et al., 2020a
Zhang, J., Chen, D., Liao, J., Fang, H., Zhang, W., Zhou, W., … Yu, N. (2020). Model watermarking for image processing networks. AAAI Conference on Artificial Intelligence.
- Zhang et al., 2021
Zhang, J., Chen, D., Liao, J., Zhang, W., Feng, H., Hua, G., & Yu, N. (2021). Deep model intellectual property protection via deep watermarking. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Zhang et al., 2020b
Zhang, J., Xu, X., Han, B., Niu, G., Cui, L., Sugiyama, M., & Kankanhalli, M. (2020). Attacks which do not kill training make adversarial learning stronger. International Conference on Machine Learning (pp. 11278–11287).
- Zhang et al., 2020c
Zhang, J., Zhu, J., Niu, G., Han, B., Sugiyama, M., & Kankanhalli, M. (2020). Geometry-aware instance-reweighted adversarial training. International Conference on Learning Representations.
- Zhang et al., 2024c
Zhang, J., Liu, H., Jia, J., & Gong, N. Z. (2024). Data poisoning based backdoor attacks to contrastive learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Zhang et al., 2024d
Zhang, M., Yu, N., Wen, R., Backes, M., & Zhang, Y. (2024). Generated distributions are all you need for membership inference attacks against generative models. IEEE/CVF Winter Conference on Applications of Computer Vision.
- Zhang et al., 2022b
Zhang, R., Zhang, W., Fang, R., Gao, P., Li, K., Dai, J., … Li, H. (2022). Tip-adapter: training-free adaption of clip for few-shot classification. European Conference on Computer Vision.
- Zhang et al., 2023
Zhang, S., Zhang, M., Pan, X., & Yang, M. (2023). No-skim: towards efficiency robustness evaluation on skimming-based language models. arXiv preprint arXiv:2312.09494.
- Zhang et al., 2020d
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). Bertscore: evaluating text generation with bert. International Conference on Learning Representations.
- Zhao et al., 2021
Zhao, H., Wei, T., Zhou, W., Zhang, W., Chen, D., & Yu, N. (2021). Multi-attentional deepfake detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Zhao et al., 2024
Zhao, Y., Pang, T., Du, C., Yang, X., Li, C., Cheung, N.-M. M., & Lin, M. (2024). On evaluating adversarial robustness of large vision-language models. Advances in Neural Information Processing Systems.
- Zheng et al., 2023
Zheng, M., Lou, Q., & Jiang, L. (2023). Trojvit: trojan insertion in vision transformers. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4025–4034).
- Zhou et al., 2024a
Zhou, A., Li, B., & Wang, H. (2024). Robust prompt optimization for defending language models against jailbreaking attacks. arXiv preprint arXiv:2401.17263.
- Zhou et al., 2024b
Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., … others. (2024). Lima: less is more for alignment. Advances in Neural Information Processing Systems.
- Zhou et al., 2023a
Zhou, Z., Hu, S., Li, M., Zhang, H., Zhang, Y., & Jin, H. (2023). Advclip: downstream-agnostic adversarial examples in multimodal contrastive learning. ACM International Conference on Multimedia.
- Zhou et al., 2023b
Zhou, Z., Hu, S., Zhao, R., Wang, Q., Zhang, L. Y., Hou, J., & Jin, H. (2023). Downstream-agnostic adversarial examples. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4345–4355).
- Zhu et al., 2020
Zhu, C., Cheng, Y., Gan, Z., Sun, S., Goldstein, T., & Liu, J. (2020). Freelb: enhanced adversarial training for natural language understanding. International Conference on Learning Representations.
- Zhu et al., 2019
Zhu, C., Huang, W. R., Li, H., Taylor, G., Studer, C., & Goldstein, T. (2019). Transferable clean-label poisoning attacks on deep neural nets. International Conference on Machine Learning (pp. 7614–7623).
- Zhu et al., 2023
Zhu, D., Chen, J., Shen, X., Li, X., & Elhoseiny, M. (2023). Minigpt-4: enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592.
- Zhu et al., 2021
Zhu, J., Yao, J., Han, B., Zhang, J., Liu, T., Niu, G., … Yang, H. (2021). Reliable adversarial distillation with unreliable teachers. International Conference on Learning Representations.
- Zhu et al., 2024
Zhu, L., Ning, R., Li, J., Xin, C., & Wu, H. (2024). Seer: backdoor detection for vision-language models through searching target text and image trigger jointly. AAAI Conference on Artificial Intelligence.
- Zhuang et al., 2023
Zhuang, H., Zhang, Y., & Liu, S. (2023). A pilot study of query-free adversarial attack against stable diffusion. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2384–2391).
- Zi et al., 2021
Zi, B., Zhao, S., Ma, X., & Jiang, Y.-G. (2021). Revisiting adversarial robustness distillation: robust soft labels make student better. IEEE/CVF International Conference on Computer Vision.
- Zou et al., 2023
Zou, A., Wang, Z., Kolter, J. Z., & Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.
- Zhang et al., 2023
Zhang, Q., Gui, T., & Huang, X. (2023). Introduction to natural language processing [自然语言处理导论]. Shanghai: Publishing House of Electronics Industry.