References
- Abadi et al., 2016
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. ACM SIGSAC Conference on Computer and Communications Security.
- Abnar & Zuidema, 2020
Abnar, S., & Zuidema, W. (2020). Quantifying attention flow in transformers. Annual Meeting of the Association for Computational Linguistics.
- Achiam et al., 2023
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., … others. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Adi et al., 2018
Adi, Y., Baum, C., Cisse, M., Pinkas, B., & Keshet, J. (2018). Turning your weakness into a strength: watermarking deep neural networks by backdooring. USENIX Security Symposium.
- Alayrac et al., 2019
Alayrac, J.-B., Uesato, J., Huang, P.-S., Fawzi, A., Stanforth, R., & Kohli, P. (2019). Are labels required for improving adversarial robustness? Advances in Neural Information Processing Systems.
- An et al., 2024
An, S., Chou, S.-Y., Zhang, K., Xu, Q., Tao, G., Shen, G., … others. (2024). Elijah: eliminating backdoors injected in diffusion models via distribution shift. AAAI Conference on Artificial Intelligence.
- Anderberg et al., 2024
Anderberg, A., Bailey, J., Campello, R. J., Houle, M. E., Marques, H. O., Radovanović, M., & Zimek, A. (2024). Dimensionality-aware outlier detection: theoretical and experimental analysis. SIAM International Conference on Data Mining.
- Andreina et al., 2021
Andreina, S., Marson, G. A., Möllering, H., & Karame, G. (2021). Baffle: backdoor detection via feedback-based federated learning. International Conference on Distributed Computing Systems.
- Andriushchenko et al., 2020
Andriushchenko, M., Croce, F., Flammarion, N., & Hein, M. (2020). Square attack: a query-efficient black-box adversarial attack via random search. European Conference on Computer Vision (pp. 484–501).
- Athalye et al., 2018a
Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. International Conference on Machine Learning (pp. 274–283).
- Athalye et al., 2018b
Athalye, A., Engstrom, L., Ilyas, A., & Kwok, K. (2018). Synthesizing robust adversarial examples. International Conference on Machine Learning (pp. 284–293).
- Bagdasaryan et al., 2020
Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V. (2020). How to backdoor federated learning. International Conference on Artificial Intelligence and Statistics (pp. 2938–2948).
- Bai et al., 2024
Bai, Y., Pei, G., Gu, J., Yang, Y., & Ma, X. (2024). Special characters attack: toward scalable training data extraction from large language models. arXiv preprint arXiv:2405.05990.
- Bai et al., 2020
Bai, Y., Zeng, Y., Jiang, Y., Xia, S.-T., Ma, X., & Wang, Y. (2020). Improving adversarial robustness via channel-wise activation suppressing. International Conference on Learning Representations.
- Bai et al., 2022a
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … others. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
- Bai et al., 2022b
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., … others. (2022). Constitutional ai: harmlessness from ai feedback. arXiv preprint arXiv:2212.08073.
- Ban & Dong, 2022
Ban, Y., & Dong, Y. (2022). Pre-trained adversarial perturbations. Advances in Neural Information Processing Systems.
- Bansal et al., 2023
Bansal, H., Singhi, N., Yang, Y., Yin, F., Grover, A., & Chang, K.-W. (2023). Cleanclip: mitigating data poisoning attacks in multimodal contrastive learning. IEEE/CVF International Conference on Computer Vision.
- Bao et al., 2023
Bao, F., Nie, S., Xue, K., Li, C., Pu, S., Wang, Y., … Zhu, J. (2023). One transformer fits all distributions in multi-modal diffusion at scale. International Conference on Machine Learning (pp. 1692–1717).
- Barreno et al., 2006
Barreno, M., Nelson, B., Sears, R., Joseph, A. D., & Tygar, J. D. (2006). Can machine learning be secure? ACM Symposium on Information, Computer and Communications Security (pp. 16–25).
- Bendale & Boult, 2016
Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1563–1572).
- Biggio et al., 2013
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., … Roli, F. (2013). Evasion attacks against machine learning at test time. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 387–402).
- Biggio et al., 2012
Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. International Conference on Machine Learning (pp. 1467–1474). Madison, WI, USA: Omnipress.
- Blanchard et al., 2017
Blanchard, P., El Mhamdi, E. M., Guerraoui, R., & Stainer, J. (2017). Machine learning with adversaries: byzantine tolerant gradient descent. Advances in Neural Information Processing Systems.
- Brendel et al., 2018
Brendel, W., Rauber, J., & Bethge, M. (2018). Decision-based adversarial attacks: reliable attacks against black-box machine learning models. International Conference on Learning Representations.
- Brown et al., 2020
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … others. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems.
- Brown et al., 2017
Brown, T. B., Mané, D., Roy, A., Abadi, M., & Gilmer, J. (2017). Adversarial patch.
- Cai et al., 2018
Cai, Q.-Z., Liu, C., & Song, D. (2018). Curriculum adversarial training. International Joint Conference on Artificial Intelligence (pp. 3740–3747).
- Cao et al., 2021a
Cao, X., Jia, J., & Gong, N. Z. (2021). Ipguard: protecting intellectual property of deep neural networks via fingerprinting the classification boundary. ACM Asia Conference on Computer and Communications Security.
- Cao et al., 2021b
Cao, Y., Wang, N., Xiao, C., Yang, D., Fang, J., Yang, R., … Li, B. (2021). Invisible for both camera and lidar: security of multi-sensor fusion based perception in autonomous driving under physical-world attacks. IEEE Symposium on Security and Privacy.
- Carlini et al., 2022
Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., & Tramer, F. (2022). Membership inference attacks from first principles. IEEE Symposium on Security and Privacy.
- Carlini et al., 2023a
Carlini, N., Ippolito, D., Jagielski, M., Lee, K., Tramer, F., & Zhang, C. (2023). Quantifying memorization across neural language models. International Conference on Learning Representations.
- Carlini et al., 2023b
Carlini, N., Nasr, M., Choquette-Choo, C. A., Jagielski, M., Gao, I., Awadalla, A., … others. (2023). Are aligned neural networks adversarially aligned? arXiv preprint arXiv:2306.15447.
- Carlini & Terzis, 2021
Carlini, N., & Terzis, A. (2021). Poisoning and backdooring contrastive learning. arXiv preprint arXiv:2106.09667.
- Carlini et al., 2021
Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., … others. (2021). Extracting training data from large language models. USENIX Security Symposium (pp. 2633–2650).
- Carlini & Wagner, 2017a
Carlini, N., & Wagner, D. (2017). Adversarial examples are not easily detected: bypassing ten detection methods. ACM Workshop on Artificial Intelligence and Security (pp. 3–14).
- Carlini & Wagner, 2017b
Carlini, N., & Wagner, D. (2017). Magnet and "efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478.
- Carlini & Wagner, 2017c
Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy (pp. 39–57).
- Carlini et al., 2023c
Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramer, F., … Wallace, E. (2023). Extracting training data from diffusion models. USENIX Security Symposium.
- Carmon et al., 2019
Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., & Liang, P. S. (2019). Unlabeled data improves adversarial robustness. Advances in Neural Information Processing Systems.
- Chan et al., 2022
Chan, S.-H., Dong, Y., Zhu, J., Zhang, X., & Zhou, J. (2022). BadDet: Backdoor Attacks on Object Detection.
- Chang et al., 2000
Chang, S. G., Yu, B., & Vetterli, M. (2000). Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing, 9(9), 1532–1546.
- Chao et al., 2023
Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking black box large language models in twenty queries. arXiv preprint arXiv:2310.08419.
- Chaudhuri & Monteleoni, 2008
Chaudhuri, K., & Monteleoni, C. (2008). Privacy-preserving logistic regression. Advances in Neural Information Processing Systems.
- Chen et al., 2018
Chen, B., Carvalho, W., Baracaldo, N., Ludwig, H., Edwards, B., Lee, T., … Srivastava, B. (2018). Detecting backdoor attacks on deep neural networks by activation clustering.
- Chen et al., 2019
Chen, H., Fu, C., Zhao, J., & Koushanfar, F. (2019). Deepinspect: a black-box trojan detection and mitigation framework for deep neural networks. International Joint Conference on Artificial Intelligence (pp. 4658–4664).
- Chen et al., 2022a
Chen, J., Wang, J., Peng, T., Sun, Y., Cheng, P., Ji, S., … Song, D. (2022). Copy, right? a testing framework for copyright protection of deep learning models. IEEE Symposium on Security and Privacy.
- Chen et al., 2017a
Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., & Hsieh, C.-J. (2017). Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. ACM Workshop on Artificial Intelligence and Security (pp. 15–26).
- Chen et al., 2022b
Chen, S., Liu, C., Haque, M., Song, Z., & Yang, W. (2022). Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1148–1160).
- Chen et al., 2021
Chen, T., Zhang, Z., Liu, S., Chang, S., & Wang, Z. (2021). Robust overfitting may be mitigated by properly learned smoothening. International Conference on Learning Representations.
- Chen et al., 2020
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. International Conference on Machine Learning.
- Chen et al., 2023
Chen, W., Song, D., & Li, B. (2023). Trojdiff: trojan attacks on diffusion models with diverse targets. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Chen et al., 2017b
Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning.
- Chen et al., 2024
Chen, Y., Ma, X., Zou, D., & Jiang, Y.-G. (2024). Extracting training data from unconditional diffusion models. arXiv preprint arXiv:2406.12752.
- Cheng et al., 2019
Cheng, M., Le, T., Chen, P.-Y., Zhang, H., Yi, J., & Hsieh, C.-J. (2019). Query-efficient hard-label black-box attack: an optimization-based approach. International Conference on Learning Representations.
- Chiang et al., 2023
Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., … others. (2023). Vicuna: an open-source chatbot impressing gpt-4 with 90%* chatgpt quality. See https://vicuna.lmsys.org (accessed 14 April 2023).
- Chou et al., 2023
Chou, S.-Y., Chen, P.-Y., & Ho, T.-Y. (2023). How to backdoor diffusion models? IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Chowdhery et al., 2023
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., … others. (2023). Palm: scaling language modeling with pathways. Journal of Machine Learning Research, 24(240), 1–113.
- Christiano et al., 2017
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems.
- Clevert et al., 2016
Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (elus). International Conference on Learning Representations.
- Cordts et al., 2016
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., … Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Croce & Hein, 2020a
Croce, F., & Hein, M. (2020). Minimally distorted adversarial examples with a fast adaptive boundary attack. International Conference on Machine Learning (pp. 2196–2205).
- Croce & Hein, 2020b
Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning (pp. 2206–2216).
- Crowson, 2022
Crowson, K. (2022). K-Diffusion.
- Dai et al., 2023
Dai, W., Li, J., Li, D., Tiong, A. M. H., Zhao, J., Wang, W., … Hoi, S. (2023). InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.
- DarvishRouhani et al., 2019
Darvish Rouhani, B., Chen, H., & Koushanfar, F. (2019). Deepsigns: an end-to-end watermarking framework for ownership protection of deep neural networks. International Conference on Architectural Support for Programming Languages and Operating Systems.
- Das et al., 2017
Das, N., Shanbhogue, M., Chen, S.-T., Hohman, F., Chen, L., Kounavis, M. E., & Chau, D. H. (2017). Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression.
- Deng et al., 2009
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition.
- Devlin et al., 2018
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Ding et al., 2019
Ding, G. W., Sharma, Y., Lui, K. Y. C., & Huang, R. (2019). Mma training: direct input space margin maximization through adversarial training. International Conference on Learning Representations.
- Doan et al., 2023
Doan, K. D., Lao, Y., Yang, P., & Li, P. (2023). Defending backdoor attacks on vision transformer via patch processing. AAAI Conference on Artificial Intelligence.
- Dong et al., 2020
Dong, Y., Deng, Z., Pang, T., Zhu, J., & Su, H. (2020). Adversarial distributional training for robust deep learning. Advances in Neural Information Processing Systems (pp. 8270–8283).
- Dong et al., 2018
Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., & Li, J. (2018). Boosting adversarial attacks with momentum. IEEE Conference on Computer Vision and Pattern Recognition (pp. 9185–9193).
- Dosovitskiy et al., 2021
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … others. (2021). An image is worth 16x16 words: transformers for image recognition at scale. International Conference on Learning Representations.
- Duan et al., 2023
Duan, J., Kong, F., Wang, S., Shi, X., & Xu, K. (2023). Are diffusion models vulnerable to membership inference attacks? International Conference on Machine Learning.
- Duan et al., 2020
Duan, R., Ma, X., Wang, Y., Bailey, J., Qin, A. K., & Yang, Y. (2020). Adversarial camouflage: hiding physical-world attacks with natural styles. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1000–1008).
- Dwork et al., 2006
Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference.
- Ebrahimi et al., 2018
Ebrahimi, J., Rao, A., Lowd, D., & Dou, D. (2018). Hotflip: white-box adversarial examples for text classification. Annual Meeting of the Association for Computational Linguistics (pp. 31–36).
- Elman, 1990
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
- Eykholt et al., 2018
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., … Song, D. (2018). Robust physical-world attacks on deep learning visual classification. IEEE Conference on Computer Vision and Pattern Recognition.
- Feinman et al., 2017
Feinman, R., Curtin, R. R., Shintre, S., & Gardner, A. B. (2017). Detecting adversarial samples from artifacts.
- Feng et al., 2019
Feng, J., Cai, Q.-Z., & Zhou, Z.-H. (2019). Learning to confuse: generating training time adversarial data with auto-encoder. Advances in Neural Information Processing Systems.
- Fredrikson et al., 2015
Fredrikson, M., Jha, S., & Ristenpart, T. (2015). Model inversion attacks that exploit confidence information and basic countermeasures. ACM SIGSAC Conference on Computer and Communications Security (pp. 1322–1333).
- Frosst et al., 2019
Frosst, N., Papernot, N., & Hinton, G. (2019). Analyzing and improving representations with the soft nearest neighbor loss. International Conference on Machine Learning.
- Fu et al., 2022
Fu, Y., Zhang, S., Wu, S., Wan, C., & Lin, Y. (2022). Patch-fool: are vision transformers always robust against adversarial perturbations? International Conference on Learning Representations.
- Fung et al., 2018
Fung, C., Yoon, C. J., & Beschastnikh, I. (2018). Mitigating sybils in federated learning poisoning.
- Gailly & Adler, 2004
Gailly, J.-l., & Adler, M. (2004). Zlib compression library.
- Gal & Ghahramani, 2016
Gal, Y., & Ghahramani, Z. (2016). A theoretically grounded application of dropout in recurrent neural networks. Advances in Neural Information Processing Systems.
- Gan et al., 2020
Gan, Z., Chen, Y.-C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). Large-scale adversarial training for vision-and-language representation learning. Advances in Neural Information Processing Systems (pp. 6616–6628).
- Geiping et al., 2021
Geiping, J., Fowl, L. H., Huang, W. R., Czaja, W., Taylor, G., Moeller, M., & Goldstein, T. (2021). Witches' brew: industrial scale data poisoning via gradient matching. International Conference on Learning Representations.
- Goldblum et al., 2020
Goldblum, M., Fowl, L., Feizi, S., & Goldstein, T. (2020). Adversarially robust distillation. AAAI Conference on Artificial Intelligence (pp. 3996–4003).
- Gong et al., 2023
Gong, Y., Ran, D., Liu, J., Wang, C., Cong, T., Wang, A., … Wang, X. (2023). Figstep: jailbreaking large vision-language models via typographic visual prompts. arXiv preprint arXiv:2311.05608.
- Gong et al., 2017
Gong, Z., Wang, W., & Ku, W.-S. (2017). Adversarial and clean data are not twins.
- Goodfellow et al., 2015
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. International Conference on Learning Representations.
- Gowal et al., 2021
Gowal, S., Rebuffi, S.-A., Wiles, O., Stimberg, F., Calian, D. A., & Mann, T. A. (2021). Improving robustness using generated data. Advances in Neural Information Processing Systems.
- Goyal et al., 2020
Goyal, S., Choudhury, A. R., Raje, S. M., Chakaravarthy, V. T., Sabharwal, Y., & Verma, A. (2020). Power-bert: accelerating bert inference via progressive word-vector elimination. International Conference on Machine Learning.
- Greshake et al., 2023
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). More than you've asked for: a comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv e-prints, pp. arXiv–2302.
- Gretton et al., 2012
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13(1), 723–773.
- Grosse et al., 2017
Grosse, K., Manoharan, P., Papernot, N., Backes, M., & McDaniel, P. (2017). On the (statistical) detection of adversarial examples.
- Gu et al., 2022
Gu, J., Tresp, V., & Qin, Y. (2022). Are vision transformers robust to patch perturbations? European Conference on Computer Vision.
- Gu et al., 2017
Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain.
- Gu et al., 2023
Gu, X., Du, C., Pang, T., Li, C., Lin, M., & Wang, Y. (2023). On memorization in diffusion models. arXiv preprint arXiv:2310.02664.
- Guan et al., 2022
Guan, Y., Li, Z., Leng, J., Lin, Z., & Guo, M. (2022). Transkimmer: transformer learns to layer-wise skim. Annual Meeting of the Association for Computational Linguistics (pp. 7275–7286).
- Guan et al., 2024
Guan, Z., Hu, M., Li, S., & Vullikanti, A. (2024). Ufid: a unified framework for input-level backdoor detection on diffusion models. arXiv preprint arXiv:2404.01101.
- Guo et al., 2017
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. International Conference on Machine Learning.
- Guo et al., 2023
Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. (2023). From images to textual prompts: zero-shot visual question answering with frozen large language models. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10867–10877).
- Guo et al., 2019
Guo, W., Wang, L., Xing, X., Du, M., & Song, D. (2019). Tabor: A highly accurate approach to inspecting and restoring trojan backdoors in ai systems.
- Gupta & Rahtu, 2019
Gupta, P., & Rahtu, E. (2019). Ciidefence: defeating adversarial attacks by fusing class-specific image inpainting and image denoising. IEEE International Conference on Computer Vision (pp. 6708–6717).
- Hampel, 1974
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383–393.
- Hao et al., 2024
Hao, Y., Yang, W., & Lin, Y. (2024). Exploring backdoor vulnerabilities of chat models. arXiv preprint arXiv:2404.02406.
- He et al., 2022a
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- He et al., 2016
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- He et al., 2022b
He, X., Xu, Q., Lyu, L., Wu, F., & Wang, C. (2022). Protecting intellectual property of language generation apis with lexical watermark. AAAI Conference on Artificial Intelligence.
- He et al., 2019
He, Z., Zhang, T., & Lee, R. B. (2019). Model inversion attacks against collaborative inference. Annual Computer Security Applications Conference.
- Hendrycks & Gimpel, 2016a
Hendrycks, D., & Gimpel, K. (2016). Early methods for detecting adversarial images.
- Hendrycks & Gimpel, 2016b
Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (gelus).
- Hintersdorf et al., 2024
Hintersdorf, D., Struppek, L., Kersting, K., Dziedzic, A., & Boenisch, F. (2024). Finding nemo: localizing neurons responsible for memorization in diffusion models. arXiv preprint arXiv:2406.02366.
- Hinton et al., 2015
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- Ho et al., 2020
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
- Houle, 2017
Houle, M. E. (2017). Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications. International Conference on Similarity Search and Applications.
- Hu et al., 2022
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … Chen, W. (2022). LoRA: low-rank adaptation of large language models. International Conference on Learning Representations.
- Hu et al., 2019
Hu, S., Yu, T., Guo, C., Chao, W.-L., & Weinberger, K. Q. (2019). A new defense against adversarial images: turning a weakness into a strength. Advances in Neural Information Processing Systems.
- Hua et al., 2024
Hua, A., Gu, J., Xue, Z., Carlini, N., Wong, E., & Qin, Y. (2024). Initialization matters for adversarial transfer learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24831–24840).
- Huang et al., 2023a
Huang, B., Wang, Z., Yang, J., Ai, J., Zou, Q., Wang, Q., & Ye, D. (2023). Implicit identity driven deepfake face swapping detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Huang et al., 2023b
Huang, H., Ma, X., Erfani, S., & Bailey, J. (2023). Distilling cognitive backdoor patterns within an image. International Conference on Learning Representations.
- Huang et al., 2020
Huang, W. R., Geiping, J., Fowl, L., Taylor, G., & Goldstein, T. (2020). Metapoison: practical general-purpose clean-label data poisoning. Advances in Neural Information Processing Systems (pp. 12080–12091).
- Ilyas et al., 2018
Ilyas, A., Engstrom, L., Athalye, A., & Lin, J. (2018). Black-box adversarial attacks with limited queries and information. International Conference on Machine Learning (pp. 2137–2146).
- Ishihara, 2023
Ishihara, S. (2023). Training data extraction from pre-trained language models: a survey. arXiv preprint arXiv:2305.16157.
- Izmailov et al., 2018
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. Conference on Uncertainty in Artificial Intelligence.
- Jia et al., 2021
Jia, H., Choquette-Choo, C. A., Chandrasekaran, V., & Papernot, N. (2021). Entangled watermarks as a defense against model extraction. USENIX Security Symposium.
- Jia et al., 2022a
Jia, J., Liu, Y., & Gong, N. Z. (2022). Badencoder: backdoor attacks to pre-trained encoders in self-supervised learning. IEEE Symposium on Security and Privacy.
- Jia et al., 2022b
Jia, M., Tang, L., Chen, B.-C., Cardie, C., Belongie, S., Hariharan, B., & Lim, S.-N. (2022). Visual prompt tuning. European Conference on Computer Vision.
- Jia et al., 2019
Jia, X., Wei, X., Cao, X., & Foroosh, H. (2019). Comdefend: an efficient image compression model to defend adversarial examples. IEEE Conference on Computer Vision and Pattern Recognition (pp. 6084–6092).
- Jiang et al., 2023
Jiang, Y., Chan, C., Chen, M., & Wang, W. (2023). Lion: adversarial distillation of proprietary large language models. Conference on Empirical Methods in Natural Language Processing (pp. 3134–3154).
- Jin et al., 2019
Jin, G., Shen, S., Zhang, D., Dai, F., & Zhang, Y. (2019). Ape-gan: adversarial perturbation elimination with gan. IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3842–3846).
- Kang et al., 2023
Kang, M., Zhu, J.-Y., Zhang, R., Park, J., Shechtman, E., Paris, S., & Park, T. (2023). Scaling up gans for text-to-image synthesis. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Kearns & Li, 1993
Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807–837.
- Kim & Cho, 2021
Kim, G., & Cho, K. (2021). Length-adaptive transformer: train once with length drop, use anytime with search. Joint Conference of Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing.
- Kim et al., 2022
Kim, S., Shen, S., Thorsley, D., Gholami, A., Kwon, W., Hassoun, J., & Keutzer, K. (2022). Learned token pruning for transformers. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 784–794).
- Kirchenbauer et al., 2023
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. International Conference on Machine Learning.
- Koh et al., 2022
Koh, P. W., Steinhardt, J., & Liang, P. (2022). Stronger data poisoning attacks break data sanitization defenses. Machine Learning, 111(1), 1–47.
- Kruger et al., 2004
Kruger, L. E., Wohler, C., Wurz-Wessel, A., & Stein, F. (2004). In-factory calibration of multiocular camera systems. Optical Metrology in Production Engineering.
- Kumar et al., 2020
Kumar, R. S. S., Nyström, M., Lambert, J., Marshall, A., Goertzel, M., Comissoneru, A., … Xia, S. (2020). Adversarial machine learning-industry perspectives. IEEE Security and Privacy Workshops (pp. 69–75).
- Kurakin et al., 2016
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale.
- Kurakin et al., 2018
Kurakin, A., Goodfellow, I. J., & Bengio, S. (2018). Adversarial examples in the physical world. Artificial Intelligence Safety and Security (pp. 99–112). Chapman and Hall/CRC.
- LeMerrer et al., 2020
Le Merrer, E., Perez, P., & Trédan, G. (2020). Adversarial frontier stitching for remote neural network watermarking. Neural Computing and Applications, 32(13), 9233–9244.
- Lee et al., 2022
Lee, K., Ippolito, D., Nystrom, A., Zhang, C., Eck, D., Callison-Burch, C., & Carlini, N. (2022). Deduplicating training data makes language models better. Annual Meeting of the Association for Computational Linguistics.
- Lee et al., 2018
Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems.
- Li et al., 2024a
Li, H., Chen, Y., Zheng, Z., Hu, Q., Chan, C., Liu, H., & Song, Y. (2024). Backdoor removal for generative large language models. arXiv preprint arXiv:2405.07667.
- Li et al., 2023a
Li, J., Li, D., Savarese, S., & Hoi, S. (2023). Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models. International Conference on Machine Learning (pp. 19730–19742).
- Li et al., 2022
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). Blip: bootstrapping language-image pre-training for unified vision-language understanding and generation. International Conference on Machine Learning (pp. 12888–12900).
- Li et al., 2021a
Li, J., Selvaraju, R., Gotmare, A., Joty, S., Xiong, C., & Hoi, S. C. H. (2021). Align before fuse: vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems, 34, 9694–9705.
- Li et al., 2020a
Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., & Guo, B. (2020). Face x-ray for more general face forgery detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Li et al., 2020b
Li, L., Ma, R., Guo, Q., Xue, X., & Qiu, X. (2020). Bert-attack: adversarial attack against bert using bert. Conference on Empirical Methods in Natural Language Processing (pp. 6193–6202).
- Li et al., 2024b
Li, Q., Wang, W., Xu, C., Sun, Z., & Yang, M.-H. (2024). Learning disentangled representation for one-shot progressive face swapping. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Li et al., 2020c
Li, S., Cheng, Y., Wang, W., Liu, Y., & Chen, T. (2020). Learning to detect malicious clients for robust federated learning. arXiv preprint arXiv:2002.00211.
- Li et al., 2024c
Li, W., Chen, P.-Y., Liu, S., & Wang, R. (2024). Psbd: prediction shift uncertainty unlocks backdoor detection. arXiv preprint arXiv:2406.05826.
- Li et al., 2021b
Li, Y., Yang, Z., Wang, Y., & Xu, C. (2021). Neural architecture dilation for adversarial robustness. Advances in Neural Information Processing Systems (pp. 29578–29589).
- Li et al., 2021c
Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., & Ma, X. (2021). Anti-backdoor learning: training clean models on poisoned data. Advances in Neural Information Processing Systems (pp. 14900–14912).
- Li et al., 2021d
Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., & Ma, X. (2021). Anti-backdoor learning: training clean models on poisoned data. Advances in Neural Information Processing Systems.
- Li et al., 2023b
Li, Y., Lyu, X., Ma, X., Koren, N., Lyu, L., Li, B., & Jiang, Y.-G. (2023). Reconstructive neuron pruning for backdoor defense. International Conference on Machine Learning.
- Li et al., 2024d
Li, Y., Ma, X., He, J., Huang, H., & Jiang, Y.-G. (2024). Multi-trigger backdoor attacks: more triggers, more threats. arXiv preprint arXiv:2401.15295.
- Li et al., 2021e
Li, Y., Li, Y., Wu, B., Li, L., He, R., & Lyu, S. (2021). Invisible backdoor attack with sample-specific triggers. IEEE International Conference on Computer Vision (pp. 16463–16472).
- Li et al., 2024e
Li, Z., Wang, C., Ma, P., Liu, C., Wang, S., Wu, D., … Liu, Y. (2024). On extracting specialized code abilities from large language models: a feasibility study. IEEE/ACM International Conference on Software Engineering.
- Liang et al., 2024
Liang, S., Zhu, M., Liu, A., Wu, B., Cao, X., & Chang, E.-C. (2024). Badclip: dual-embedding guided backdoor attack on multimodal contrastive learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Liao et al., 2018
Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., & Zhu, J. (2018). Defense against adversarial attacks using high-level representation guided denoiser. IEEE Conference on Computer Vision and Pattern Recognition (pp. 1778–1787).
- Lin et al., 2014
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … Zitnick, C. L. (2014). Microsoft coco: common objects in context. European Conference on Computer Vision.
- Liu et al., 2023
Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in Neural Information Processing Systems.
- Liu et al., 2024
Liu, H., Reiter, M. K., & Gong, N. Z. (2024). Mudjacking: patching backdoor vulnerabilities in foundation models. arXiv preprint arXiv:2402.14977.
- Liu et al., 2018a
Liu, K., Dolan-Gavitt, B., & Garg, S. (2018). Fine-pruning: defending against backdooring attacks on deep neural networks. International Symposium on Research in Attacks, Intrusions, and Defenses (pp. 273–294).
- Liu et al., 2020
Liu, X., Cheng, H., He, P., Chen, W., Wang, Y., Poon, H., & Gao, J. (2020). Adversarial training for large neural language models. arXiv preprint arXiv:2004.08994.
- Liu et al., 2017
Liu, Y., Chen, X., Liu, C., & Song, D. (2017). Delving into transferable adversarial examples and black-box attacks.
- Liu et al., 2018b
Liu, Y., Ma, S., Aafer, Y., Lee, W.-C., Zhai, J., Wang, W., & Zhang, X. (2018). Trojaning attack on neural networks. Network and Distributed Systems Security Symposium.
- Lorenz et al., 2022
Lorenz, P., Keuper, M., & Keuper, J. (2022). Unfolding local growth rate estimates for (almost) perfect adversarial detection. International Conference on Computer Vision Theory and Applications.
- Lu et al., 2023
Lu, D., Wang, Z., Wang, T., Guan, W., Gao, H., & Zheng, F. (2023). Set-level guidance attack: boosting adversarial transferability of vision-language pre-training models. IEEE/CVF International Conference on Computer Vision (pp. 102–111).
- Lu et al., 2022
Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., … Kalyan, A. (2022). Learn to explain: multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems.
- Lukas et al., 2021
Lukas, N., Zhang, Y., & Kerschbaum, F. (2021). Deep neural network fingerprinting by conferrable adversarial examples.
- Luo et al., 2024
Luo, H., Gu, J., Liu, F., & Torr, P. (2024). An image is worth 1000 lies: adversarial transferability across prompts on vision-language models. arXiv preprint arXiv:2403.09766.
- Lv et al., 2021
Lv, P., Ma, H., Zhou, J., Liang, R., Chen, K., Zhang, S., & Yang, Y. (2021). Dbia: data-free backdoor injection attack against transformer networks. arXiv preprint arXiv:2111.11870.
- Ma et al., 2023
Ma, H., Qiu, H., Gao, Y., Zhang, Z., Abuadbba, A., Xue, M., … Abbott, D. (2023). Quantization backdoors to deep learning commercial frameworks. IEEE Transactions on Dependable and Secure Computing.
- Ma et al., 2024
Ma, J., Cao, A., Xiao, Z., Zhang, J., Ye, C., & Zhao, J. (2024). Jailbreaking prompt attack: a controllable adversarial attack against diffusion models. arXiv preprint arXiv:2404.02928.
- Ma et al., 2018
Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Schoenebeck, G., … Bailey, J. (2018). Characterizing adversarial subspaces using local intrinsic dimensionality. International Conference on Learning Representations.
- Madry et al., 2018
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations.
- Mahalanobis, 1936
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, 2, 49–55.
- Mahendran & Vedaldi, 2015
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. IEEE Conference on Computer Vision and Pattern Recognition.
- Mahendran & Vedaldi, 2016
Mahendran, A., & Vedaldi, A. (2016). Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 120, 233–255.
- Mahloujifar & Mahmoody, 2017
Mahloujifar, S., & Mahmoody, M. (2017). Blockwise p-tampering attacks on cryptographic primitives, extractors, and learners. Theory of Cryptography Conference (pp. 245–279).
- Mahloujifar et al., 2019
Mahloujifar, S., Mahmoody, M., & Mohammed, A. (2019). Universal multi-party poisoning attacks. International Conference on Machine Learning (pp. 4274–4283).
- Mahmood et al., 2021
Mahmood, K., Mahmood, R., & Van Dijk, M. (2021). On the robustness of vision transformers to adversarial examples. IEEE International Conference on Computer Vision.
- Mao et al., 2023
Mao, C., Geng, S., Yang, J., Wang, X., & Vondrick, C. (2023). Understanding zero-shot adversarial robustness for large-scale models. International Conference on Learning Representations.
- Masood et al., 2023
Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza, A., & Malik, H. (2023). Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, 53(4), 3974–4026.
- Mattern et al., 2023
Mattern, J., Mireshghallah, F., Jin, Z., Schoelkopf, B., Sachan, M., & Berg-Kirkpatrick, T. (2023). Membership inference attacks against language models via neighbourhood comparison. Annual Meeting of The Association For Computational Linguistics.
- McMahan et al., 2017
McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence and Statistics.
- Mei & Zhu, 2015
Mei, S., & Zhu, X. (2015). Using machine teaching to identify optimal training-set attacks on machine learners. AAAI Conference on Artificial Intelligence.
- Meng & Chen, 2017
Meng, D., & Chen, H. (2017). Magnet: a two-pronged defense against adversarial examples. ACM SIGSAC Conference on Computer and Communications Security (pp. 135–147).
- Metzen et al., 2017
Metzen, J. H., Genewein, T., Fischer, V., & Bischoff, B. (2017). On detecting adversarial perturbations. International Conference on Learning Representations.
- Micikevicius et al., 2018
Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., … others. (2018). Mixed precision training. International Conference on Learning Representations.
- Miyato et al., 2018
Miyato, T., Maeda, S.-i., Koyama, M., & Ishii, S. (2018). Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1979–1993.
- Mo et al., 2024
Mo, Y., Huang, H., Li, M., Li, A., & Wang, Y. (2024). Terd: a unified framework for safeguarding diffusion models against backdoors. International Conference on Machine Learning.
- Moosavi-Dezfooli et al., 2016
Moosavi-Dezfooli, S.-M., Fawzi, A., & Frossard, P. (2016). Deepfool: a simple and accurate method to fool deep neural networks. IEEE Conference on Computer Vision and Pattern Recognition (pp. 2574–2582).
- Mordvintsev et al., 2015
Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: going deeper into neural networks.
- Munoz-Gonzalez et al., 2019
Muñoz-González, L., Pfitzner, B., Russo, M., Carnerero-Cano, J., & Lupu, E. C. (2019). Poisoning attacks with generative adversarial nets.
- Nair & Hinton, 2010
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. International Conference on Machine Learning.
- Naseer et al., 2021
Naseer, M., Ranasinghe, K., Khan, S., Khan, F. S., & Porikli, F. (2021). On improving adversarial transferability of vision transformers. arXiv preprint arXiv:2106.04169.
- Naseh et al., 2023
Naseh, A., Roh, J., & Houmansadr, A. (2023). Memory triggers: unveiling memorization in text-to-image generative models through word-level duplication. arXiv preprint arXiv:2312.03692.
- Nasr et al., 2023
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., … Lee, K. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035.
- Nelson et al., 2008
Nelson, B., Barreno, M., Chi, F. J., Joseph, A. D., Rubinstein, B. I., Saini, U., … Xia, K. (2008). Exploiting machine learning to subvert your spam filter. LEET, 8(1), 9.
- Nguyen et al., 2017
Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., & Yosinski, J. (2017). Plug & play generative networks: conditional iterative generation of images in latent space. IEEE Conference on Computer Vision and Pattern Recognition.
- Nguyen et al., 2016
Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., & Clune, J. (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing systems, 29.
- Nguyen & Tran, 2020
Nguyen, T. A., & Tran, A. (2020). Input-aware dynamic backdoor attack. Advances in Neural Information Processing Systems (pp. 3454–3464).
- Nie et al., 2022
Nie, W., Guo, B., Huang, Y., Xiao, C., Vahdat, A., & Anandkumar, A. (2022). Diffusion models for adversarial purification. International Conference on Machine Learning (pp. 16805–16827).
- Nirkin et al., 2019
Nirkin, Y., Keller, Y., & Hassner, T. (2019). FSGAN: subject agnostic face swapping and reenactment. IEEE International Conference on Computer Vision.
- Noever & Noever, 2021
Noever, D. A., & Noever, S. E. M. (2021). Reading isn't believing: adversarial attacks on multi-modal neurons. arXiv preprint arXiv:2103.10480.
- Oh et al., 2019
Oh, S. J., Schiele, B., & Fritz, M. (2019). Towards reverse-engineering black-box neural networks. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (pp. 121–144). Springer.
- Ooms, 2024
Ooms, J. (2024). cld3: Google's Compact Language Detector 3. R package version 1.6.0. URL: https://docs.ropensci.org/cld3/, https://github.com/ropensci/cld3, https://ropensci.r-universe.dev/cld3
- Oord et al., 2018
Oord, A. v. d., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding.
- OpenAI, 2024
OpenAI (2024). ChatGPT. Accessed: 2024-07-23.
- Paperno et al., 2016
Paperno, D., Kruszewski, G., Lazaridou, A., Pham, Q. N., Bernardi, R., Pezzelle, S., … Fernández, R. (2016). The LAMBADA dataset: Word prediction requiring a broad discourse context.
- Papernot et al., 2017
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. ACM on Asia Conference on Computer and Communications Security (pp. 506–519).
- Papernot et al., 2016
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016). The limitations of deep learning in adversarial settings. IEEE European Symposium on Security and Privacy (pp. 372–387).
- Papineni et al., 2002
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. Annual Meeting of the Association for Computational Linguistics.
- Peters et al., 2018
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Pinto et al., 2024
Pinto, F., Rauschmayr, N., Tramèr, F., Torr, P., & Tombari, F. (2024). Extracting training data from document-based vqa models. arXiv preprint arXiv:2407.08707.
- Prakash et al., 2018
Prakash, A., Moran, N., Garber, S., DiLillo, A., & Storer, J. (2018). Deflecting adversarial attacks with pixel deflection. IEEE Conference on Computer Vision and Pattern Recognition (pp. 8571–8580).
- Pruthi et al., 2019
Pruthi, D., Dhingra, B., & Lipton, Z. C. (2019). Combating adversarial misspellings with robust word recognition. Annual Meeting of the Association for Computational Linguistics (pp. 5582–5591).
- Qi et al., 2023
Qi, X., Huang, K., Panda, A., Wang, M., & Mittal, P. (2023). Visual adversarial examples jailbreak large language models. arXiv preprint arXiv:2306.13213.
- Qian et al., 2020
Qian, Y., Yin, G., Sheng, L., Chen, Z., & Shao, J. (2020). Thinking in frequency: face forgery detection by mining frequency-aware clues. European Conference on Computer Vision.
- Qin et al., 2019
Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., Fawzi, A., … Kohli, P. (2019). Adversarial robustness through local linearization. Advances in Neural Information Processing Systems.
- Radford et al., 2021
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … others. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning.
- Radford et al., 2018
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., & others. (2018). Improving language understanding by generative pre-training.
- Rafailov et al., 2024
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2024). Direct preference optimization: your language model is secretly a reward model. Advances in Neural Information Processing Systems.
- Ramachandran et al., 2017
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions.
- Ramesh et al., 2022
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125.
- Rebuffi et al., 2021a
Rebuffi, S.-A., Gowal, S., Calian, D. A., Stimberg, F., Wiles, O., & Mann, T. (2021). Fixing data augmentation to improve adversarial robustness.
- Rebuffi et al., 2021b
Rebuffi, S.-A., Gowal, S., Calian, D. A., Stimberg, F., Wiles, O., & Mann, T. A. (2021). Data augmentation can improve robustness. Advances in Neural Information Processing Systems.
- Rice et al., 2020
Rice, L., Wong, E., & Kolter, Z. (2020). Overfitting in adversarially robust deep learning. International Conference on Machine Learning.
- Robey et al., 2023
Robey, A., Wong, E., Hassani, H., & Pappas, G. J. (2023). Smoothllm: defending large language models against jailbreaking attacks. arXiv preprint arXiv:2310.03684.
- Rombach et al., 2022
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Ronneberger et al., 2015
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer Assisted Intervention (pp. 234–241).
- Roth et al., 2019
Roth, K., Kilcher, Y., & Hofmann, T. (2019). The odds are odd: a statistical test for detecting adversarial examples. International Conference on Machine Learning (pp. 5498–5507).
- Saha et al., 2020
Saha, A., Subramanya, A., & Pirsiavash, H. (2020). Hidden trigger backdoor attacks. AAAI Conference on Artificial Intelligence.
- Sakaguchi et al., 2017
Sakaguchi, K., Duh, K., Post, M., & Van Durme, B. (2017). Robsut wrod reocginiton via semi-character recurrent neural network. AAAI Conference on Artificial Intelligence.
- Samangouei et al., 2018
Samangouei, P., Kabkab, M., & Chellappa, R. (2018). Defense-gan: protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations.
- Schlarmann et al., 2024
Schlarmann, C., Singh, N. D., Croce, F., & Hein, M. (2024). Robust clip: unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models. International Conference on Machine Learning.
- Schubert et al., 2014
Schubert, E., Zimek, A., & Kriegel, H.-P. (2014). Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Mining and Knowledge Discovery, 28, 190–237.
- Schuhmann et al., 2022
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C. W., Wightman, R., Cherti, M., … Jitsev, J. (2022). LAION-5b: an open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems.
- Schulman et al., 2017
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Selvaraju et al., 2017
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: visual explanations from deep networks via gradient-based localization. IEEE International Conference on Computer Vision.
- Sennrich et al., 2016
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural machine translation of rare words with subword units. Annual Meeting of the Association for Computational Linguistics.
- Sha et al., 2023
Sha, Z., He, X., Yu, N., Backes, M., & Zhang, Y. (2023). Can't steal? cont-steal! contrastive stealing attacks against image encoders. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Shafahi et al., 2018
Shafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., & Goldstein, T. (2018). Poison frogs! targeted clean-label poisoning attacks on neural networks. Advances in Neural Information Processing Systems.
- Shafahi et al., 2019
Shafahi, A., Najibi, M., Ghiasi, M. A., Xu, Z., Dickerson, J., Studer, C., … Goldstein, T. (2019). Adversarial training for free! Advances in Neural Information Processing Systems.
- Shao et al., 2022
Shao, R., Shi, Z., Yi, J., Chen, P.-Y., & Hsieh, C.-J. (2022). On the adversarial robustness of vision transformers. Transactions on Machine Learning Research.
- Sharif et al., 2016
Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2016). Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. ACM SIGSAC Conference on Computer and Communications Security.
- Sharma et al., 2018
Sharma, P., Ding, N., Goodman, S., & Soricut, R. (2018). Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning. Annual Meeting of the Association for Computational Linguistics.
- Shayegani et al., 2023
Shayegani, E., Dong, Y., & Abu-Ghazaleh, N. (2023). Plug and pray: exploiting off-the-shelf components of multi-modal models. arXiv preprint arXiv:2307.14539.
- Shen et al., 2016
Shen, S., Tople, S., & Saxena, P. (2016). Auror: defending against poisoning attacks in collaborative deep learning systems. Conference on Computer Security Applications.
- Shen & Sanghavi, 2019
Shen, Y., & Sanghavi, S. (2019). Learning with bad training data via iterative trimmed loss minimization. International Conference on Machine Learning (pp. 5739–5748).
- Shi et al., 2022
Shi, Y., Han, Y., Tan, Y.-a., & Kuang, X. (2022). Decision-based black-box attack against vision transformers via patch-wise adversarial removal. Advances in Neural Information Processing Systems.
- Shin et al., 2020
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). Autoprompt: eliciting knowledge from language models with automatically generated prompts. Conference on Empirical Methods in Natural Language Processing.
- Shokri et al., 2017
Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. IEEE Symposium on Security and Privacy.
- Smith & Topin, 2019
Smith, L. N., & Topin, N. (2019). Super-convergence: very fast training of residual networks using large learning rates. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications (pp. 369–386).
- Smith, 2007
Smith, R. (2007). An overview of the tesseract ocr engine. International Conference on Document Analysis and Recognition.
- Somepalli et al., 2022
Somepalli, G., Singla, V., Goldblum, M., Geiping, J., & Goldstein, T. (2022). Diffusion art or digital forgery? Investigating data replication in diffusion models. arXiv preprint arXiv:2212.03860.
- Somepalli et al., 2023
Somepalli, G., Singla, V., Goldblum, M., Geiping, J., & Goldstein, T. (2023). Understanding data replication in diffusion models. International Conference on Machine Learning Workshop.
- Song et al., 2020
Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
- Song et al., 2013
Song, S., Chaudhuri, K., & Sarwate, A. D. (2013). Stochastic gradient descent with differentially private updates. IEEE Global Conference on Signal and Information Processing.
- Sorokin & Forsyth, 2008
Sorokin, A., & Forsyth, D. (2008). Utility data annotation with amazon mechanical turk. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Srivastava et al., 2014
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
- Subramanya et al., 2024
Subramanya, A., Koohpayegani, S. A., Saha, A., Tejankar, A., & Pirsiavash, H. (2024). A closer look at robustness of vision transformers to backdoor attacks. IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 3874–3883).
- Subramanya et al., 2022
Subramanya, A., Saha, A., Koohpayegani, S. A., Tejankar, A., & Pirsiavash, H. (2022). Backdoor attacks on vision transformers. arXiv preprint arXiv:2206.08477.
- Sun et al., 2023
Sun, X., Li, X., Meng, Y., Ao, X., Lyu, L., Li, J., & Zhang, T. (2023). Defending against backdoor attacks in natural language generation. AAAI Conference on Artificial Intelligence.
- Sun et al., 2019
Sun, Z., Kairouz, P., Suresh, A. T., & McMahan, H. B. (2019). Can you really backdoor federated learning?
- Sur et al., 2023
Sur, I., Sikka, K., Walmer, M., Koneripalli, K., Roy, A., Lin, X., … Jha, S. (2023). Tijo: trigger inversion with joint optimization for defending multimodal backdoored models. IEEE/CVF International Conference on Computer Vision.
- Szegedy et al., 2014
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. International Conference on Learning Representations.
- Szyller et al., 2021
Szyller, S., Atli, B. G., Marchal, S., & Asokan, N. (2021). Dawn: dynamic adversarial watermarking of neural networks. ACM International Conference on Multimedia.
- Tan & Le, 2019
Tan, M., & Le, Q. (2019). Efficientnet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (pp. 6105–6114).
- Tang et al., 2020
Tang, R., Du, M., Liu, N., Yang, F., & Hu, X. (2020). An embarrassingly simple approach for trojan attack in deep neural networks. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 218–228).
- Taori et al., 2023
Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., … Hashimoto, T. B. (2023). Alpaca: a strong, replicable instruction-following model. Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html.
- Tejankar et al., 2023
Tejankar, A., Sanjabi, M., Wang, Q., Wang, S., Firooz, H., Pirsiavash, H., & Tan, L. (2023). Defending against patch-based backdoor attacks on self-supervised learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Thies et al., 2016
Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2face: real-time face capture and reenactment of rgb videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Tian et al., 2018
Tian, S., Yang, G., & Cai, Y. (2018). Detecting adversarial examples through image transformation. AAAI Conference on Artificial Intelligence.
- Touvron et al., 2023
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., … others. (2023). Llama: open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Tramer et al., 2020
Tramer, F., Carlini, N., Brendel, W., & Madry, A. (2020). On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems (pp. 1633–1645).
- Tramer et al., 2018
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2018). Ensemble adversarial training: attacks and defenses. International Conference on Learning Representations.
- Tramer et al., 2016
Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., & Ristenpart, T. (2016). Stealing machine learning models via prediction APIs. USENIX Security Symposium (pp. 601–618).
- Tran et al., 2018
Tran, B., Li, J., & Madry, A. (2018). Spectral signatures in backdoor attacks. Advances in Neural Information Processing Systems.
- Tu et al., 2019
Tu, C.-C., Ting, P., Chen, P.-Y., Liu, S., Zhang, H., Yi, J., … Cheng, S.-M. (2019). Autozoom: autoencoder-based zeroth order optimization method for attacking black-box neural networks. AAAI Conference on Artificial Intelligence (pp. 742–749).
- Turner et al., 2018
Turner, A., Tsipras, D., & Madry, A. (2018). Clean-label backdoor attacks.
- Uchida et al., 2017
Uchida, Y., Nagai, Y., Sakazawa, S., & Satoh, S. (2017). Embedding watermarks into deep neural networks. ACM on International Conference on Multimedia Retrieval.
- Vaswani et al., 2017
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
- Wang & Gong, 2018
Wang, B., & Gong, N. Z. (2018). Stealing hyperparameters in machine learning. IEEE Symposium on Security and Privacy (pp. 36–52).
- Wang et al., 2019a
Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., & Zhao, B. Y. (2019). Neural cleanse: identifying and mitigating backdoor attacks in neural networks. IEEE Symposium on Security and Privacy (pp. 707–723).
- Wang et al., 2017
Wang, D., Ye, M., & Xu, J. (2017). Differentially private empirical risk minimization revisited: faster and more general. Advances in Neural Information Processing Systems.
- Wang et al., 2020a
Wang, H., Sreenivasan, K., Rajput, S., Vishwakarma, H., Agarwal, S., Sohn, J.-y., … Papailiopoulos, D. (2020). Attack of the tails: yes, you really can backdoor federated learning. Advances in Neural Information Processing Systems (pp. 16070–16084).
- Wang et al., 2024a
Wang, R., Ma, X., Zhou, H., Ji, C., Ye, G., & Jiang, Y.-G. (2024). White-box multimodal jailbreaks against large vision-language models. arXiv preprint arXiv:2405.17894.
- Wang et al., 2022
Wang, S., Nepal, S., Abuadbba, A., Rudolph, C., & Grobler, M. (2022). Adversarial detection by latent style transformations. IEEE Transactions on Information Forensics and Security, 17, 1099–1114.
- Wang et al., 2020b
Wang, S., Nepal, S., Rudolph, C., Grobler, M., Chen, S., & Chen, T. (2020). Backdoor attacks against transfer learning with pre-trained deep learning models. IEEE Transactions on Services Computing.
- Wang et al., 2023a
Wang, X., Ji, Z., Ma, P., Li, Z., & Wang, S. (2023). Instructta: instruction-tuned targeted attack for large vision-language models. arXiv preprint arXiv:2312.01886.
- Wang et al., 2019b
Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., & Gu, Q. (2019). On the convergence and robustness of adversarial training. International Conference on Machine Learning (pp. 6586–6595).
- Wang et al., 2019c
Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., & Gu, Q. (2019). Improving adversarial robustness requires revisiting misclassified examples. International Conference on Learning Representations.
- Wang et al., 2023b
Wang, Z., Pang, T., Du, C., Lin, M., Liu, W., & Yan, S. (2023). Better diffusion models further improve adversarial training. International Conference on Machine Learning.
- Wang et al., 2024b
Wang, Z., Li, X., Zhu, H., & Xie, C. (2024). Revisiting adversarial training at scale. arXiv preprint arXiv:2401.04727.
- Wang et al., 2004
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
- Webster, 2023
Webster, R. (2023). A reproducible extraction of training images from diffusion models. arXiv preprint arXiv:2305.08694.
- Wei et al., 2021
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., … Le, Q. V. (2021). Finetuned language models are zero-shot learners. International Conference on Machine Learning.
- Wei et al., 2022a
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … others. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
- Wei & Zou, 2019
Wei, J., & Zou, K. (2019). Eda: easy data augmentation techniques for boosting performance on text classification tasks. Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing.
- Wei et al., 2022b
Wei, Z., Chen, J., Goldblum, M., Wu, Z., Goldstein, T., & Jiang, Y.-G. (2022). Towards transferable adversarial attacks on vision transformers. AAAI Conference on Artificial Intelligence (pp. 2668–2676).
- Wen et al., 2024
Wen, Y., Liu, Y., Chen, C., & Lyu, L. (2024). Detecting, explaining, and mitigating memorization in diffusion models. International Conference on Learning Representations.
- Williams & Peng, 1990
Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2, 490–501.
- Williams & Zipser, 2013
Williams, R. J., & Zipser, D. (2013). Gradient-based learning algorithms for recurrent networks and their computational complexity. Backpropagation (pp. 433–486). Psychology Press.
- Wong et al., 2020
Wong, E., Rice, L., & Kolter, J. Z. (2020). Fast is better than free: revisiting adversarial training. International Conference on Learning Representations.
- Wu et al., 2023a
Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., & Duan, N. (2023). Visual chatgpt: talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671.
- Wu & Wang, 2021
Wu, D., & Wang, Y. (2021). Adversarial neuron pruning purifies backdoored deep models. Advances in Neural Information Processing Systems (pp. 16913–16925).
- Wu et al., 2020a
Wu, D., Wang, Y., Xia, S.-T., Bailey, J., & Ma, X. (2020). Skip connections matter: on the transferability of adversarial examples generated with resnets. International Conference on Learning Representations.
- Wu et al., 2020b
Wu, D., Xia, S.-T., & Wang, Y. (2020). Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems (pp. 2958–2969).
- Wu et al., 2023b
Wu, S., Ma, C., Wei, K., Xu, X., Ding, M., Qian, Y., & Xiang, T. (2023). Refine, discriminate and align: stealing encoders via sample-wise prototypes and multi-relational extraction. arXiv preprint arXiv:2312.00855.
- Xi et al., 2024
Xi, Z., Du, T., Li, C., Pang, R., Ji, S., Chen, J., … Wang, T. (2024). Defending pre-trained language models as few-shot learners against backdoor attacks. Advances in Neural Information Processing Systems.
- Xiang et al., 2024
Xiang, Z., Jiang, F., Xiong, Z., Ramasubramanian, B., Poovendran, R., & Li, B. (2024). Badchain: backdoor chain-of-thought prompting for large language models. arXiv preprint arXiv:2401.12242.
- Xiao et al., 2018
Xiao, C., Li, B., Zhu, J. Y., He, W., Liu, M., & Song, D. (2018). Generating adversarial examples with adversarial networks. International Joint Conference on Artificial Intelligence (pp. 3905–3911).
- Xie et al., 2019a
Xie, C., Huang, K., Chen, P.-Y., & Li, B. (2019). Dba: distributed backdoor attacks against federated learning. International Conference on Learning Representations.
- Xie et al., 2020
Xie, C., Tan, M., Gong, B., Yuille, A., & Le, Q. V. (2020). Smooth adversarial training.
- Xie et al., 2018
Xie, C., Wang, J., Zhang, Z., Ren, Z., & Yuille, A. (2018). Mitigating adversarial effects through randomization. International Conference on Learning Representations.
- Xie et al., 2019b
Xie, C., Wu, Y., Maaten, L. v. d., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 501–509).
- Xie et al., 2019c
Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., & Yuille, A. L. (2019). Improving transferability of adversarial examples with input diversity. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2730–2739).
- Xu et al., 2020
Xu, K., Zhang, G., Liu, S., Fan, Q., Sun, M., Chen, H., … Lin, X. (2020). Adversarial t-shirt! evading person detectors in a physical world. European Conference on Computer Vision (pp. 665–681).
- Xu et al., 2018
Xu, W., Evans, D., & Qi, Y. (2018). Feature squeezing: detecting adversarial examples in deep neural networks. Network and Distributed System Security Symposium.
- Xu et al., 2023
Xu, X., Zhang, J., & Kankanhalli, M. (2023). Autolora: a parameter-free automated robust fine-tuning framework. arXiv preprint arXiv:2310.01818.
- Yan et al., 2024
Yan, J., Yadav, V., Li, S., Chen, L., Tang, Z., Wang, H., … Jin, H. (2024). Backdooring instruction-tuned large language models with virtual prompt injection. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Yang et al., 2017
Yang, C., Wu, Q., Li, H., & Chen, Y. (2017). Generative poisoning attack method against neural networks.
- Yang et al., 2020
Yang, H., Zhang, J., Dong, H., Inkawhich, N., Gardner, A., Touchet, A., … Li, H. (2020). Dverge: diversifying vulnerabilities for enhanced robust generation of ensembles. Advances in Neural Information Processing Systems (pp. 5505–5515).
- Yang et al., 2019a
Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology, 10, 1–19.
- Yang et al., 2023a
Yang, W., Gao, J., & Mirzasoleiman, B. (2023). Better safe than sorry: pre-training clip against targeted data poisoning and backdoor attacks. arXiv preprint arXiv:2310.05862.
- Yang et al., 2023b
Yang, W., Gao, J., & Mirzasoleiman, B. (2023). Robust contrastive language-image pretraining against data poisoning and backdoor attacks. Advances in Neural Information Processing Systems.
- Yang et al., 2023c
Yang, Y., Gao, R., Wang, X., Xu, N., & Xu, Q. (2023). Mma-diffusion: multimodal attack on diffusion models. arXiv preprint arXiv:2311.17516.
- Yang et al., 2022
Yang, Y., Liu, T. Y., & Mirzasoleiman, B. (2022). Not all poisons are created equal: robust training against data poisoning. International Conference on Machine Learning (pp. 25154–25165).
- Yang et al., 2019b
Yang, Z., Chang, E.-C., & Liang, Z. (2019). Adversarial neural network inversion via auxiliary knowledge alignment. arXiv preprint arXiv:1902.08552.
- Yang et al., 2023d
Yang, Z., He, X., Li, Z., Backes, M., Humbert, M., Berrang, P., & Zhang, Y. (2023). Data poisoning attacks against multimodal encoders. International Conference on Machine Learning.
- Yao et al., 2019
Yao, Y., Li, H., Zheng, H., & Zhao, B. Y. (2019). Latent backdoor attacks on deep neural networks. ACM SIGSAC Conference on Computer and Communications Security (pp. 2041–2055).
- Ye et al., 2021
Ye, D., Lin, Y., Huang, Y., & Sun, M. (2021). Tr-bert: dynamic token reduction for accelerating bert inference. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Yeom et al., 2018
Yeom, S., Giacomelli, I., Fredrikson, M., & Jha, S. (2018). Privacy risk in machine learning: analyzing the connection to overfitting. IEEE Computer Security Foundations Symposium.
- Yin et al., 2018
Yin, D., Chen, Y., Kannan, R., & Bartlett, P. (2018). Byzantine-robust distributed learning: towards optimal statistical rates. International Conference on Machine Learning.
- Yin et al., 2020
Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., … Kautz, J. (2020). Dreaming to distill: data-free knowledge transfer via deepinversion. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Yu et al., 2018
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., & Sang, N. (2018). Bisenet: bilateral segmentation network for real-time semantic segmentation. European Conference on Computer Vision.
- Yu et al., 2020
Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., & Finn, C. (2020). Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems.
- Yuan et al., 2023
Yuan, Z., Zhou, P., Zou, K., & Cheng, Y. (2023). You are catching my attention: are vision transformers bad learners under backdoor attacks? IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24605–24615).
- Zhai et al., 2023
Zhai, S., Dong, Y., Shen, Q., Pu, S., Fang, Y., & Su, H. (2023). Text-to-image diffusion models can be easily backdoored through multimodal data poisoning. ACM International Conference on Multimedia.
- Zhang et al., 2019a
Zhang, D., Zhang, T., Lu, Y., Zhu, Z., & Dong, B. (2019). You only propagate once: accelerating adversarial training via maximal principle. Advances in Neural Information Processing Systems.
- Zhang et al., 2019b
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019). Theoretically principled trade-off between robustness and accuracy. International Conference on Machine Learning (pp. 7472–7482).
- Zhang et al., 2024a
Zhang, J., Wang, Z., Wang, R., Ma, X., & Jiang, Y.-G. (2024). Enja: ensemble jailbreak on large language models. arXiv preprint arXiv:2408.03603.
- Zhang et al., 2018
Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., & Molloy, I. (2018). Protecting intellectual property of deep neural networks with watermarking. ACM Asia Conference on Computer and Communications Security.
- Zhang et al., 2024b
Zhang, J., Ma, X., Wang, X., Qiu, L., Wang, J., Jiang, Y.-G., & Sang, J. (2024). Adversarial prompt tuning for vision-language models. European Conference on Computer Vision.
- Zhang et al., 2022a
Zhang, J., Yi, Q., & Sang, J. (2022). Towards adversarial attack on vision-language pre-training models. ACM International Conference on Multimedia (pp. 5005–5013).
- Zhang et al., 2017
Zhang, J., Zheng, K., Mou, W., & Wang, L. (2017). Efficient private ERM for smooth objectives.
- Zhang et al., 2020a
Zhang, J., Chen, D., Liao, J., Fang, H., Zhang, W., Zhou, W., … Yu, N. (2020). Model watermarking for image processing networks. AAAI Conference on Artificial Intelligence.
- Zhang et al., 2021
Zhang, J., Chen, D., Liao, J., Zhang, W., Feng, H., Hua, G., & Yu, N. (2021). Deep model intellectual property protection via deep watermarking. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Zhang et al., 2020b
Zhang, J., Xu, X., Han, B., Niu, G., Cui, L., Sugiyama, M., & Kankanhalli, M. (2020). Attacks which do not kill training make adversarial learning stronger. International Conference on Machine Learning (pp. 11278–11287).
- Zhang et al., 2020c
Zhang, J., Zhu, J., Niu, G., Han, B., Sugiyama, M., & Kankanhalli, M. (2020). Geometry-aware instance-reweighted adversarial training. International Conference on Learning Representations.
- Zhang et al., 2024c
Zhang, J., Liu, H., Jia, J., & Gong, N. Z. (2024). Data poisoning based backdoor attacks to contrastive learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Zhang et al., 2024d
Zhang, M., Yu, N., Wen, R., Backes, M., & Zhang, Y. (2024). Generated distributions are all you need for membership inference attacks against generative models. IEEE/CVF Winter Conference on Applications of Computer Vision.
- Zhang et al., 2022b
Zhang, R., Zhang, W., Fang, R., Gao, P., Li, K., Dai, J., … Li, H. (2022). Tip-adapter: training-free adaption of clip for few-shot classification. European Conference on Computer Vision.
- Zhang et al., 2023
Zhang, S., Zhang, M., Pan, X., & Yang, M. (2023). No-skim: towards efficiency robustness evaluation on skimming-based language models. arXiv preprint arXiv:2312.09494.
- Zhang et al., 2020d
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). Bertscore: evaluating text generation with bert. International Conference on Learning Representations.
- Zhao et al., 2021
Zhao, H., Wei, T., Zhou, W., Zhang, W., Chen, D., & Yu, N. (2021). Multi-attentional deepfake detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Zhao et al., 2024
Zhao, Y., Pang, T., Du, C., Yang, X., Li, C., Cheung, N.-M. M., & Lin, M. (2024). On evaluating adversarial robustness of large vision-language models. Advances in Neural Information Processing Systems.
- Zheng et al., 2023
Zheng, M., Lou, Q., & Jiang, L. (2023). Trojvit: trojan insertion in vision transformers. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4025–4034).
- Zhou et al., 2024a
Zhou, A., Li, B., & Wang, H. (2024). Robust prompt optimization for defending language models against jailbreaking attacks. arXiv preprint arXiv:2401.17263.
- Zhou et al., 2024b
Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., … others. (2024). Lima: less is more for alignment. Advances in Neural Information Processing Systems.
- Zhou et al., 2023a
Zhou, Z., Hu, S., Li, M., Zhang, H., Zhang, Y., & Jin, H. (2023). Advclip: downstream-agnostic adversarial examples in multimodal contrastive learning. ACM International Conference on Multimedia.
- Zhou et al., 2023b
Zhou, Z., Hu, S., Zhao, R., Wang, Q., Zhang, L. Y., Hou, J., & Jin, H. (2023). Downstream-agnostic adversarial examples. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4345–4355).
- Zhu et al., 2020
Zhu, C., Cheng, Y., Gan, Z., Sun, S., Goldstein, T., & Liu, J. (2020). Freelb: enhanced adversarial training for natural language understanding. International Conference on Learning Representations.
- Zhu et al., 2019
Zhu, C., Huang, W. R., Li, H., Taylor, G., Studer, C., & Goldstein, T. (2019). Transferable clean-label poisoning attacks on deep neural nets. International Conference on Machine Learning (pp. 7614–7623).
- Zhu et al., 2023
Zhu, D., Chen, J., Shen, X., Li, X., & Elhoseiny, M. (2023). Minigpt-4: enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592.
- Zhu et al., 2021
Zhu, J., Yao, J., Han, B., Zhang, J., Liu, T., Niu, G., … Yang, H. (2021). Reliable adversarial distillation with unreliable teachers. International Conference on Learning Representations.
- Zhu et al., 2024
Zhu, L., Ning, R., Li, J., Xin, C., & Wu, H. (2024). Seer: backdoor detection for vision-language models through searching target text and image trigger jointly. AAAI Conference on Artificial Intelligence.
- Zhuang et al., 2023
Zhuang, H., Zhang, Y., & Liu, S. (2023). A pilot study of query-free adversarial attack against stable diffusion. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2384–2391).
- Zi et al., 2021
Zi, B., Zhao, S., Ma, X., & Jiang, Y.-G. (2021). Revisiting adversarial robustness distillation: robust soft labels make student better. IEEE/CVF International Conference on Computer Vision.
- Zou et al., 2023
Zou, A., Wang, Z., Kolter, J. Z., & Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.
- Zhang et al., 2023
Zhang, Q., Gui, T., & Huang, X. (2023). Introduction to natural language processing [自然语言处理导论]. Shanghai: Publishing House of Electronics Industry.