Managing Security Risks of AI Agents in Adversarial Contexts: A Conceptual Integration of the CIA Triad and Organisational Resilience for Digital Governance

Krystian Bień; Elwira Pyk; Mariusz Rafało

doi:10.18778/0208-6018.374.04

Authors

Krystian Bień Koźmiński University, Warsaw, Poland https://orcid.org/0009-0006-3234-8249
Elwira Pyk Koźmiński University, Warsaw, Poland https://orcid.org/0009-0000-9653-5517
Mariusz Rafało Warsaw School of Economics, Warsaw, Poland https://orcid.org/0000-0002-4868-3571

Keywords:

AI governance, organisational resilience, risk management, digital transformation, cybersecurity

Abstract

Artificial intelligence (AI) agents are increasingly deployed across organisational environments, introducing not only efficiency gains but also complex security and governance challenges. This paper explores how the integration of technical and managerial frameworks can enhance the resilience of organisations operating under adversarial conditions. Building on a conceptual and narrative review, the study synthesises cybersecurity principles represented by the CIA triad (Confidentiality, Integrity, Availability) with the organisational resilience model outlined in ISO 22316:2017. The proposed conceptual integration demonstrates how resilience principles of anticipation, adaptation, and recovery complement traditional security controls, transforming AI agent protection into a strategic capability that supports digital transformation and continuity management. The research question guiding this study is: How can organisations enhance resilience and manage security risks arising from the deployment of autonomous AI agents in adversarial environments? This paper argues that integrating resilience principles into AI agent governance strengthens organisational security, operational continuity, and adaptive capacity under adversarial conditions. This interdisciplinary approach extends the discourse beyond technical cybersecurity, positioning AI agent safety within the broader domains of digital governance, management, and economics. The study contributes a novel conceptual framework and identifies strategic implications for policy making, innovation management, and sustainable digital ecosystems.

References

Andress J. (2011), The Basics of Information Security: Understanding the Fundamentals of InfoSec in Theory and Practice, Syngress, Waltham, https://doi.org/10.1016/C2010-0-68336-2

Boin A., Eeten M.J.G. van (2013), The resilient organization, “Public Management Review”, vol. 15(3), pp. 429–445.

Borkar M., Shetty N., Hatte V., Omer H., Jadhav P. hu, Kawase N., Sharma D.K. (2023), Agent Tarini: A New Generation of AI Cyber Security Agents, “International Journal for Multidisciplinary Research”, vol. 5(6), pp. 1–8, https://www.ijfmr.com/papers/2023/6/8902.pdf [accessed: 1.02.2023]

Brundage M., Avin S., Wang J., Krueger G., Hadfield G., Khlaaf H., Yang J., Toner H., Fong R., Maharaj T., Koh P.W., Hooker S., Leung J., Trask A., Bluemke E., Lebensold J., O’Keefe C., Koren M., Ryffel T., Rubinovitz J.B., Besiroglu T., Carugati F., Clark J., Eckersley P., Haas S. de, Johnson M., Laurie B., Ingerman A., Krawczuk I., Askell A., Cammarota R., Lohn A., Krueger D., Stix C., Henderson P., Graham L., Prunkl C., Martin B., Seger E., Zilberman N., Ó hÉigeartaigh S., Kroeger F., Sastry G., Kagan R., Weller A., Tse B., Barnes E., Dafoe A., Scharre P., Herbert-Voss A., Rasser M., Sodhani S., Flynn C., Gilbert T.K., Dyer L., Khan S., Bengio Y., Anderljung M. (2020), Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, https://arxiv.org/abs/2004.07213 [accessed: 20.04.2020].

Chan P.P.K., Luo F., Chen Z., Shu Y., Yeung D.S. (2021), Transfer learning based countermeasure against label flipping poisoning attack, “Information Sciences”, vol. 548, pp. 450–460, https://doi.org/10.1016/j.ins.2020.10.016

Costa D.G., Silva I., Medeiros M., Bittencourt J.C.N., Andrade M. (2024), A method to promote safe cycling powered by large language models and AI agents, “MethodsX”, vol. 13, 102880, https://doi.org/10.1016/j.mex.2024.102880

Debenedetti E., Zhang J., Balunović M., Beurer-Kellner L., Fischer M., Tramèr F. (2024), AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents, https://arxiv.org/abs/2406.13352 [accessed: 24.11.2024].

Durante Z., Huang Q., Wake N., Gong R., Park J.S., Sarkar B., Taori R., Noda Y., Terzopoulos D., Choi Y., Ikeuchi K., Vo H., Fei-Fei L., Gao J. (2024), Agent AI: Surveying the Horizons of Multimodal Interaction, https://arxiv.org/abs/2401.03568v2 [accessed: 7.01.2024].

Gallagher M., Pitropakis N., Chrysoulas C., Papadopoulos P., Mylonas A., Katsikas S. (2022), Investigating machine learning attacks on financial time series models, “Computers and Security”, vol. 123, 102933, https://doi.org/10.1016/j.cose.2022.102933

Gao S., Fang A., Huang Y., Giunchiglia V., Noori A., Schwarz J.R., Ektefaie Y., Kondic J., Zitnik M. (2024), Empowering biomedical discovery with AI agents, “Cell”, vol. 187(22), pp. 6125–6151, https://doi.org/10.1016/j.cell.2024.09.022

Goodfellow I., Shlens J., Szegedy C. (2015), Explaining and Harnessing Adversarial Examples, https://arxiv.org/abs/1412.6572 [accessed: 20.03.2025].

International Organization for Standardization (2017), ISO 22316:2017 – Security and resilience – Organizational resilience – Principles and attributes, Geneva, https://www.iso.org/standard/50053.html [accessed: 1.03.2017].

Kantchelian A., Tygar J.D., Joseph A.D. (2013), Evasion and Hardening of Tree Ensemble Classifiers, https://arxiv.org/abs/1509.07892 [accessed: 21.06.2013].

Karpathy A. (2024), Intro to Large Language Models, https://www.youtube.com/watch?v=zjkBMFhNj_g [accessed: 23.11.2023].

Khaleel Y.L., Habeeb M.A., Alnabulsi H. (2024), Adversarial Attacks in Machine Learning: Key Insights and Defense Approaches, “Applied Data Science and Analysis”, vol. 2024, pp. 121–147, https://doi.org/10.58496/adsa/2024/011

Lee J.H., Kim Y.G., Ahn Y., Park S., Kong H.J., Choi J.Y., Kim K., Nam I.-C., Lee M.-C., Masuoka H., Miyauchi A., Kim S., Kim Y.A., Choe E.K., Chai Y.J. (2023), Investigation of optimal convolutional neural network conditions for thyroid ultrasound image analysis, “Scientific Reports”, vol. 13(1), pp. 1–9, https://doi.org/10.1038/s41598-023-28001-8

Linkov I., Eisenberg D.A., Plourde K., Seager T.P., Allen J., Kott A. (2013), Resilience metrics for cyber systems, “Environment Systems and Decisions”, vol. 33(4), pp. 471–476.

Lowd D., Meek C. (2005), Adversarial Learning, [in:] R.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya (eds.), KDD ’05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, Association for Computing Machinery, New York, pp. 641–647, https://dl.acm.org/doi/10.1145/1081870.1081950 [accessed: 21.08.2005].

Malatji M., Tolah A. (2024), Artificial intelligence (AI) cybersecurity dimensions: a comprehensive framework for understanding adversarial and offensive AI, “AI and Ethics”, vol. 5, pp. 883–910, https://doi.org/10.1007/s43681-024-00427-4

Motwani S., Baranchuk M., Strohmeier M., Bolina V., Torr P.H.S., Hammond L., Schroeder de Witt C. (2024), Secret Collusion among AI Agents: Multi-Agent Deception via Steganography, https://doi.org/10.48550/arxiv.2402.07510

Papagiannidis E., Enholm I.M., Dremel C., Mikalef P., Krogstie J. (2023), Toward AI Governance: Identifying Best Practices and Potential Barriers and Outcomes, “Information Systems Frontiers”, vol. 25(1), pp. 123–141, https://doi.org/10.1007/s10796-022-10251-y

Papernot N., McDaniel P., Goodfellow I. (2016), Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples, https://arxiv.org/abs/1605.07277 [accessed: 24.05.2016].

Park J.S., O’Brien J.C., Cai C.J., Morris M.R., Liang P., Bernstein M.S. (2023), Generative Agents: Interactive Simulacra of Human Behavior, https://arxiv.org/abs/2304.03442 [accessed: 7.04.2023].

Peng L., Li D., Zhang Z., Zhang T., Huang A., Yang S., Hu Y. (2024), Human-AI collaboration: Unraveling the effects of user proficiency and AI agent capability in intelligent decision support systems, “International Journal of Industrial Ergonomics”, vol. 103, 103629, https://doi.org/10.1016/j.ergon.2024.103629

Pitropakis N., Panaousis E., Giannetsos T., Anastasiadis E., Loukas G. (2019), A taxonomy and survey of attacks against machine learning, “Computer Science Review”, vol. 34, 100199, https://doi.org/10.1016/j.cosrev.2019.100199

Rafało M. (2020), Wymiar biznesowy ataków na systemy uczące się, [in:] J. Surma (ed.), Hakowanie sztucznej inteligencji, PWN, Warszawa, pp. 53–79.

Ramamoorthi V. (2024), A Review of AI and Multi-Agent Systems for Cloud Performance and Security, “International Journal of Scientific Research in Computer Science Engineering and Information Technology”, vol. 10(4), pp. 326–337, https://ijsrcseit.com/index.php/home/article/view/CSEIT24105112 [accessed: 19.02.2026].

Russell S., Norvig P. (2022), Artificial Intelligence: A Modern Approach, Pearson, Harlow.

Sarker I.H. (2023), Multi‐aspects AI‐based modeling and adversarial learning for cybersecurity intelligence and robustness: A comprehensive overview, “Security and Privacy”, vol. 6(5), e295, https://onlinelibrary.wiley.com/doi/10.1002/spy2.295 [accessed: 10.01.2023].

Sharif M., Bhagavatula S., Bauer L., Reiter M.K. (2016), Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition, [in:] Proceedings of the ACM Conference on Computer and Communications Security, Association for Computing Machinery, New York, pp. 1528–1540, https://doi.org/10.1145/2976749.2978392

Surma J. (2022), Wstęp do hakowania systemów uczących się, [in:] J. Surma (ed.), Hakowanie sztucznej inteligencji, PWN, Warszawa, pp. 13–34.

Valencia L.J. (2024), Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security, https://arxiv.org/abs/2406.07561 [accessed: 9.05.2024].

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. (2017), Attention Is All You Need, “Advances in Neural Information Processing Systems”, vol. 30, pp. 5999–6009, https://arxiv.org/abs/1706.03762 [accessed: 2.08.2023].

Whitman M.E., Mattord H.J. (2012), Principles of Information Security, Cengage Learning, Boston.

Wooldridge M. (2009), An Introduction to MultiAgent Systems, John Wiley & Sons, Chichester.

Wu F., Wu S., Cao Y., Xiao C. (2024), WIPI: A New Web Threat for LLM-Driven Web Agents, http://arxiv.org/abs/2402.16965 [accessed: 26.02.2024].

Xiao H., Biggio B., Nelson B., Xiao H., Eckert C., Roli F. (2015), Support vector machines under adversarial label contamination, “Neurocomputing”, vol. 160, pp. 53–62, https://doi.org/10.1016/j.neucom.2014.08.081

Yampolskiy R.V., Spellchecker M.S. (2016), Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures, https://arxiv.org/abs/1610.07997 [accessed: 25.10.2016].