An air of celebration mixed with apprehensive bracing hangs around the development of AI systems. People want AI to solve all of humanity’s problems while nervously eyeing the risks that keep piling up as this new technology evolves. We are teetering on the brink of something; whether it is chaos or liberation is yet to be revealed. Wherever you might lie on this spectrum from celebration to deep cynicism regarding AI, I can guarantee that you have interacted with it and/or thought about its impact on your life. As an organization that concerns itself with digital safety, Tattle has inevitably found itself working to develop safety guardrails for AI. In our research into this emerging field, we are coming to recognize that there continue to be critical limitations in existing AI safety discourse.
When trying to understand any new concept, it is important to first define the terms and know the parameters of what is being discussed. Firstly, what is AI? A simple definition provided by IBM says:
Artificial intelligence (AI) is technology that enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy.1
In other words, AI refers to computer systems that are learning to think like humans, or that are developing machine learning capabilities.
Next, what is safety? Simply put, safety is the condition of being protected from harm. With reference to AI, safety is about preventing or reducing the harms that AI poses to humans and society at large. Given this definition, it is important to understand the parameters within which current research on and practice of AI safety is being conducted. To begin with, what are the primary risks and/or harms that AI safety seeks to mitigate or prevent?
As we understand them, AI systems are known to pose several risks, some of which are listed below:
How are public and private actors within the field of AI addressing these risks? Currently, there are three main trends in AI safety.2
Cumulatively, these three approaches to AI safety aim to improve the trustworthiness of AI and make it safer to use by focusing on specific aspects: security, accountability, monitoring, effectiveness, reliability, robustness, transparency and explainability, and fairness. What differentiates these approaches are their objectives, methodologies, and audiences.
Safety engineering is the discipline that developed to ensure safety from harm across the diverse fields of engineering. Examples include the safety policies deployed in a chemical manufacturing plant, research into the safety mechanics of automobiles, and the safety practices that keep nuclear plants from leaking harmful radiation. In a similar fashion, safety engineering for AI has focused its efforts on engineering AI to be more aligned with human values, so that it does not become autonomous or take actions that go against human interests. This premise assumes that the audience for AI is, to an extent, homogenous and resembles the creators of AI systems in that they share the same values.
Arriving at a consensus on which human values are shared by all humankind for AI alignment is a monumental task rife with contradictions and power inequalities. It involves deliberating over which humans get to control AI. Is it merely those who pay for the development and deployment of machine learning systems? Should it be the governments within whose jurisdictions AI systems are deployed? Or should it be the users who ultimately experience the effects of an AI system's use? Is there an argument for considering the flora and fauna around the server farms that host AI systems to be stakeholders, as we redirect potable water to cool AI data centres? The field of safety engineering has yet to propose solutions for building regulatory and monitoring institutions that are accountable to this broad array of stakeholders and that can oversee AI safety in a way that ensures the greatest reduction of harm and the highest level of protection for all of them.
This brings us to the second set of AI safety practices, which has evolved in response to the narrow focus of safety engineering. A growing number of civil society, research, and non-profit organizations around the world are developing toolkits, processes, and policy recommendations for embedding principles of social justice and harm repair within AI systems. This avenue of AI safety concerns itself with identifying the existing biases,3 prejudices, inequalities, and misrepresentation in the data4 used to train AI systems, as well as in the institutional structures in which AI is designed and deployed.
This set of approaches tends to operate by conducting safety evaluations of AI systems before or during their pilot tests to identify issues arising at the point of consumer contact. Safety evaluations focus on the accuracy of responses, the appropriateness of tone, and the degree to which an AI system deviates from the correct response (deviations also known as hallucinations). Teams that run this specific kind of safety evaluation aim to produce context-specific suggestions for safety, recognizing that the users of any AI system have diverse interests, values, and experiences of risk. A one-size-fits-all safety solution is therefore likely to be inadequate for building more reliable and trustworthy systems.
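To make this concrete, here is a minimal sketch in Python of one small building block of such an evaluation: scoring a system's responses against reference answers and flagging large deviations. The `toy_system` callable, the test case, and the similarity threshold are hypothetical placeholders, not any particular team's method; real evaluations cover many more dimensions (tone, harm categories, context) and typically involve human reviewers.

```python
# Minimal sketch of a safety-evaluation loop: compare responses to reference
# answers and flag large deviations. All names and data here are illustrative.
from difflib import SequenceMatcher


def similarity(response: str, reference: str) -> float:
    """Rough lexical similarity between a response and a reference answer."""
    return SequenceMatcher(None, response.lower(), reference.lower()).ratio()


def evaluate(system, test_cases, hallucination_threshold=0.5):
    """Score each response for closeness to the reference and flag big deviations."""
    results = []
    for prompt, reference in test_cases:
        response = system(prompt)
        score = similarity(response, reference)
        results.append({
            "prompt": prompt,
            "response": response,
            "accuracy": round(score, 2),
            "possible_hallucination": score < hallucination_threshold,
        })
    return results


if __name__ == "__main__":
    # Stub standing in for a real model or API call.
    def toy_system(prompt: str) -> str:
        return "Delhi is the capital of India."

    cases = [("What is the capital of India?", "New Delhi is the capital of India.")]
    for row in evaluate(toy_system, cases):
        print(row)
```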
The third approach to safety is planning for the possibility of catastrophic risks from AI in the future. A large part of this research focuses on how to mitigate risks arising from AI operating autonomously and resisting human control5. In tandem, an environmental movement against the use of AI, driven by its contribution to water scarcity6 and the destruction of local ecosystems, also operates within this framework and argues that not deploying AI at all is the safest approach forward. It suggests that AI should be reserved for only a few use cases that are critical for human survival and not be used ubiquitously to accomplish daily tasks7.
AI is not separate from humans. We designed it to replicate human thinking and, eventually, to think by itself. But it is trained on human data, with all its biases, and therefore it reflects human biases and violence. To address AI safety, then, we need to address the very human institutions that create AI. We have to go several steps back, before a model even exists, and ask with what intentions and motives it was created. To make AI safe, it is necessary to monitor and regulate the institutions8 within which it is developed and where decision-making about its design takes place.
The current way to address this is to analyze the data used to train a model and identify the biases, misrepresentations, and lacunae present in it. Most commonly, researchers try to feed in more diverse training data in order to improve model performance. However, adding data does not address the initial biases that have already become part of the model’s knowledge system. Addressing those requires starting from scratch and redeveloping models with multi-stakeholder inputs, an approach that most corporates will not pursue due to the costs and loss of profits involved. As public pressure against the large-scale adoption of AI grows, researchers, activists, and policy makers are grappling with putting safety guardrails in place to mitigate the current harmful effects of AI systems.
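As a rough illustration of what such a data analysis can involve, the sketch below counts how different languages and regions are represented in a hypothetical labeled dataset, the kind of simple audit that can surface lacunae before more data is added. The field names and sample rows are invented for this example; real audits operate over full corpora and far more attributes.

```python
# Minimal sketch of a training-data representation audit over a toy dataset.
# Field names ("language", "region") and rows are hypothetical placeholders.
from collections import Counter


def representation_report(examples, attribute):
    """Return the share of the dataset covered by each value of an attribute."""
    counts = Counter(example.get(attribute, "unknown") for example in examples)
    total = sum(counts.values())
    return {value: round(count / total, 2) for value, count in counts.items()}


if __name__ == "__main__":
    # Toy dataset standing in for a real training corpus.
    dataset = [
        {"text": "...", "language": "English", "region": "North America"},
        {"text": "...", "language": "English", "region": "Europe"},
        {"text": "...", "language": "Hindi", "region": "South Asia"},
        {"text": "...", "language": "English", "region": "North America"},
    ]
    print(representation_report(dataset, "language"))  # e.g. English dominates
    print(representation_report(dataset, "region"))    # e.g. South Asia underrepresented
```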
All of this brings us back to the beginning of this piece, where I set out some definitions of AI, safety, and risk. Surveying the field reveals that one of the underlying assumptions across much existing work is that there is a shared understanding and experience of risk and safety across the globe. That assumption rarely holds. Delving into AI safety in a meaningful way in the Global South therefore requires developing taxonomies of risks and safety measures that are rooted in the specific cultural, historical, linguistic, and geographical contexts where AI will be deployed. In order to address this big omission, it is necessary for AI safety practitioners to seriously consider:
Cole Stryker and Eda Kavlakoglu, What Is Artificial Intelligence (AI)?, IBM, 9 August 2024,
https://www.ibm.com/think/topics/artificial-intelligence ↩
Jacqueline Harding and Cameron Domenico Kirk-Giannini, What Is AI Safety? What Do We Want It to Be?, arXiv:2505.02313, version 1, preprint, arXiv, 6 May 2025,
https://doi.org/10.48550/arXiv.2505.02313 ↩
Shelton Fitch, Revisiting AI Red-Teaming, Center for Security and Emerging Technology, 26 September 2024,
https://cset.georgetown.edu/article/revisiting-ai-red-teaming/;
Markov Grey and Charbel-Raphaël Segerie, Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods, arXiv:2505.05541, preprint, arXiv, 8 May 2025,
https://doi.org/10.48550/arXiv.2505.05541;
Laura Weidinger et al., Sociotechnical Safety Evaluation of Generative AI Systems, arXiv:2310.11986, preprint, arXiv, 31 October 2023,
https://doi.org/10.48550/arXiv.2310.11986 ↩
Shaina Raza et al., MBIAS: Mitigating Bias in Large Language Models While Retaining Context, Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Association for Computational Linguistics, 2024, 97–111,
https://doi.org/10.18653/v1/2024.wassa-1.9 ↩
Miles Brundage et al., The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, arXiv:1802.07228, preprint, arXiv, 1 December 2024,
https://doi.org/10.48550/arXiv.1802.07228 ↩
UNEA, AI Has an Environmental Problem. Here’s What the World Can Do about That, UN Environment Programme, 13 November 2025,
https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about ↩
Beyond Fossil Fuels et al., WITHIN BOUNDS: Limiting AI’s Environmental Impact — Joint Statement from Civil Society for the AI Action Summit, 7 February 2025,
https://beyondfossilfuels.org/2025/02/07/within-bounds-limiting-ais-environmental-impact/ ↩
Tim O’Reilly, You Can’t Regulate What You Don’t Understand, 14 April 2023,
https://web.archive.org/web/20230414162057/https://www.oreilly.com/content/you-cant-regulate-what-you-dont-understand-2/ ↩