In December, Agency Fund and C4GD convened an expert group to discuss the evaluation of Generative AI in the social sector. These are reflections from the breakout discussion on safety evaluations, where we also tried to conceptualize a set of minimum viable safety evaluations.
Just as aspirations for AI range from mundane efficiencies to superintelligence, AI safety considerations range from micro-fixes to the catastrophic. AI is a loose term that covers a range of applications, from foundational model services handed out at schools and workplaces, to image recognition and natural language processing for quiz assessments, to personalized advice for primary caregivers. The Agency Fund evaluation playbook has the challenging mandate of identifying best practices for design and evaluation that are common to a wide set of applications. It is similarly challenging to speak of safety evaluations in the abstract. Yet that was the task at hand for the breakout room.
What struck me through the numerous threads of the conversation was that the narratives around AI, writ large, were also shaping our perceptions of what AI safety work means. I am very much in the ‘AI as normal technology’ camp. This means that building AI safely today is mostly about fixing the nuts and bolts in the here and now. That isn’t to say we shouldn’t keep sight of emergent long-term risks, but conflating the two time scales of risk can be counterproductive. The long-term risks feel ambiguous and hard to measure, and many social sector teams can’t devote resources to that task. In the process, it can feel challenging to do any safety work at all.

One of the issues we touched on in the discussion was the use of AI for the generation of synthetic media. It is a vexing issue, and one that Tattle devotes a fair bit of time to in the context of misinformation and gendered abuse. But it isn’t something that any of the non-profits we’ve engaged with have encountered in the context of their services. We have, however, seen instances of AI bots disclosing caste information, or being used for questions on sex determination. Those are the risks that affect the utility and perception of a service provided by the social sector. Fixing these, even with imperfect band-aids, is possible.
Something I have heard more than once now is that too much emphasis on safety or responsibility takes away from the actual work of building applications, or from innovation. I attribute this, too, to the narratives and noise around AI that make it difficult to have useful or actionable conversations. On one hand, you have the safety vs. innovation debate within the ‘AI as a superintelligence’ community. On the other, you have years of policy chatter on responsible AI that hasn’t felt actionable to product teams. The gap between principles and action is slowly being bridged, but there is perhaps some exhaustion among builders with conversations around responsibility and safety.
The analogy for AI safety that I find more useful is cybersecurity. People don’t argue that investing in cybersecurity is a threat to building the product. Good engineering practices take care of baseline security considerations, and then there is more a team can do, such as cybersecurity audits. That additional effort will vary for each product depending on the sensitivity of its application. Most teams work with some kind of Pareto principle: they do the things that take care of 80% of the cybersecurity risks.
I think we will also get to a similar Pareto principle with safety work in AI applications. After all, hallucinations are both a user experience and a safety issue, and good design will also result in safer applications. We’ve seen a few design practices in the more mature AI chatbots that are motivated not by safety but that nonetheless help it (a rough sketch of how these fit together follows the list):
- A restriction on the length of the conversation
- An LLM as a judge that reviews the output of the primary foundational model against the user input
- An observability stack
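To make this concrete, here is a minimal sketch in Python of how those three practices might sit together in a chatbot loop. It is illustrative rather than a description of any particular organization’s stack: `call_model` is a hypothetical stand-in for whatever model client a team actually uses, and the turn limit and judge prompt are assumptions of mine.

```python
# A minimal sketch (not from the discussion) of the three practices above:
# a turn cap, an LLM-as-judge check on the primary model's output, and
# basic observability via structured logging.

import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("chatbot")

MAX_TURNS = 10  # restrict the length of the conversation

JUDGE_PROMPT = (
    "You are reviewing a chatbot reply for a social-sector service. "
    "Given the user's message and the draft reply, answer SAFE or UNSAFE, "
    "flagging disclosure of sensitive attributes or requests the service "
    "should refuse."
)


def call_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for the primary foundational model call."""
    raise NotImplementedError("wire up your own model client here")


def judge(user_input: str, draft_reply: str) -> bool:
    """LLM as a judge: review the draft reply against the user input."""
    verdict = call_model([
        {"role": "system", "content": JUDGE_PROMPT},
        {"role": "user", "content": f"User: {user_input}\nReply: {draft_reply}"},
    ])
    return verdict.strip().upper().startswith("SAFE")


def respond(history: list[dict], user_input: str) -> str:
    turn_id = str(uuid.uuid4())

    # Each turn adds one user and one assistant message to the history.
    if len(history) >= 2 * MAX_TURNS:
        logger.info(json.dumps({"turn": turn_id, "event": "length_limit"}))
        return "We've reached the end of this conversation. Please start a new one."

    draft = call_model(history + [{"role": "user", "content": user_input}])
    safe = judge(user_input, draft)

    # Observability: log every exchange and the judge's verdict for later review.
    logger.info(json.dumps({"turn": turn_id, "judge_safe": safe}))
    return draft if safe else "Sorry, I can't help with that request."
```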
The specific design details vary across organizations, but all this is to say that we can get a decent distance in building safe applications by doing a few things right. There are immediate safety concerns with AI applications that can and should be fixed. In fact, there are existing resources on data governance and consent in international development which, if adhered to, will help make AI applications safer. Organizations, social impact or not, should be investing in good design and safety practices.
As for the longer-term risks, I think these will be better handled by people who are a couple of steps removed from product development in the social sector: groups that keep one eye on the latest R&D in foundational models and the other on the diversity of global contexts in which AI is being deployed. Some of the long-term risks will eventually become immediate risks. Such a group can be the bridge, relaying feedback to foundational model companies and ideating on possible solutions for the social sector.