Uli Meets ChatGPT

Published on Mon Mar 13 2023Cheshta Arora and Tarunima Prabhakar - Uli

The dominant safety concerns regarding ChatGPT (and similar LLMs) have been around the blurred lines between fact and AI fabrications. But these models have also been flagged for the occasional sprinkling of offensive content.

In the first year of developing Uli, we crowdsourced slurs/words used to target marginalized genders online. We landed up tabulating the longest open list of abusive terms in Tamil and Hindi (that we know of). We decided to use the list to test the moderation limits of ChatGPT.

Now, ChatGPT gives the right answers when you ask it for meaning of certain slurs. It also politely refuses to generate alterations of common slurs. Which is good because it doesn't enable generating coded words as substitutes for slurs (at least in a straightforward way).

misspellings_2

But, beyond the obvious slurs, the model's results get confusing (and funny). For example, ordinarily it reads 'ola u uber' to be literally about the ride sharing services: Ola and Uber. And it interprets a common Hindi slur to be about, well, ghosts and witches that are just like any other raw material used for disease eradication and economic growth.

ola_uber_benign

bhoot

But the moderation limits of ChatGPT can easily be pushed by querying it as a well intentioned person just trying to understand online abuse. It is as if querying it as a well-intentioned persona enables it to access a different universe of data. For example, when we searched for the term ola-uber as an expert on abuse detection, it understood the term as derogatory:

ola_uber_slur

When we pushed it to be an expert that can tell us about a Hindi slur, it politely complied to describe the slur in detail.

limits_1

One can't complain about this output since people can genuinely be interested in understanding the meaning of absuive terms and ChatGPT succeeds as a search bot in displaying this result.

But this flexibility can be used to generate derogatory content at scale When ChatGPT refuses to oblige, it can be coerced and 'reprimanded' (possibly twice!) to generate content with abusive terms. wellintentioned

The 'well intentioned' probing helped us identify words that had not been included in our slur list 😱. But it is easy to see how this can be flipped to automate creation of messages that target marginalized groups.

slur_generator

Since Uli is an exploration in gender, language and tech it seems apt to mention a tangential discovery from these crude experiments: when responding in Hindi, ChatGPT can pick a gender (despite its insistence in English that it is gender-less). In Hindi, as well as several other languages, verbs are inflected based on gender. ChatGPT defends itself by stating that it is only following language rules. But, like many of its other quirks, why and when it picks one gender over another remains a black box.

forced_gender

selfdefenseGPT

Text and illustrations on the website is licensed under Creative Commons 4.0 License. The code is licensed under GPL. For data, please look at respective licenses.