A New Thing Under the Sun? Alternative visions for tech in the age of AI

Published on Mon Jul 17 2023 by Tarunima Prabhakar

This is adapted from the keynote address Tarunima Prabhakar gave at the annual symposium organized by the Australian Research Council Center of Excellence on Automated Decision Making and Society on July 13-15, 2023. It hasn't been edited for correctness and may contain (several) grammatical errors.

I am the co-founder and research lead of Tattle- a civic tech organization in India. We build tools and datasets to respond to inaccurate and harmful content in India. If researchers in the room are wondering why I am here, you are not alone- I was also surprised when I received the invitation. Tattle started as a small side research project in 2018. Not too long before that, parents were chiding teenagers for staring at a screen all the time. But by 2018, they were on a mobile screen all the time, and in India they were predominantly using one app- WhatsApp. The number of WhatsApp users closely tracked, and continues to track, the number of Internet users in India. The WhatsApp groups- be these family, friends or alumni groups- would be raging with opinions and fierce debates. Having tried on an ad hoc basis to intervene in these conversations, we had given up. It wasn’t clear to me how one could even intervene. The culpability of a platform in content going viral, when there was no algorithmic amplification, wasn’t clear. It was the most popular platform in the Global South, and in my view less understood than Facebook, Twitter or YouTube. Twitter is heavily researched due to its API, and Facebook has faced intense media scrutiny in the post-Brexit/Trump moment. It wasn’t, and still isn't, fully clear how one would go about understanding information flows on a closed messaging app. So we said, let’s start archiving what we are seeing on our WhatsApp groups. It could provide some answers- for example, what of this local content was viral enough to get caught by the fact checking community in India, or what made its way to social media platforms? This was a side project. Both my co-founder and I had full-time jobs. We made it open source to allow other like-minded individuals to join, or take it over if we ran out of steam.
My full time job was as a researcher studying how predictive analytics and machine learning were being deployed towards development goals such as financial inclusion in India.

In 2018, the spate of lynchings triggered by WhatsApp brought attention to the platform and, relatedly, to our tiny side project. We got a grant, and we decided to devote some time to turning the scrappy research tools we had written for ourselves into more stable, reliable systems that other researchers and journalists could use. One tool led to the need for another: in a desire to link the archived content to fact checked content, we created a database of fact-checking reports in India. As a standalone resource, this dataset was used in research papers and media reports to understand trends in misinformation in the country. To be able to analyze and search through all the data we collected, we built machine learning systems that could process multi-lingual and multi-modal data- because that is how Indians use WhatsApp- in multiple languages and through images, video and audio notes. We connected the data and the APIs in an archive we called Kosh. During Covid-19 in India, we were able to use the scrapers and APIs to do a quick report on how relief work was coordinated on WhatsApp during the second wave.

Four years from when we started, which is now, Tattle is an organization managing multiple products and project streams that support research or action on new media platforms. Five years ago, if you had asked me what I wanted to do with my time, I would not have said that I wanted to build or run an organization. So, I work with mentors who have scaled products and organizations to learn how we should run ours. They will ask me how writing research papers aligns with our OKRs, or mission, vision bets: frameworks that are part of the daily parlance in non-profits and corporates. Between being a product manager, engineer and fundraiser, it is hard to justify writing research papers. I wondered what I had to say to researchers who have an arm's-length distance from the day-to-day grunt work of building automated products- a distance that I sometimes look at wistfully. But as I thought through that longing for distance, I had to recognize that despite it, and the day-to-day frustrations of running an organization, I had stuck with it because I not only found the work worthwhile and important, but also deeply intellectually satisfying. At every step of the way in our work, we have to decide on questions with non-obvious answers and long histories that researchers devote themselves to. Questions such as:

  • What on social media is worth collecting?
  • What is worth surfacing in a searchable archive?
  • When is it appropriate to collect content from closed messaging apps?
  • When should we not open data?
  • Should we ever monetize the data or APIs to sustain our infrastructure?

I had stumbled into this awkward position of being a researcher outside of academia, and an engineer/product developer outside of a profit driven tech company. In building tech in response to the degeneracy of big tech platforms and big governments, we risk being tainted by some of it. And yet I think it is worth it.

So, that is what I am going to be talking about. By talking about our experience of building civic tech, I will try to convince you all that delving into the trenches at the risk of muddying ourselves is a worthwhile endeavor, especially for researchers concerned with the impacts of digitalization and automation on our societies.

In our first call, Jean asked me- what is a civic tech organization, anyway? And I said- great question, as one does when one hears a question one should have a good answer to but doesn’t. When naming the organization, the Tattle co-founder (Denny George) and I ran polls, and had some intense back and forth on the name Tattle. But when it came to the addition of civic technology at the end, we implicitly, without any discussion, agreed. In our minds we were firmly a civic tech organization. One could argue it was our folie à deux, our collective delusion, because civic tech as a concept is rather nebulous. A 2013 report by the Knight Foundation divided civic tech initiatives into two broad clusters: open government and community action. The biggest cluster of initiatives was P2P local sharing, which fell under community action. This included services such as Craigslist, the short-term rental platform Airbnb and the ride sharing platform Lyft! I think this report is reflective of the heady optimism around the internet in the early 2010s. A 2019 report by Omidyar Network in India also uses the categorization of open government initiatives and civic engagement. But civic engagement here primarily implies citizens interacting with laws, government benefits and community development agendas. The underlying presumption is that the ideal relationship between the citizen and the state is pre-determined and static. The role of civic tech is to move us closer to those goals. We can’t place Tattle in this framework because we started with the assumption that how citizens should relate to each other and the state is neither self-evident nor static. New media technologies in particular, in shifting how evidence is generated and communicated, open spaces for renegotiation on who gets to be an expert. They alter how citizens view their relationship to each other and to the state. For us, civic tech is technology that claims greater space for citizens in this renegotiation.
Let me make this verbiage more concrete through the history of platform politics in India.

In India, as in many other countries, social media platforms have been linked to the rise of polarization and extremist politics. The 2014 election, which first propelled the current government into power, was dubbed India’s first social media election. Less than six months into Mr. Modi’s first tenure as prime minister, Mr. Zuckerberg came to India to meet him. He expressed support for Mr. Modi’s vision of a digital India. It was in line with his own vision of connectivity as a human right, which he had expressed in a white paper published during the launch of Internet.org the previous year. In 2015, Internet.org launched Free Basics- an app that provided a few websites, including Facebook and Facebook Messenger, alongside some news and entertainment websites, free of cost. In India, they partnered with Reliance, one of the largest business houses, to roll out the service. But Indian entrepreneurs and civil society argued this walled garden gave privileged access to an American behemoth at the cost of local businesses. It was a violation of net neutrality, a principle that Facebook had backed in the US. Facebook had partnered with an Indian elite business hub. It was pretending to do good while masking its own business interest, and working with different norms in this 'beneficiary' country than those it applied in its own. It was hard to not see this as a manifestation of neo-colonialism. In 2016 the Indian regulator banned Free Basics. Soon after, Mr. Marc Andreessen, an early investor in Facebook, confusingly quipped that “anti-colonialism had been catastrophic for India for decades. Why stop now?”

This incident defined the fault-lines of the platform governance debates in India and reflects the platform governance ping-pong in India and beyond.

In 2018, the government proposed a National Digital Communications Policy that, in a section on digital sovereignty, specifically addressed net neutrality and encryption. In subsequent proposals it pushed for data localization, in the absence of a data protection bill. In 2021, the Indian government amended the IT Act to mandate that social media platforms with more users than a defined threshold enable identification of the first originator of content. It also mandated that companies appoint a grievance and compliance officer who would face criminal action for violations of Indian law.

The previous year, a committee on “regulating children's access to pornography content” had recommended that law enforcement agencies be permitted to break end-to-end encryption to trace distributors of child pornography. From one perspective, the 2021 amendments by the Indian government were well intentioned. But then WhatsApp sued the Indian government, stating that the new internet laws undermine the privacy of users and enable mass surveillance. Given that law enforcement in India has been found to selectively target some academics and activists, one could argue that WhatsApp is also well intentioned. Another data point: the Indian government’s orders to take down content from Twitter steadily increased from 2014 to 2022. In 2022 Twitter took the Indian government to court over provisions to remove content and block accounts. In all these cases, the side you pick has more to do with how much you trust the government or the private sector at the moment. And there are good reasons to trust neither. Many of us here would agree that global platforms replicate structures of colonial enterprises, but when nationalistic governments use this as rhetoric, they can steer digital economies in worrying directions.

I grew up in the heyday of the Internet. The Internet felt like the best thing on Earth- the internet was about the small people talking to each other, building with each other. That vision was naive. But this current picture, where for all our problems on the internet our best bet is to plead with platforms or governments to act, both of which have good incentives not to, feels like swinging to the other end. It is a rather disempowering position. The designers of platforms understand that we are dealing with socio-technical systems. In our critiques we will call them out as such. So, civic engagement should also be socio-technical. Civic tech isn’t about reducing the burden of action on platforms and governments. It is precisely the opposite- it is citizens using technology to understand these powerful entities better and holding them to account. It is also a reflexive exercise because by building socio-technical platforms one is committing to enter the playing field whose rules we are trying to alter. But we enter the playing field not as a player determined to score a goal but rather as a sensor, like a buoy on the ocean, that is describing how proposals to change the rules are changing things on the inside.

Let me explain what I mean by this, through a case study:

At some point in 2020 we had about 1.6 TB worth of images and videos from platforms in India. We were figuring out how to open this responsibly. We couldn’t possibly manually comb through this data. This data mirrored the content that circulated on social media and chat apps in India- there were selfies, funny memes, conspiracy theories and some outright hateful content. These are not exclusive categories. Conspiracy theories can intersect with hateful content in creative pieces of speculative fiction; and memes can cleverly use humour to build narratives. In deciding what data we should or should not release, we were encountering the same content-moderation questions that a social media platform faces. We started looking for off-the-shelf, open-source solutions to help us flag abusive posts in the data. We knew that content moderation tools disproportionately focused on English, but trying to create a mild content moderation layer made us realize just how dire the situation was. Big-tech platforms have been called out for not devoting adequate attention to non-English speaking populations, but it isn’t just big tech that needs some layer of content moderation. Email service providers and internal workplace fora also use spam filters, often those developed and opened by researchers or small teams, to flag harmful content. But a basic list of slurs or abusive words was not available in most Indian languages. Ironically, at some point a CEO of an Indian social media platform backed by VCs complained about this on social media- there weren’t off-the-shelf tools that could enable them to build a responsible social media platform that could compete with global (i.e. American) ones.

So we set ourselves to building another tool- one that would detect abusive content in Indian languages. But since we are a small team, we said we can’t look at abuse in all Indian languages. Let’s start by looking at gendered abuse in three Indian languages. We started with gendered abuse since people of marginalized genders in India receive disproportionately high harassment.

Suddenly, critiques of how machine learning systems are built, and guidelines on how they should be built, stopped being theoretical. This was our opportunity to build a better “bottom-up” ML model. The group here is perhaps well aware that the world of machine learning, which I think is frustratingly called AI, is replete with guidelines. The guidelines describe broad principles such as transparency, explainability, autonomy, fairness. Many feminist scholars, especially from the Global South, have been right to point out that the language of principles and ethics does not account for historical patterns of injustice. That content moderation systems are built primarily around English is one manifestation of that historical inequity. These scholars have proposed their own manifestos and guidelines that push practitioners to account for the broader context in which ML systems are developed. There are differences in the manifestos but, broadly speaking, these were the principles that we found common to them:

  • Intersectionality: It is a recognition that all inequality is not alike. This principle calls for accounting for the ways in which people’s social identities overlap to create a unique experience of oppression.
  • Community-based, participatory approach: That those most impacted by the deployment of the AI system be included in it.
  • Visibility of Labor: That those who have worked in creating the systems, right from contributing data to the cleaning and labeling, be recognized.
  • Plurality/Diversity: That different opinions and viewpoints, including those on what feminism means, co-exist.
  • Localized: That work is situated and accounts for the context in which it is deployed.

Armed with these guidelines, we set about building our own machine learning driven moderation tool. But what we realized very quickly is that at every step of the way of building a machine learning system, these guidelines need to be interpreted. And that there are multiple interpretations possible.

At a higher process level, this is how we interpreted these principles:

  1. Our core project team consisted of feminist scholars who spoke the three languages we started working with. Ambika Tandon, who is the author of one of the feminist principles for AI I shared earlier, and I shared the project management responsibilities.
  2. We decided that what constitutes abuse should be determined by those at the receiving end of abuse and not by an average social media user.
  3. For the first six months we conducted interviews and focus group discussions with gender rights activists and scholars in South Asia to understand their experience of abuse online. We tried to ensure diversity within this group- so that there was representation across different sexualities, castes, religions and geographies within India. We documented the needs we heard, beyond frustrations with platform inaction on content. We decided to work on some of these needs.

It is when you get to the nuts and bolts- this team and participants working together to build an automated detection system- that the interpretation gets rather messy.

  1. The first step was to come up with a definition of abusive content based on everything we heard from our interviews and participants. Even within this group, there was significant heterogeneity in what constitutes abuse. Algorithmic decision making systems, LLMs or not, cannot handle the complexity of factors that contribute to an experience of abuse. For example, a recurring “good morning” message day after day from a stranger is supposed to serve as a reminder to some activists that they are being watched. But from the perspective of a classifier, it is just an innocuous morning wish. For four months, the core team wrote and rewrote a definition of abuse that could capture as much diversity of experience as possible in the fewest labels. Ultimately the data had to be annotated by the activists, who had multiple other things to do, and we wanted to minimize the time they spent with this exhausting content. We came up with a rather unconventional annotation guideline. The guideline had three questions:
  • Is this post gendered abuse when directed at a person of marginalized gender?
  • Is this post gendered abuse when not directed at a person of marginalized gender?
  • Does this post contain explicit or aggressive language?

I look at the first two questions as a reflection of the principles we were trying to live up to. Women of minority religions and oppressed castes in India get more abuse than other women. For many, but not all, of the people we interviewed, explicit language directed at marginalized genders is gendered abuse. We found no other way to resolve this disagreement on what gendered abuse is, and so we kept both questions in and decided to learn from the annotations.
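To make the shape of such an annotation concrete, here is a minimal sketch of how one annotator's answers to the three questions could be recorded. It is an illustration only, not the schema Tattle used: the class name, field names and the use of None for a skipped question are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """One annotator's answers to the three guideline questions for one post.

    All names here are hypothetical; None means the annotator skipped a question.
    """
    post_id: str
    annotator_id: str
    language: str
    # Q1: gendered abuse when directed at a person of marginalized gender?
    abuse_if_directed: Optional[bool]
    # Q2: gendered abuse when not directed at a person of marginalized gender?
    abuse_if_not_directed: Optional[bool]
    # Q3: contains explicit or aggressive language?
    explicit_or_aggressive: Optional[bool]

    def answers(self) -> Tuple[Optional[bool], Optional[bool], Optional[bool]]:
        """Return the three answers in guideline order."""
        return (self.abuse_if_directed,
                self.abuse_if_not_directed,
                self.explicit_or_aggressive)


# A post that one annotator reads as abusive only when aimed at a
# person of marginalized gender, and as containing explicit language.
example = Annotation("post-1", "annotator-a", "hi", True, False, True)
```

Keeping the first two questions as separate fields, rather than collapsing them into a single "abusive" flag, is what lets the disagreement described above stay visible in the data.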

  2. The next step was collecting annotations in three different languages. Once we had moved beyond the messy job of defining what abuse is, we had to translate the guidelines into three different languages. As with translations of books, the translation of the guideline in every language stands on its own. The local language speaker in the group adapted it to include examples that best reflected the spirit of the original guideline. I have no doubt that the guidelines would have looked different had different people translated them. For each language we had roughly 2300 posts annotated by three or more people. Another 6400 posts were annotated by one person. When we look at the 2300 posts we see quite a bit of disagreement in our cohort of annotators, all of whom identify as either activists or researchers of marginalized genders. There is low agreement on whether a post is gendered abuse, whether or not it is targeted towards a person of marginalized genders. But critically, there is also variation in agreement across languages. For example, the Tamil speaking activists and researchers were more aligned on whether a post contained explicit or aggressive language than Hindi or English speaking annotators. There could be a number of reasons for this - it could be a function of the guideline itself, or that of differences in the backgrounds of the annotators. It could also be a function of the data in each language we selected for annotation- perhaps the content in Tamil that we collected for annotations was not as diverse as the Hindi or English language content.
  3. Finally, when training the machine learning model we had to figure out how to handle the disagreements. And this is where you start hitting classic political philosophy questions. Should we take a majority vote on a label? But if we do that, we flatten out the minority voice within the group- what if they represent a perspective, say knowledge of regional variation in usage of a word, that other annotators don’t know of?
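The trade-off can be seen in a few lines of code. The sketch below is not the method used in the project; it simply contrasts a majority vote with a "soft" label that keeps the share of annotators who said yes, alongside a simple pairwise agreement score. The function names and the example votes are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels):
    """Fraction of annotator pairs that gave the same label for one post."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return None  # fewer than two annotators: agreement is undefined
    return sum(a == b for a, b in pairs) / len(pairs)


def majority_label(labels):
    """Most common label, or None when empty or tied (no side is picked)."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def soft_label(labels):
    """Share of annotators who answered True; the minority vote stays visible."""
    if not labels:
        return None
    return sum(labels) / len(labels)


# Hypothetical post: one of three annotators considers it gendered abuse.
votes = [True, False, False]
print(majority_label(votes))      # the dissenting vote disappears
print(soft_label(votes))          # the dissenting vote survives as a fraction
print(pairwise_agreement(votes))  # how split the annotators were
```

A majority vote gives a model one clean target per post, while a soft label passes the disagreement on to training. Neither recovers why the minority annotator disagreed, which is the harder question raised above.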

By the end of the year of the project, I think every team member involved in this project was exhausted trying to understand the disciplinary language of the others in the group and trying to explain where they were coming from. It was about eight months of endlessly negotiating for your point of view on how something should be built, and engineers saying how something can or cannot be built. I think what kept us going was a deep commitment to the need to respond to gendered abuse in India, a commitment to disciplinary equality and genuine curiosity about what machine learning done differently could look like. There are numerous papers that have and will come out of the battle scars from this work. But this work was not intended as an academic exercise. Our goal was to build something that ameliorated the effects of gendered abuse in Indian languages. So, all those negotiations had to be settled, at least for a certain time, and decisions had to be made. We deployed the machine learning model in a plugin, called Uli, that amongst other things allows users to moderate abuse on their Twitter feeds. We allowed for customizations so that users could choose to redact words that weren’t in the default list of abusive words that we had crowdsourced.
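As a rough illustration of that customization, the sketch below redacts words from a default list plus a user's own additions. It is not Uli's code: the function name, the placeholder words and the whitespace-based matching are assumptions, and real text in Indian languages would need more careful tokenization than this.

```python
import re
import string


def redact(text, default_words, user_words=(), mask="*"):
    """Mask any whitespace-separated token whose core matches a blocked word.

    default_words stands in for a crowdsourced list; user_words are the
    additions an individual user chooses. Both are hypothetical inputs.
    """
    blocked = {w.casefold() for w in (*default_words, *user_words) if w}
    out = []
    # Splitting with a capture group keeps the original whitespace intact.
    for token in re.split(r"(\s+)", text):
        core = token.strip(string.punctuation)  # ignore ASCII punctuation at the edges
        if core and core.casefold() in blocked:
            token = token.replace(core, mask * len(core))
        out.append(token)
    return "".join(out)


# Placeholder words only; the real lists contain actual slurs.
print(redact("You are a badword!", ["badword"]))
print(redact("so rude, Meanie", ["badword"], user_words=["meanie"]))
```

The point of the second argument is that the default list still decides what is hidden for anyone who never opens the settings.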

Let’s do some stock taking about how our process held up against the principles:

  1. Intersectionality: The core team and annotation group reflected diversity of backgrounds beyond gender. We included a question that allowed us to capture non-misogynistic, but abusive language targeted at marginalized genders.

  2. Community-based, participatory approach: What constitutes abuse came from people of marginalized genders targeted online.

  3. Visibility of Labor: All the annotators are mentioned on the website, and will be included in the technical papers.

  4. Localized: It was developed by a team in India for Indians.

  5. Pluralism: We created some features for customization so people can alter it around their experiences. Since the code is open source, anyone can tweak the model and adapt it for their own context.

But for all these principles, I can describe various ways in which Uli did not align with the principles:

  • In our project we privileged gender over other identity types and didn’t allow people to flag the other intersected identities, say caste, religion, political belief, visible in the content.

  • While the disparity in compensation between engineers, researchers and annotators was smaller than in other machine learning products we know of, we did to some extent punt thorny questions of data ownership. We didn’t design the data collection systems so that the annotated data is easily accessible to those who annotated it. This is something we are addressing this year but it was a limitation in the first year. We also relied on other large language models such as BERT where the labor behind annotations has not been adequately recognized.

  • As for localized development, India is a rather big country. So from the scale of a village or even a city, Uli was not at all localized. But 'localized development' as a goal is also one that I have difficulty interpreting. I think it behooves us to embrace the multiplicity of scales of communities that the internet has enabled. The internet has enabled us to connect with people who are not geographically co-located. For many of the people we spoke to, the trans-regional solidarities have been their source of comfort and courage. I don’t know if the more responsible thing for us would have been to not look at gendered abuse in three Indian languages, but rather online gendered abuse in, say, the LGBTQ community in North India, or in Delhi.

  • Pluralism is another principle against which Uli struggles: We allowed for customizations, but as any behavioral economist or HCI researcher will tell you, defaults matter. In theory, anyone can fork Uli since it is open source, but the stakeholders we are working with- the activists and researchers- have other battles to fight as well. I imagine some engineers or technical researchers forking the project as a curiosity, but realistically I don’t foresee the activists and researchers we were working with doing the same. At least not in the short run. I have also been thinking about how pluralism can live alongside the reductive mechanics of machine learning in practice. There were ideas- for example we could have created multiple machine learning models that users could have opted into, each capturing a different perspective or understanding of abuse- that we couldn’t implement simply because the cost of maintaining multiple models would have been high.

From one perspective, the principles of pluralism and in-context situated work allow for precisely this kind of flexibility. But I think a lot of technical projects could similarly assess themselves as meeting these criteria in one way or the other. I think back to the civic tech report from 2013- that not too far back we looked at ride sharing services as community engagement. As much as we hate to admit it, we also embraced social media as pluralistic and participatory. We all drank the Kool-Aid. So, I worry whether these principles that we are proposing now are sufficient to hold us in good stead. Is there indeed, in our visions of alternate technologies, a new thing under the sun? How do we ensure that one decade down, in the belief that we are working with good principles, we haven’t further entrenched the powers we want to check? What I am asking for is guardrails against self-congratulatory misinterpretations.

A proposal I offer is that, in this iteration, we do a better job of interpreting our principles against the material and institutional realities of infrastructures and the people who use them. For example, while we speak of visibility of labor, in places with oppressive regimes people might not want to be identified for having worked on certain products. What then does recognition of labor in such a context mean? Should local development also mean that the servers are locally owned? Expecting gender rights activists, fighting battles of physical and online violence, to also maintain their own infrastructure that runs a service that protects them from online abuse is unreasonable. On the flipside, is localization of infrastructures a sufficient criterion for a project to be considered local?

I look at recent proposals for decentralized infrastructures with equal parts hope and caution. Because decentralized infrastructures also have a material reality- the servers live somewhere, the people using them live somewhere, even if across dispersed geographies. And that means that these infrastructures will be shaped by the laws governing technical infrastructure in those jurisdictions and the interplay of laws across jurisdictions. And there are plenty of laws coming. These infrastructures will be shaped by the cost of cloud services, the cost of electricity and taxation laws in the country. They will also be shaped by the level of literacy and other socio-economic development factors. Foregrounding these realities in the evaluation of these principles, compromised as they may be, rather than aiming for an ideal utopia of principles, in my humble opinion, gives us a better shot at dispersing power. Our work needs to be situated and pluralistic, but we need to delve into the contradictions these principles surface when applied to different contexts. But how do we get to these data of principles applied to different contexts? We’re definitely not getting it from profit-driven corporate teams.

One way of getting this data is to play the role of the stranger to the tribe of digital product teams- à la Latour- and see how these principles are interpreted in practice. I think this is a valuable lens. But, that is not my proposal today. This is not my proposal, because at present there aren’t enough projects, reflecting diversity in content, form and process, to perform a comparative analysis on.

My proposal is what I mentioned at the beginning of the talk- it is to take the risk of being affected by some of the degeneracy of the systems we are pushing against by trying to build our own. It is to take the critical orientation and to willfully blur the researcher-field distance, by committing ourselves to designing systems with the principles we hold dear. It doesn’t have to be forever. It can be an expedition. One that we return from with diary notes. These notes will necessarily be messy- they’ll contain the daily squabbles with team members; details about some minor regulation, about corporate law, bearing on a major product direction; frustrations about how data hosting costs are making it harder to hire people. But they will also contain notes on where we, as flag bearers of these principles, compromised from our original positions. And then we document, as much as we can, so that this serves as a roadmap for the next set of explorers. I expect there will be many failures, by some definition. I expect there will also be many successes, by some definition. We aren’t chasing perfection. We’re chasing greater clarity on how our proposals hold up against a complex reality over time. What this looks like for each of us will be different. Some of us might do it alongside our existing jobs. Some of us might do it inside our institutions; some of us might find current institutions limiting for such work, which is also useful information. It becomes a reason to think about new ones.

Am I suggesting that we all pull out our inner civic technologist? Well, yes! I think it will be generative for scholarship, but I think it is critical for our digital futures.

*A big thank you to Amit Deshmukh, Aakash Solanki, Denny George and Yash Budhwar, who indulged the longish rants that served as the preamble to this piece. Denny and Yash also gave feedback on the later drafts of this talk.

Text and illustrations on the website are licensed under Creative Commons 4.0 License. The code is licensed under GPL. For data, please look at respective licenses.